my brain is starting to hurt after spending 2 days figuring out the following:
(i) let's say you have predictions that follow a certain distribution, e.g. e-commerce revenue that follows some type of Tweedie distribution; (ii) let's say you have the definition of this distribution, e.g. Tweedie; how the hell do you transform the distribution into a loss function?
I noticed that the Negative Log Likelihood can be used in cases like this, but it operates on likelihood values in the interval (0,1], which is not the case for the distribution deviance
I noticed some posts about quasi-likelihood, e.g. this post; it is probably also the same idea implemented in the discussion above under the abbreviation QLL
how do you do it? do you somehow scale the distribution deviance to (0,1]?
after playing around with it I am not sure this is really helpful; the idea seems plausible, but it is almost impossible for me to follow the proof in the original paper because:
- the authors use some 0.68 number on the KDD dataset that I could not figure out
- the metrics are hard to understand - they are split into deciles, etc.
nevertheless, I implemented the authors' Keras code in PyTorch, including their test, and compared the original and pytorch_widedeep implementation results in the [notebook]()
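For reference, my reading is that the paper in question is the zero-inflated lognormal (ZILN) LTV one, in which case the core of the loss looks roughly like the sketch below. This is an illustration of the idea only, not the notebook code; the three-column logits layout and the clamping constants are my assumptions:

```python
import torch
import torch.nn.functional as F

def ziln_loss(y_true, logits):
    """Sketch of a zero-inflated lognormal (ZILN) loss.

    logits: (batch, 3) columns = [p(positive) logit, mu, raw sigma];
    y_true: (batch,) non-negative, zero-inflated targets.
    """
    positive = (y_true > 0).float()
    # classification head: is the value zero or positive?
    cls_loss = F.binary_cross_entropy_with_logits(
        logits[:, 0], positive, reduction="none"
    )
    # regression head: lognormal NLL, applied to positive samples only
    mu = logits[:, 1]
    sigma = F.softplus(logits[:, 2]).clamp(min=1e-6)
    safe_y = y_true.clamp(min=1e-7)  # log_prob needs strictly positive support
    reg_loss = -positive * torch.distributions.LogNormal(mu, sigma).log_prob(safe_y)
    return (cls_loss + reg_loss).mean()
```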
QuantileLoss
I decided to add a new method - "multiregression" - as the QuantileLoss function regresses multiple values at the same time
@jrzaurin could you please recheck it? it passes the unit tests but produces "strange" values that do not adhere to the idea that the quantile values should be increasing within each sample - maybe it is just a lack of training, etc.
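On the "strange" values: quantile crossing (per-sample quantiles that are not monotone) is a known artifact when each quantile is regressed independently with the pinball loss, so it is not necessarily a bug. For reference, a minimal sketch of the pinball loss over several quantiles at once (shapes and the quantile grid are illustrative, not the exact library code):

```python
import torch

def quantile_loss(y_true, y_pred, quantiles=(0.2, 0.5, 0.8)):
    """Pinball loss over several quantiles regressed simultaneously.

    y_pred: (batch, n_quantiles); y_true: (batch,).
    """
    losses = []
    for i, q in enumerate(quantiles):
        err = y_true - y_pred[:, i]
        # under-prediction costs q, over-prediction costs (1 - q)
        losses.append(torch.max(q * err, (q - 1) * err))
    return torch.stack(losses, dim=1).mean()
```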
FINAL UPDATE:
the branch is ready to be merged, and I can figure out later how to add the whole family of Tweedie distributions, i.e. their losses, to the library
I also added an enforce_positive parameter to wide_deep to fight a possible issue with negative model outputs during initial training with the RMSLE or Tweedie losses, which require positive or non-negative input
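The idea behind enforce_positive is just to squash the raw network output into a positive range before the loss sees it; a minimal sketch of the concept (the actual wiring in the branch may differ):

```python
import torch
import torch.nn.functional as F

def rmsle(y_true, y_pred):
    # log1p blows up for y_pred <= -1, and RMSLE assumes non-negative values
    return torch.sqrt(torch.mean((torch.log1p(y_pred) - torch.log1p(y_true)) ** 2))

raw_out = torch.randn(8)       # stand-in for the raw network output
pos_out = F.softplus(raw_out)  # maps R -> (0, inf), safe for RMSLE/Tweedie
loss = rmsle(torch.rand(8), pos_out)
```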
this post only briefly summarizes this paper, which derives the Probability Density Function and then expresses the Negative Log Likelihood, a common choice for losses in deep nets thanks to its nice characteristics on the interval (0,1)
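The recipe boils down to: take the PDF, evaluate it at the observed target, and minimize its negative log. With torch.distributions that pattern is nearly a one-liner; Normal is used below purely as an example distribution:

```python
import torch

y = torch.tensor([0.3, 1.2, 2.1])
mu = torch.zeros(3, requires_grad=True)

# generic pattern: distribution -> log_prob -> negate -> mean
nll = -torch.distributions.Normal(mu, torch.ones(3)).log_prob(y).mean()
nll.backward()  # differentiable, so it works directly as a training loss
```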
this post does, in my opinion, a slightly better job of summarizing the information
using the NLL is usually preferred in deep nets, but another possibility is to use the distribution deviance, which is not defined as nicely as the NLL on the interval (0,1) - i.e. the sklearn approach
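For reference, the sklearn approach I mean is (I believe) mean_tweedie_deviance, which scores raw non-negative values directly instead of going through a (0,1] likelihood:

```python
import numpy as np
from sklearn.metrics import mean_tweedie_deviance

y_true = np.array([0.0, 2.0, 10.0, 0.0])  # zero-inflated, e-commerce-like
y_pred = np.array([0.5, 1.5, 8.0, 0.2])   # predictions must be strictly positive
# power in (1, 2) -> compound Poisson-gamma Tweedie
print(mean_tweedie_deviance(y_true, y_pred, power=1.5))
```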
this post gives an additional summary of some (not all) other Tweedie-family Probability Density Functions; the next step is to derive their NLLs and we have the other loss functions
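Concretely, for the compound Poisson-gamma branch of the family (power 1 < p < 2) the NLL, after dropping terms that do not depend on the prediction, reduces to two power terms; a sketch of what such a loss could look like (the function name and the p=1.5 default are mine):

```python
import torch

def tweedie_nll(y_true, mu, p=1.5):
    """Tweedie NLL up to terms constant in mu; valid for 1 < p < 2.

    y_true: non-negative targets; mu: strictly positive predictions.
    """
    return (-y_true * mu.pow(1 - p) / (1 - p) + mu.pow(2 - p) / (2 - p)).mean()
```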
Add losses that show promising results for regression in scenarios with highly imbalanced datasets, used e.g. in Lifetime Value (LTV) prediction