mlandry22 / rain-part2

How Much Did it Rain Pt 2 - Kaggle Competition

Probability distribution matching #6

Open JohnM-TX opened 8 years ago

JohnM-TX commented 8 years ago

Could be a wild goose chase here, but maybe you guys have experience that could take this somewhere...

After creating predictions, I did the following things (a rough code sketch follows the list):

  • used the R fitdistrplus package to fit a gamma distribution to the data
  • rank ordered the predicted values from xgb, keeping integrity with ground truth
  • rank ordered the best-fit gamma distribution values and merged them into the ranked predictions
  • computed MAEs and tested blends

Each time, the MAE for the fitted distribution is close but just a little higher. Contrary to what I expected, blending does not seem to yield any improvement.

(screenshot of MAE comparison: https://cloud.githubusercontent.com/assets/15348323/10837861/86bee74a-7e8c-11e5-9430-ead0c7ce2fe5.png)
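A minimal sketch in R of the kind of procedure described above. The data frame and column names (`preds`, `actual`, `xgb`) are placeholders, not the actual variables used, and the positive-values-only filter before fitting is my own assumption (a gamma fit needs positive data):

```r
library(fitdistrplus)

# Assumed layout: `preds` has ground truth in `actual` and model output in `xgb`
# 1. Fit a gamma distribution to the (positive) target values
fit <- fitdist(preds$actual[preds$actual > 0], "gamma")

# 2. Rank order the xgb predictions
ord <- order(preds$xgb)

# 3. Draw gamma quantiles at the same ranks and merge them back in rank order
n <- nrow(preds)
gamma_vals <- qgamma(ppoints(n),
                     shape = fit$estimate["shape"],
                     rate  = fit$estimate["rate"])
preds$gamma_matched <- NA_real_
preds$gamma_matched[ord] <- gamma_vals

# 4. Compare MAEs and try a simple blend
mae <- function(y, yhat) mean(abs(y - yhat))
mae(preds$actual, preds$xgb)
mae(preds$actual, preds$gamma_matched)
mae(preds$actual, 0.5 * preds$xgb + 0.5 * preds$gamma_matched)
```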

mlandry22 commented 8 years ago

Nice. I had a similar interest, but for a different reason: I wanted to use an MSE solver for our MAE problem. I hadn't made nearly as much progress as you.

Relatedly, a coworker suggested a way of doing something similar: rank order the targets, put them onto a 0-1 probability scale, then map that onto a normal distribution via qnorm. I haven't tried it yet either, but I might do so for that problem and apply it here.
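A rough sketch of that rank-to-normal idea, not necessarily the exact approach the coworker had in mind; `y` is a placeholder for the training targets:

```r
rank_to_normal <- function(y) {
  # Rank order the targets and map the ranks onto a 0-1 probability scale
  p <- rank(y, ties.method = "average") / (length(y) + 1)
  # Push the probabilities through the normal quantile function
  qnorm(p)
}

# An MSE solver could then be trained on z <- rank_to_normal(y),
# with predictions mapped back through the empirical quantiles of y, e.g.:
# quantile(y, probs = pnorm(z_pred), type = 8)
```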

Glad to see the issues being used :-)


mlandry22 commented 8 years ago

I tried ensembling with our best model and it didn't go anywhere. The first attempt was a broad blend; the second was just 90-5-5. It could be that the XGBoost model is simply that much better. It could also be that the method of using the common known output values is what helps for an MAE objective. Probably a little of both, but it's worth paying attention to, perhaps by working through it in validation sets.
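For concreteness, a hedged sketch of weight-based blending checked on a validation set. `val` is a hypothetical data frame holding the ground truth and three model predictions; the column names are illustrative, not our actual model names:

```r
mae <- function(y, yhat) mean(abs(y - yhat))

# Blend three prediction columns with weights w and score the result
blend_mae <- function(w, val) {
  yhat <- w[1] * val$xgb + w[2] * val$gamma_matched + w[3] * val$qnorm_model
  mae(val$actual, yhat)
}

# Broad blend vs. a 90-5-5 blend dominated by the xgb model
blend_mae(c(1/3, 1/3, 1/3), val)
blend_mae(c(0.90, 0.05, 0.05), val)
```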