mlandry22 / rain-part2

How Much Did it Rain Pt 2 - Kaggle Competition

Submission Management #3

Open mlandry22 opened 8 years ago

mlandry22 commented 8 years ago

Thread to discuss submission strategy. With 3 members and 2 submissions per day, it won't be obvious how best to go about this.

mlandry22 commented 8 years ago

I think for a default strategy, we can try this: John and Thakur get 1/day, and if I am paying attention as the deadline rolls around (it will soon be 4pm Pacific, a good time on weekdays) I will claim one.

I think that default strategy will work pretty well because my plan for this competition is to have a large variety of attempts available, so I'll probably try and post what the holdout scores are as I create them, and then we can decide if we ever want to use one. I won't be able to submit everything I generate anyway, so I will allow you guys to submit what you want.

So we can discuss/post here to try and plan better, but unless it's clear that we have something interesting going on, you each can have one a day, I think.

ThakurRajAnand commented 8 years ago

Sounds good to me.

JohnM-TX commented 8 years ago

It works for me. I'll try to give a heads up if not using one.

JohnM-TX commented 8 years ago

Lately I've been sending a submission a few hours after the previous day's deadline. So, I have already submitted for "today" and we have one remaining.

mlandry22 commented 8 years ago

I have had one ready for whenever there was a free slot. It's a semi-useless blend, but something I wanted to try. But I see you have since bumped up the performance of the XGB contribution, so I'll probably take the (deadline - 10 minutes) slot for some kind of blend there, unless either of you has something specific you want to test that will gain us more insight. If so, by all means, go ahead.

mlandry22 commented 8 years ago

Nice going, John. I put your best and my single best together at 70/30 and we bumped up 2 spots.

[screenshot: leaderboard, 2015-10-28 4:57 PM]
mlandry22 commented 8 years ago

We are "leading the league" in at least one category: most submissions! In fact, we are probably just under the max allowable for teaming up. The competition has been running since 9/17, so the max submissions is around 82 or 84, and we are at 79.

JohnM-TX commented 8 years ago

Glad we made it under the limit and popped up a couple spots. I have some new developments that should be good. Will tie them into the main thread and post code in the next 24 hours.

JohnM-TX commented 8 years ago

Didn't get us a better position tonight, but improved the XGB score to 23.7687, just .0015 short of our current best.

[screenshot: leaderboard]

mlandry22 commented 8 years ago

Nice. That makes it our leading single model, so surely we'll improve on any combination. But I will resist the urge to do such a thing until we're next out of ideas. I'll try and have something ready from R, in case Thakur is still working on his by the deadline tomorrow. Thakur, if you have something, that 2nd one is all yours.

John, are you using some sort of validation set to gauge local improvement as well? No problem if you aren't. We'll want to eventually, but it's still plenty early.

ThakurRajAnand commented 8 years ago

I will set up sklearn GBM + Spearmint to run, and I'll submit whatever best parameters Spearmint finds before the deadline. Since I will do 5-fold, I might be able to report CV scores as well.

Any suggestions for doing proper validation, or anything that seems to match the LB closely?

JohnM-TX commented 8 years ago

Yes I have a basic validation set in use. I carved out 1/5 for validation and use the other 4/5 for training (keeping ID integrity). It seems to be directionally correct and is in the neighborhood of agreement with the public LB (low 20's).

I find that if I run CV on xgboost, those scores are much lower (between 2-3). I think it's because outliers are removed before running the model, and a big part of the MAE comes from those mysterious outliers.

ThakurRajAnand commented 8 years ago

@JohnM914 I am still in the office and won't reach home for another hour. Not sure if I'll be able to generate a submission in time. Feel free to use the submission slot if you have something ready.

ThakurRajAnand commented 8 years ago

Anyone have anything good to submit? I just started playing around and can try submitting an H2O NN if you guys don't have something good to test.

mlandry22 commented 8 years ago

I think it's all yours Thakur. John got his in last night, so take your shot. I will let you know when I have something very useful. My scripting is working, but I need to deploy it to a system where I'm comfortable letting it go for hours on end. So far it's on my everyday laptop. Step by step. Will have something I like when it's all done.

ThakurRajAnand commented 8 years ago

LB Score : 23.82631

I would say not bad at all, since NN was very simple. Below are the parameters I have used.

model <- h2o.deeplearning(
  x = 2:22,
  y = 1,
  training_frame = x1,
  activation = "RectifierWithDropout",
  hidden = c(10, 10),
  input_dropout_ratio = 0.1,
  hidden_dropout_ratio = c(0.1, 0.1),
  loss = "Absolute",
  epochs = 10
)

I will save the code for each of my submissions in a new file and keep the naming convention of the submission and the code file the same, e.g. T0001.R and T0001.csv.

mlandry22 commented 8 years ago

Awesome! I'll make sure and add some h2o.deeplearning configs in the thing that runs all weekend so we can see if that configuration can be beat. Don't let that stop you from doing the same. But that's a perfect utilization of what I want, and will be nice to let some H2O servers run all weekend.
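For what it's worth, a minimal sketch of how that weekend run could sweep deep learning configs with h2o.grid, reusing the column indices and training frame (x1) from Thakur's call above; the grid id and the specific values inside hyper_params are just placeholders:

library(h2o)
h2o.init(nthreads = -1)

# sweep a few configurations around the one Thakur posted; x1 is his H2O training frame
dl_grid <- h2o.grid(
  algorithm = "deeplearning",
  grid_id = "dl_weekend",
  x = 2:22, y = 1,
  training_frame = x1,
  loss = "Absolute",
  epochs = 10,
  hyper_params = list(
    activation = c("Rectifier", "RectifierWithDropout"),
    hidden = list(c(10, 10), c(32, 32), c(64, 64, 64)),
    input_dropout_ratio = c(0, 0.1)
  )
)

# inspect the grid sorted by deviance to see whether the posted config can be beat
h2o.getGrid("dl_weekend", sort_by = "residual_deviance", decreasing = FALSE)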

JohnM-TX commented 8 years ago

Guess who's in the top 10, boys? Us!!

[screenshot: leaderboard]

Code is deposited in the folder. I'll be taking a break until Sunday night for family time and won't be submitting for a few days. Plus, it's time for me to take a break from the terminal and view the problem from a distance for a short while. Have a great weekend!

mlandry22 commented 8 years ago

Awesome!! Ok, I said awesome twice. But adding deep learning to the mix and whatever John just did to get us up 12 spots is great. Will take a look at what it was.

ThakurRajAnand commented 8 years ago

WOW

mlandry22 commented 8 years ago

Have a great weekend, John. I've seen the code and now I see the following notes in your pair of submissions:

Fri, 30 Oct 2015 04:16:50
more features 95/5 blend with sample non-tuned model
xgb-10-29-4.csv     
23.74293
...
Thu, 29 Oct 2015 03:47:52
added new features 78/22 blend with sample
xgb-10-27-4.csv     
23.76872

So this is looking great. We have an R gbm, XGBoost GBM, and H2O deep learning, all closing in on similar independent scores, so we are in good shape for some nice variance in our models. At some point, we'll probably want to look into stacking these: learning weights, rather than the guess-and-check simple average method we're doing (which is fine for now).
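For when we get there, a minimal sketch of the "learn the weights" version on a two-model blend, minimizing MAE on a holdout with base R's optimize; val_actual, pred_xgb, and pred_dl are hypothetical vectors of holdout targets and per-model predictions:

# MAE of a weighted blend of two prediction vectors
mae_of_blend <- function(w, actual, p1, p2) {
  mean(abs(actual - (w * p1 + (1 - w) * p2)))
}

fit <- optimize(mae_of_blend, interval = c(0, 1),
                actual = val_actual, p1 = pred_xgb, p2 = pred_dl)
fit$minimum    # learned weight on the first model
fit$objective  # holdout MAE at that weight

With more than two models, the same idea works with optim over a weight vector; for now the guess-and-check average does the same job with less machinery.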

JohnM-TX commented 8 years ago

Hi guys - I made a new submission tonight with better local CV, but it did not do well on the LB. It was the same XGB model, with one bug fix to a calculated field, blended 85/15 or so with all zeros (essentially scaling down the values). I thought it would do better, but no.

I'll look back to features and see if there are gains to be made there unless there is another avenue we need to explore.

JohnM-TX commented 8 years ago

Did some work on features, cutpoints, etc. and got a .10-.15 gain locally, but nothing on the LB. Finally got a small gain with an ensemble of the two best xgb models. I'm thinking this branch of development is about played out, and I'll spend my next effort trying to identify some of those outliers that contribute the bulk of the MAE.

JohnM-TX commented 8 years ago

Hi guys. Do either of you have plans for the second submission today? If not, I'll use it but no problem waiting til later either.


ThakurRajAnand commented 8 years ago

No plan from my side

mlandry22 commented 8 years ago

Go ahead, John. Thanks for pushing us forward. I'll catch up once we get our conference over next week :-)

JohnM-TX commented 8 years ago

No problem. Unfortunately my last two submissions were bricks! But, I might be onto something. Will post it as an issue and maybe you guys can build/ course correct when things settle down.


ThakurRajAnand commented 8 years ago

Cool. Please post it whenever you are ready. I recently left my job and was busy doing knowledge transfer, but now I am completely free. I will go through all the features you and Mark have created and will start running some models.

mlandry22 commented 8 years ago

Trying deep learning out now. Bigger models than the one Thakur listed here are not working at all, so I'm going back and starting with just what you had running. It could be that my features aren't good as they stand, too.

JohnM-TX commented 8 years ago

I tried more tinkering with features but couldn't get any improvements. Also ran an h2o RF model, but it's still in the same range. I'll go back to looking for outliers to see if we have any luck there. In this strangely-placed post https://www.kaggle.com/c/how-much-did-it-rain-ii/forums/t/17317/just-curious-what-s-the-mouse-over-number-on-scores the comp admin seems to indicate that it's not possible to find the outliers in the test data.

JohnM-TX commented 8 years ago

So what do we want as an overall strategy for the last week? I made some (very) slight progress with the outlier detection tonight and can pursue that. I've tried more feature engineering with rainfall prediction, but hit many dead ends. One of the main challenges I've had is flaky validation: I'm doing a simple hold-out of 20% of the data, which sometimes works and sometimes doesn't. Open to suggestions...

Anyway, how should we proceed? I think with just some ensembling and simple tweaks we can move up a few spots and who knows, maybe we pull it together and jump back to top 10?

mlandry22 commented 8 years ago

The logistic regression isn't off to a great start. It bottomed out shortly after. I'm trying to figure out if it can still be informative, particularly when stacked with something like nlopt for MAE. But let's not hold our breath.

Therefore, back to John's question about the overall strategy. And I like the two concepts mentioned.

For flaky validation, I can provide some horsepower to run 5-fold, rather than a single 20%. We can compare validation scores across each and see how that works out.

That will also give us some ensembling direction. And with what we have, that's probably a good idea, though I know it didn't work out too well on the first attempt (again, I have had that same experience before, too).

Other ideas?

As it stands, I think we should share a list of folds so that we all use the same validation sets. That way we can all start with the same first fold, if that's all we want to use at first, and keep things consistent.

JohnM-TX commented 8 years ago

Yes, I agree sharing validation sets would be a good move. Some interesting traffic today on the forum: https://www.kaggle.com/c/how-much-did-it-rain-ii/forums/t/16680/cross-validating Several others have had trouble with classification carrying over from train to test. There's something different about those days that throws us off. For outlier detection, I tried to stay away from "rainfall-related" variables and look at "site specific" variables like radardist, sd(Zdr) and such but still no luck.

Thakur - any insight from outlier detection or other pursuits? I could use some inspiration!

mlandry22 commented 8 years ago

Ok, so John, I'm going to use this as a basis for creating the folds, so that scores should be on par with what you're using, and then we can get 4 other sets to start using:

# create an interim validation set
idnumsv <- unique(trraw[, Id])                            # all gauge-hour Ids
validx  <- sample(1:length(idnumsv), length(idnumsv)/5)   # hold out 1/5 of the Ids
valraw  <- trraw[Id %in% validx, ]                        # raw readings for the holdout Ids
trraw   <- trraw[!Id %in% validx, ]                       # remaining 4/5 for training
val <- collapsify(valraw)                                 # collapse readings to one row per Id
val <- val[!is.na(val[, wref]), ]                         # drop Ids with no reference reading (wref defined elsewhere)
setDF(val)

That comes with a seed set up top which should ensure the sampling is consistent. I'll do that and post a 2-column table of IDs and folds. Then I'll try and adapt our best code and run them all on 5-fold.

Speaking of, I'll try and catch up with everything to answer this question myself, but our leading models are something like

Others I've overlooked as I was distracted for a few weeks?

ThakurRajAnand commented 8 years ago

Sorry for the delay, but I am back. I have been a bit busy with some other stuff. I will pick up the features you and John created and start improving the NN, and I will try to optimize GBM from sklearn directly on MAE.

Since I was away, I will be happy if you guys assign me something.

ThakurRajAnand commented 8 years ago

@JohnM914 Instead of trying to find the outliers directly, did you try generating classification CV probabilities and using them as features in a regression model? I am going to give auto-encoders a try for outlier detection after I leave the office today.
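To make the suggestion concrete, a rough sketch of generating out-of-fold outlier probabilities with xgboost and attaching them as a feature; the 50 mm cutoff, the feature matrix X, and the collapsed per-Id table dt are all assumptions:

library(data.table)
library(xgboost)

# dt: collapsed per-Id training table (assumed); X: numeric feature matrix aligned with dt (assumed)
dt[, outlier := as.integer(Expected > 50)]      # assumed cutoff for calling an hour an outlier
dt[, fold := sample(0:4, .N, replace = TRUE)]   # quick random folds for the sketch
dt[, p_outlier := NA_real_]

for (k in 0:4) {
  tr <- which(dt$fold != k)
  va <- which(dt$fold == k)
  booster <- xgboost(data = X[tr, ], label = dt$outlier[tr],
                     objective = "binary:logistic",
                     nrounds = 200, max_depth = 6, eta = 0.1, verbose = 0)
  set(dt, i = va, j = "p_outlier", value = predict(booster, X[va, ]))
}
# p_outlier is now an out-of-fold probability that can be fed into the regression model as an extra column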

JohnM-TX commented 8 years ago

I briefly went down the path with k-means but looking back probably didn't do it right. I drew up a schematic earlier today that I can try with classification feeding into regression. In theory it should work but it looks like many others have tried and failed. Probably because they're not as smart as us, right? Joking of course. I'll give it a go in the next 24 hours and see anyway. Brain is wearing out this evening - midnight my time...


JohnM-TX commented 8 years ago

Here is my latest on features that might predict outliers, which to my way of thinking are different from features that predict rainfall. I tried to focus on features that would be specific to a site: radar calibration, geography, local interference, timing intervals, etc.

collapsify2 <- function(dt) { 
  dt[, .(expected = mean(Expected, na.rm = T)
    , bigflag = mean(bigflag, na.rm = T)
    , negflag = sum(negflag, na.rm = T)
    , records = .N    
    , timemean = mean(timespans, na.rm = T)
    , timesum = sum(timespans, na.rm = T)
    , timemin = min(timespans, na.rm = T)
    , timemax = max(timespans, na.rm = T)
    , timesd = sd(timespans,na.rm = T)
    , minssum = sum(minutes_past, na.rm = T)
    , minsmax = max(minutes_past, na.rm = T)
    , minssd = sd(minutes_past, na.rm = T)
    , zdrmax = max(Zdr, na.rm = T)
    , zdrmin = min(Zdr, na.rm = T)
    , zdrsd = sd(Zdr, na.rm = T)
    , kapsd = sd(Kdp*radardist_km, na.rm = T)
    , rhosd = sd(RhoHV, na.rm = T)
    , rhomin = min(RhoHV, na.rm = T)
    , rd = mean(radardist_km, na.rm = T)
    , refcdivrd = max((RefComposite-Ref)/radardist_km, na.rm = T) 
    , c1 = max(Zdr/Ref, na.rm = T)
    , c2 = max(RefComposite/Ref, na.rm = T)
    , c3missratio = sum(is.na(RhoHV))/.N
    , refmissratio = sum(is.na(Ref))/.N
    , refcmissratio = sum(is.na(RefComposite))/.N
  ), Id]
}
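A quick usage sketch, assuming trraw is the raw per-reading data.table with the helper columns referenced above (bigflag, negflag, timespans) already added:

# collapse the raw readings to one row of outlier-oriented features per Id
outlier_feats <- collapsify2(trraw)
dim(outlier_feats)                    # one row per gauge-hour Id
summary(outlier_feats$c3missratio)    # share of readings with RhoHV missing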

Here's what I got for feature importance. As mentioned, this did well locally but not with the test set.

[feature importance plot]

mlandry22 commented 8 years ago

John, on that same idea, we might try to compute peculiar single readings, like rhomin being way out of range while everything else is in the normal range. That's a hard feature for trees to find, despite what people claim about interactions. We'd probably want normalized values of each reading from the pre-collapsed set, (reading - colMean) / colSD, and then something like the max value per ID divided by the mean value per ID (to accent the outlier). There are a couple of things to look for, and playing around with the numbers can probably get a few such features.

Things you don't want this to flag:

Things you would want this to flag:

Likely an important part about making this work efficiently is to make a single flag for all features; either:

Another thing that seems silly but worthwhile, if it's there, is whether the percentile columns are internally inconsistent: the 10th greater than the 50th, or the 50th greater than the 90th. Not likely the case, but if it happens, it might be interesting to look at.

mlandry22 commented 8 years ago

...I think this verifies that no silly mistakes are out there.

library(data.table)

# count the rows where a lower percentile exceeds a higher one for each 5x5 measure
t<-fread("train.csv")
t[!is.na(Ref_5x5_10th) & !is.na(Ref_5x5_50th),sum(Ref_5x5_10th>Ref_5x5_50th)]
t[!is.na(Ref_5x5_90th) & !is.na(Ref_5x5_50th),sum(Ref_5x5_50th>Ref_5x5_90th)]
t[!is.na(RefComposite_5x5_10th) & !is.na(RefComposite_5x5_50th),sum(RefComposite_5x5_10th>RefComposite_5x5_50th)]
t[!is.na(RefComposite_5x5_90th) & !is.na(RefComposite_5x5_50th),sum(RefComposite_5x5_50th>RefComposite_5x5_90th)]
t[!is.na(RhoHV_5x5_10th) & !is.na(RhoHV_5x5_50th),sum(RhoHV_5x5_10th>RhoHV_5x5_50th)]
t[!is.na(RhoHV_5x5_90th) & !is.na(RhoHV_5x5_50th),sum(RhoHV_5x5_50th>RhoHV_5x5_90th)]
t[!is.na(Zdr_5x5_10th) & !is.na(Zdr_5x5_50th),sum(Zdr_5x5_10th>Zdr_5x5_50th)]
t[!is.na(Zdr_5x5_90th) & !is.na(Zdr_5x5_50th),sum(Zdr_5x5_50th>Zdr_5x5_90th)]
t[!is.na(Kdp_5x5_10th) & !is.na(Kdp_5x5_50th),sum(Kdp_5x5_10th>Kdp_5x5_50th)]
t[!is.na(Kdp_5x5_90th) & !is.na(Kdp_5x5_50th),sum(Kdp_5x5_50th>Kdp_5x5_90th)]
mlandry22 commented 8 years ago

And then here is a simple way to get started with looking for the weird stuff:

library(data.table)

t<-fread("train.csv")
t[,zRef:=minutes_past*0]
t[,zRefComposite:=minutes_past*0]
t[,zRhoHV:=minutes_past*0]
t[,zZdr:=minutes_past*0]
t[,zKdp:=minutes_past*0]

t[!is.na(Ref),zRef:=scale(Ref, center = TRUE, scale = TRUE)[,1]]
t[!is.na(RefComposite),zRefComposite:=scale(RefComposite, center = TRUE, scale = TRUE)[,1]]
t[!is.na(RhoHV),zRhoHV:=scale(RhoHV, center = TRUE, scale = TRUE)[,1]]
t[!is.na(Zdr),zZdr:=scale(Zdr, center = TRUE, scale = TRUE)[,1]]
t[!is.na(Kdp),zKdp:=scale(Kdp, center = TRUE, scale = TRUE)[,1]]

t[,maxZ:=pmax(zRef,zRefComposite,zRhoHV,zZdr,zKdp)]
t[,meanZ:=(zRef+zRefComposite+zRhoHV+zZdr+zKdp)/5]
t[,meanNonZeroZ:=(zRef+zRefComposite+zRhoHV+zZdr+zKdp)/(1+ifelse(zRef==0,0,1)+ifelse(zRefComposite==0,0,1)+ifelse(zRhoHV==0,0,1)+ifelse(zZdr==0,0,1)+ifelse(zKdp==0,0,1))]
t[,maxAbsZ:=pmax(abs(zRef),abs(zRefComposite),abs(zRhoHV),abs(zZdr),abs(zKdp))]
t[,meanAbsZ:=(abs(zRef)+abs(zRefComposite)+abs(zRhoHV)+abs(zZdr)+abs(zKdp))/5]
t[,meanAbsNonZeroZ:=(abs(zRef)+abs(zRefComposite)+abs(zRhoHV)+abs(zZdr)+abs(zKdp))/(1+ifelse(zRef==0,0,1)+ifelse(zRefComposite==0,0,1)+ifelse(zRhoHV==0,0,1)+ifelse(zZdr==0,0,1)+ifelse(zKdp==0,0,1))]
# assumed definition: ratio of the largest to the mean absolute z-score (0 when all readings are missing)
t[,ratioMaxAbs_meanAbs:=ifelse(meanAbsZ==0,0,maxAbsZ/meanAbsZ)]
t2<-t[,.(maxRatio=max(ratioMaxAbs_meanAbs),Expected=mean(Expected)),Id]
t2[,.(median=median(Expected),mean=mean(Expected),meanCapped=mean(pmin(Expected,50)),.N),round(maxRatio,1)][order(round)]

But doing all that doesn't seem convincing at first:

   round    median        mean meanCapped      N
 1:   0.0 0.7620004 326.2158641 12.6169167 410096
 2:   1.3 0.5080003   0.7257147  0.7257147      7
 3:   1.4 0.5080003  87.5228872  8.8886983    109
 4:   1.5 0.5080003 172.9487308  7.5987424   3490
 5:   1.6 0.7620004  63.0267645  4.9785835  25942
 6:   1.7 1.0160005  56.0237884  4.9542465  16917
 7:   1.8 1.0160005  47.6335496  4.8691934  15032
 8:   1.9 1.0160005  48.6061699  5.0389237  10765
 9:   2.0 0.7620004  75.2349484  5.7090180  36310
10:   2.1 1.0160005  41.9078220  4.9771725  18475
11:   2.2 1.0160005  41.8893130  4.8625946  19920
12:   2.3 1.0160005  40.0369639  4.7955911  19967
13:   2.4 0.7620004  39.2674235  4.3239085  19865
14:   2.5 1.0160005  37.2729321  4.5524655  34591
15:   2.6 1.0750006  27.6358265  4.5071043  37173
16:   2.7 1.2700007  23.6880710  4.6437057  48376
17:   2.8 1.2700007  23.7996848  4.7349457  44136
18:   2.9 1.2700007  24.2110894  4.6691191  66051
19:   3.0 1.2700007  18.1062962  4.4207018  37647
20:   3.1 1.2700007  20.9359522  4.2279012  32788
21:   3.2 1.2700007  19.5954574  4.2232663  31539
22:   3.3 1.2700007  18.2001923  4.1939458  30618
23:   3.4 1.2700007  15.8351303  3.8901064  29082
24:   3.5 1.2700007  17.1309199  3.9404645  27212
25:   3.6 1.2700007  16.4194614  3.8531756  24567
26:   3.7 1.0160005  18.8579310  3.9361624  22868
27:   3.8 1.2700007  12.7421878  3.8746778  19899
28:   3.9 1.0160005  12.8914775  3.7394227  17744
29:   4.0 1.2700007   9.4873132  3.7121392  14757
30:   4.1 1.2700007  14.7490870  3.7254200  12581
31:   4.2 1.2700007  12.2639715  3.5607097  10848
32:   4.3 1.2700007  14.3822789  3.6191568   9120
33:   4.4 1.2700007  18.9320663  4.0976536   7676
34:   4.5 1.2650007  18.3044052  3.7352882   6296
35:   4.6 1.2700007  14.6096468  3.6498686   4871
36:   4.7 1.0160005  16.4558360  3.7568519   3906
37:   4.8 1.2700007  16.3944929  3.8032243   2927
38:   4.9 1.2700007  12.0857711  4.0340287   2205
39:   5.0 1.2700007   5.3350087  3.4537775   1540
40:   5.1 1.2700007  17.3513730  4.2926483   1088
41:   5.2 1.0750006  38.6896190  4.4416716    761
42:   5.3 1.2700007  23.8237763  4.3680016    537
43:   5.4 1.0160005   3.8097636  3.0975536    344
44:   5.5 1.5240008   7.0909679  4.4143251    167
45:   5.6 1.0160005  92.3979459  5.2137943     87
46:   5.7 0.6350003   1.3112734  1.3112734     44
47:   5.8 3.5560020   3.9793354  3.9793354      3
48:   5.9 0.2540001   0.2540001  0.2540001      1
    round    median        mean meanCapped      N
mlandry22 commented 8 years ago

Not an elegant way of getting it done, but here is a way to get folds that should be consistent with what John has been doing. John's data should check out with fold 0.

I'll try and get the CSV zipped and posted, but might email it. Here is R code to get it.

library(data.table)
library(readr)
library(dplyr)
set.seed(333)                                       # fixed seed so the sampling is reproducible

trraw <- fread("train.csv", select = selection)     # 'selection' = column subset, defined elsewhere
idnumsh <- unique(trraw[, Id])

s  <- sample(1:length(idnumsh))                     # random permutation of the Ids
s2 <- floor(s/((1+length(idnumsh))/5))              # map the permutation onto folds 0-4
foldTable <- as.data.frame(idnumsh)
foldTable$fold <- s2
colnames(foldTable)[1] <- "Id"
write.csv(foldTable, "foldTable.csv", row.names = FALSE)
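
And a small sketch of how any of us could consume that table, assuming foldTable.csv sits next to the data; fold 0 should line up with John's current validation split:

library(data.table)

trraw <- fread("train.csv")
folds <- fread("foldTable.csv")            # columns: Id, fold (0-4)
trraw <- merge(trraw, folds, by = "Id")

val   <- trraw[fold == 0]                  # holdout consistent with John's split
train <- trraw[fold != 0]                  # remaining four folds for training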
mlandry22 commented 8 years ago

I'm running my code through those five folds now. I won't have an opportunity to submit by today's deadline. But I ought to have all five models done with plenty of time for tomorrow's.

JohnM-TX commented 8 years ago

Cool. I may have something in time. Setting up my outlier detector and classifier now.

mlandry22 commented 8 years ago

It's nice that you've been pushing the best single model. I was going to get an ensemble set up in case we didn't have two for the deadline.

Is either of you using Marshall-Palmer directly? It's likely not a bad feature, but I don't think I'm using it.

mlandry22 commented 8 years ago

Well I went ahead and tried the ensemble given that it seems like we have either 0 or 1 other submissions today. It helped, but not much. As expected, mostly.

[screenshot: leaderboard, 2015-12-02 1:32 PM]
JohnM-TX commented 8 years ago

At least it's some progress. I ran the classifier based on flags along the lines of what Mark described. It worked very well on the validation set but as before, it did not do any good on the test set.

I'll finally leave this avenue and go back to trying probability matching. I didn't get any boost from fitting to a standard gamma distribution, but an ensemble approach might work. The author of this paper seems to think it has merit: Ebert_2001_PME.pdf
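For reference, a rough sketch of the ensemble flavour of probability matching: keep each Id's rank from the model but replace the value with the matching quantile of the (capped) training-target distribution. The 50 mm cap and the object names (preds, train_expected) are assumptions:

# preds: model predictions for the test Ids; train_expected: training targets (assumed objects)
cap <- 50
train_capped <- pmin(train_expected, cap)

# empirical quantile of the training target at each prediction's rank
r <- rank(preds, ties.method = "average") / (length(preds) + 1)
matched <- quantile(train_capped, probs = r, names = FALSE)

# matched keeps the model's ordering but takes on the training distribution's shape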

JohnM-TX commented 8 years ago

Mark - I used rates calculated by Marshall-Palmer as a feature for xgboost and it was ranked #1 for feature importance. I used this modified version after trying different parameters in a direct calculation of MAE on the training data.

ratemm  = ((10^(Ref/10))/170) ^ (1/2)   
precipmm = sum(timespans * ratemm, na.rm = T)  # grouped by Id

where timespans is the fraction of the hour from the previous reading to the current reading associated with the Ref value.
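A sketch of how that might be computed on the raw readings, assuming trraw is the uncollapsed data.table; the formula is John's modified Marshall-Palmer above, and the handling of the first reading in each hour is an assumption:

library(data.table)

# fraction of the hour covered by each reading, per Id
trraw[, timespans := {
  mp <- minutes_past / 60
  c(mp[1], diff(mp))          # first reading covers from the top of the hour
}, by = Id]

trraw[, ratemm := ((10^(Ref / 10)) / 170)^(1/2)]
mp_feature <- trraw[, .(precipmm = sum(timespans * ratemm, na.rm = TRUE)), by = Id]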

mlandry22 commented 8 years ago

OK, got the folds run last night. But I might have underestimated the variance in what we all aim to predict as well. The Id code I sent should be universal, so no exclusions. But John capped at 50, I think I saw, and I had previously capped at 70. So we'll be covering slightly different spaces. I suppose if we do any stacking, we'll only do it where we have full predictions, and that's likely the best anyway.

So that said, here is a statement regarding the variance of my folds, which is fairly large, I think:

fold 1: 0.2035
fold 2: 0.2067
fold 3: 0.2001
fold 4: 0.2031
fold 5: 0.2014

So shifting at the third decimal place, which isn't too bad--we just bumped our score up by a similar amount of the variance here and it was only worth one spot.

Tough to know what to make of it, but I can provide predictions for the full holdout set I was using and then the test set.

mlandry22 commented 8 years ago

For a little inspiration, the amount we moved on our last submission is almost 1/4 of what we need to get 10th, as it stands. So it's certainly achievable. Now we just need some clue of how to do it 4.5 more times ;-)