Closed by AdityaGudimella 8 years ago.
Good idea, maybe some people here would like to collaborate?
A Bayesian model could work quite well here. I'm sure there are hierarchies that could be exploited. Uncertainty is likely to play a big factor too.
I'm willing to help out too. Let's form a team on Kaggle. I thought so too. A couple of years ago there was apparently a guy on Kaggle who used to do really well with Bayesian methods. We should try something out too. My email ID is aditya.gudimella@gmail.com. Please mail me if you're interested in doing this competition together.
I would be interested. I have been off the grid re pymc3 for ages after relocating halfway around the world.
I would be interested as well.
I started a new repo and invited you all to the Kaggle group with write permissions: https://github.com/pymc-devs/kaggle_grupo
chat room: https://gitter.im/pymc-devs/kaggle_grupo
chat room: https://gitter.im/pymc-devs/kaggle_grupo
I've been wanting to work with pymc3 for a while. Can I join also?
@ewharton added
Count me in also. If someone is interested, drop me a message at alexandru130586@yandex.com
@alexandrudaia added.
Thanks!
Please add me: beyhangl@gmail.com
Please join https://gitter.im/pymc-devs/kaggle_grupo
@twiecki Was this group removed?
Yes, we've moved to a private repo, check the gitter chatroom.
Hi all, I managed to see the private repo now. The question is: how can I enter the chat room for this? Thanks!
Oops, I didn't realize I would delete the chatroom along with it. Can we add a new one for the new repo?
Yes, in case this is impossible we can make a slack channel for this issue.
:+1:
Ok, so should I set up a Slack channel tomorrow?
Yes.
I made a Slack channel. Please send me your email at alexandru130586@yandex.com so I can send out invitations.
Seems like the Slack was deleted?
Guys, sorry for the late reply. It seems I wanted to delete my first bilboa channel and deleted it accidentally. I will send invites as soon as I reach home. Sorry for the inconvenience.
Hey Guys,
I worked on a team of 10 for the Kaggle DSBII challenge. It was very difficult keeping coordinated.
With that said, I'm very interested in learning how to use pymc3 for Kaggle. I tried a couple of times unsuccessfully (I could never get things to scale).
If there's a spot on the team, I'd be interested in joining.
The downside:
The upside:
It's a late request, so I'm totally cool if it's a no-go.
Walter (aka inversion)
Hello Mr. Inversion, I am having the same issue with using PyMC for Kaggle. In fact, I have not managed to make any Kaggle submission using PyMC. I tried some submissions using the usual ML things, but with no success, because I ended up getting stuck with the dimensionality of the data, which prevented me from running experiments, and sampling gave me a bad score: 0.51 on local evaluation and 0.58 on the LB. I have ended up averaging some public scripts until now :(
The Kaggle effort appears to be up and running, so I'm closing this as an issue.
Hi, I have used PyMC3 + Bayesian logistic regression extensively in the NIH seizure prediction contest on Kaggle. Making it work was VERY difficult compared with using sk-learn's plain logistic regression, and Theano has several limitations which I had reported.
I am willing to upload the notebook and corroborate/improve the code.
I have started exploring Edward and Stan as well for Kaggle.
Thanks for the feedback. What in particular was making it difficult? Expressing the model in PyMC, or getting the model to fit? Happy to peek at the code.
Looks like the algorithm could not get started. I'd be interested to see whether it works with the current release and the NUTS sampler, which is initialized using ADVI. It should be easy with such a simple model.
Were there missing data in either the predictors or the outcomes? That will cause problems if not explicitly dealt with.
Hi, thanks, no missing data at all, and everything is numerical; the data set was uploaded to GitHub (HDF format) if you want to inspect it. Other than that, I would love to know how to provide the output from a regular LogisticRegression() as the MAP starting weights. Next week I am presenting PyMC3 and PyStan at PyData Tel-Aviv, so it would be great if you could go over the code/amend/comment etc. Thanks!
Do you have data you can share with me? I can tinker with it and see what a model can do.
Hi Chris, yes, the data is here: https://github.com/QuantScientist/kaggle-seizure-prediction-challenge-2016/tree/master/data/output
The training set is here: https://github.com/QuantScientist/kaggle-seizure-prediction-challenge-2016/blob/master/data/output/feat_train/train_allX_df_train.hdf It is based on features I generated from the IEEG signals.
For reference, there is also a full corresponding pipeline in XGBOOST here: https://github.com/QuantScientist/kaggle-seizure-prediction-challenge-2016/tree/master/jupyter/xgboost
If you amend anything, I will of course credit you.
@fonnesbeck please also see our previous discussion about this here: https://github.com/pymc-devs/pymc3/issues/840
@QuantScientist Have a look here. What I've done is:
Let me know if you have questions.
@fonnesbeck thanks a lot for the modifications! I used your revised version and uploaded a fresh notebook here: https://github.com/QuantScientist/kaggle-seizure-prediction-challenge-2016/blob/master/jupyter/bayesian-logistic-regression/ieegPymc3Version1.ipynb
However, I must be doing something wrong since the score I get is quite bad compared to sk-learn logistic regression.
Maybe I am doing something wrong with the intercept, since I was not using one before. Theta includes an intercept term which was added by Chris, so we have to adjust for it here:

```python
import numpy as np
from scipy.special import expit  # logistic sigmoid

def fastPredict(new_observation, theta):
    # theta[0] is the added intercept, so skip it in the dot product;
    # w_intercept is the intercept weight defined elsewhere in the notebook
    v = np.einsum('j,j->', new_observation, theta[1:])
    return expit(w_intercept + v)
```
That's interesting. I didn't do any optimization of the model, so perhaps it's not too surprising. So, I've used an L1 norm with a particular (arbitrary) lambda of 0.1. What if you used the same setup as sklearn does? It looks like ridge regression with a lambda of 0.01.
Hi Thomas. I found that the test data set was not normalized so I normalized it as you did for the training set. However, upon submission, the score remains the same.
Updated notebook is here: https://github.com/QuantScientist/kaggle-seizure-prediction-challenge-2016/blob/master/jupyter/bayesian-logistic-regression/ieegPymc3Version1.ipynb
Thanks,
Is anyone interested in using pymc3 for a Kaggle competition? It's a really good way to showcase the abilities of the library. The devs of the Keras library did the same thing with Keras. There's a good competition up on Kaggle right now: Grupo Bimbo Inventory Demand.