petsel / twitter-figure

Hamburg Hackathon (#hackathonHH) Frontend Hack Repo - the working result can be seen at http://petsel.github.io/twitter-figure/

Machine learning scripts available? #1

Closed: andygrunwald closed this issue 8 years ago

andygrunwald commented 10 years ago

Hey,

At https://www.hackerleague.org/hackathons/hamburg-geekettes-and-open-tech-school-hamburg-hackathon/hacks/twitterfigure you describe that Python and R were used. If I understand this correctly, these are the scripts for the machine learning, right?

Are the scripts publicly available? I want to get into machine learning and understand how it works end to end, from the raw data source (Twitter?) up to your analysis.

p3t3r67x0 commented 10 years ago

Hey Andy, you're right, we used Python and R for our machine learning algorithms. I just forwarded your issue to my group-mates. Hopefully they'll answer your questions asap. Regards, Aurelius!

petsel commented 10 years ago

Hi Andy, I wrote an email to you and Tim Dettmers so you both can get in contact directly.

petsel commented 10 years ago

In case anyone else is interested, here is the answer Tim gave in response to Andy's opening question:

Hey Andy,

Thanks for your interest in our project and interest in machine learning – it is always great to see people getting excited about machine learning.

If you want to twiddle with Twitter data and the algorithms that I used, I recommend getting started with the Kaggle crowdflower competition data, which is very similar to the task we did at the hackathon. The advantage is that you have a clean data set and can fit algorithms right away. You can find the Kaggle crowdflower competition and its data here: http://www.kaggle.com/c/crowdflower-weather-twitter

There are some linear algorithms which I used both in the crowdflower competition and in the hackathon, which you can find here: https://github.com/TimDettmers/crowdflower. The script you find there is quite general, so you can fit different models such as ridge regression, random forests, support vector machines, naïve Bayes, and so on. The script also lets you build an ensemble to decrease the error.
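To make the ensemble idea above concrete, here is a minimal sketch in plain Python. It is not the script from the linked repository: the synthetic data, the two toy models, and every function name (`fit_ridge_1d`, `fit_nearest_neighbour`, `average_ensemble`) are assumptions made for illustration only. It fits a one-feature ridge regression and a nearest-neighbour predictor, averages their predictions, and prints the test RMSE of each.

```python
import math
import random


def fit_ridge_1d(xs, ys, alpha):
    """Fit y ~ w * x + b with an L2 penalty `alpha` on the slope w."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + alpha)   # closed-form ridge slope on centred data
    b = y_mean - w * x_mean   # the intercept is not penalised
    return lambda x: w * x + b


def fit_nearest_neighbour(xs, ys):
    """Predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def average_ensemble(models):
    """Combine models by taking the plain mean of their predictions."""
    return lambda x: sum(m(x) for m in models) / len(models)


def rmse(model, xs, ys):
    """Root mean squared error of `model` on the given points."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


rng = random.Random(0)  # fixed seed so the run is repeatable


def make_data(n):
    # Synthetic toy data (an assumption, not the Twitter data set).
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [math.sin(x) + 0.5 * x + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


train_x, train_y = make_data(200)
test_x, test_y = make_data(100)

models = {
    "ridge": fit_ridge_1d(train_x, train_y, alpha=1.0),
    "nearest neighbour": fit_nearest_neighbour(train_x, train_y),
}
ensemble = average_ensemble(list(models.values()))

scores = {name: rmse(m, test_x, test_y) for name, m in models.items()}
scores["ensemble"] = rmse(ensemble, test_x, test_y)
for name, score in scores.items():
    print(f"{name}: RMSE = {score:.3f}")
```

With plain averaging, the RMSE of the ensemble can never exceed the mean of the individual RMSEs, and it is often lower than either model when their errors are not strongly correlated. It is not guaranteed to beat the best single model, so the ensemble still has to be checked on held-out data.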

I also used a more specialised deep neural network for both tasks. If you want to do this you will need an NVIDIA GPU with sufficient memory: 4GB for crowdflower; I used 6GB for the hackathon, where I trained the model on my server at home. I also have a version which runs on a CPU, but training time will be many days, which pretty much ruins the learning experience, so working with neural networks is only really recommended if you have a good GPU. You can find my deep neural network code here: https://github.com/TimDettmers/deepnet

If you have any trouble, feel free to write me an email.

Best wishes, Tim

andygrunwald commented 8 years ago

Thank you @webtobesocial and @petsel. I think it answers my question quite well. thx!