learn-data-science / data-science-vanilla

The more I ~~read~~ code, the more I acquire, the more certain I am that I know nothing.

🌟 Project Candidates #1

Open raynardj opened 1 year ago

raynardj commented 1 year ago

🌟 Possible Targets for Vanilla Events

Here we maintain a list of ideas. Each idea will be a comment starting with light bulb emoji 💡

PLEASE USE 👍🏻 TO VOTE IDEAS

raynardj commented 1 year ago

💡 build your own tree 🌳

Decision tree from scratch
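For anyone sizing up this idea: the heart of a from-scratch decision tree is the split search. A minimal numpy sketch (function names and the toy data are mine, just to make the idea concrete):

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Exhaustively search every feature/threshold pair for the split
    minimizing the weighted Gini impurity of the two children."""
    best = (None, None, np.inf)  # (feature, threshold, score)
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # skip degenerate splits
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / n
            if score < best[2]:
                best = (j, t, score)
    return best

X = np.array([[1.0], [2.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # splits feature 0 at threshold 2.0
```

Growing the tree is then just recursing on the two halves until a depth or purity stopping rule fires; a histogram-based variant would bucket each feature first and search bucket edges instead of every unique value.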

raynardj commented 1 year ago

💡 your own PCA or T-SNE 🗺


raynardj commented 1 year ago

💡 Mel-Spectrogram 🎙

Build a function to transform an audio WAV to a mel-spectrogram, and another function to transform it back

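A vanilla starting point for the forward direction, assuming we build the mel filterbank ourselves in numpy instead of calling librosa (all parameter choices below are illustrative):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mel_spectrogram(wav, sr, n_fft=512, hop=128, n_mels=40):
    """Frame the signal, take |STFT|^2, project onto the mel filterbank."""
    frames = [wav[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(wav) - n_fft, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return mel_filterbank(sr, n_fft, n_mels) @ power.T

sr = 16000
wav = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s of 440 Hz
print(mel_spectrogram(wav, sr).shape)  # (n_mels, n_frames)
```

The "transform it back" half is the harder part: inverting the filterbank (e.g. via its pseudo-inverse) only recovers magnitudes, so phase has to be estimated, typically with something like Griffin-Lim.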

raynardj commented 1 year ago

💡 Our own stock picker pipeline


lrthomps commented 1 year ago

> 💡 build your own tree 🌳
>
> Decision tree from scratch

and I could leverage some code I created for histogram-based decision trees at work?

raynardj commented 1 year ago

> 💡 build your own tree 🌳
>
> Decision tree from scratch
>
> and I could leverage some code I created for histogram-based decision trees at work?

Why not? If you need an MIT license to use it more comfortably, we can put one in.

Histogram-based decision trees: aren't those already in many libraries, even sklearn?

lrthomps commented 1 year ago

> > 💡 build your own tree 🌳
> >
> > Decision tree from scratch
> >
> > and I could leverage some code I created for histogram-based decision trees at work?
>
> Why not? If you need an MIT license to use it more comfortably, we can put one in.
>
> Histogram-based decision trees: aren't those already in many libraries, even sklearn?

Yup, but it was fun to implement anyway, and we were going to re-implement it in C++ to be faaaast

raynardj commented 1 year ago

> > > 💡 build your own tree 🌳
> > >
> > > Decision tree from scratch
> > >
> > > and I could leverage some code I created for histogram-based decision trees at work?
> >
> > Why not? If you need an MIT license to use it more comfortably, we can put one in. Histogram-based decision trees: aren't those already in many libraries, even sklearn?
>
> Yup, but it was fun to implement anyway, and we were going to re-implement it in C++ to be faaaast

I'm pretty sure my C++ skills are weaker; I'm more of a Rust person.

RESPECT~~~

lrthomps commented 1 year ago

💡 4. Download stock prices from your favorite online finance website over a period of at least three years. Create a dataset for testing portfolio selection algorithms by creating price-return vectors. Implement the OGD and ONS algorithms and benchmark them on your data. (from *Introduction to Online Convex Optimization*)

elasticsearcher commented 1 year ago

I love all the projects here, but right now number 4 is my absolute favourite and I shamelessly encourage everyone to vote for it!! 🔥🔥🔥

For those who haven’t been reading the OCO textbook:

This project is both self-contained and super straightforward to implement, consisting of clearly demarcated tasks:

  1. Create an “online” dataset of real historical stock price data covering a period of at least 3 years, that will be used to simulate an online setting to test our online learning algorithm
  2. Create a separate, much smaller, “debug” dataset that we can use as cannon fodder while developing and debugging our algorithms; this is optional, but I think it’s more fun to separate the development and “production” phases of the project
  3. Implement the general Online Gradient Descent algorithm
  4. Implement the Online Newton Step algorithm
  5. Benchmark both algorithms on the “production” dataset and make plots to report the results
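Step 3 above might be sketched like this, assuming the standard OCO portfolio setup where the loss on day t is the negative log-return −log(rₜ·x) and each iterate is projected back onto the probability simplex (the synthetic price-relatives here are only a stand-in for the real dataset of step 1):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based trick)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0)

def ogd_portfolio(returns, eta=0.1):
    """Online Gradient Descent on the log-loss f_t(x) = -log(r_t . x).
    `returns` holds one price-relative vector per trading day."""
    T, n = returns.shape
    x = np.ones(n) / n           # start with the uniform portfolio
    wealth = 1.0
    for r in returns:
        wealth *= r @ x          # realize today's return before updating
        grad = -r / (r @ x)      # gradient of the log-loss at x
        x = project_simplex(x - eta * grad)
    return wealth

rng = np.random.default_rng(0)
rel = 1 + 0.01 * rng.standard_normal((250, 5))  # synthetic price-relatives
print(ogd_portfolio(rel))
```

ONS (step 4) has the same loop shape; it just replaces the gradient step with a step preconditioned by the running matrix of outer products of gradients, and projects in the norm that matrix induces.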

raynardj commented 1 year ago

> 💡 4. Download stock prices from your favorite online finance website over a period of at least three years. Create a dataset for testing portfolio selection algorithms by creating price-return vectors. Implement the OGD and ONS algorithms and benchmark them on your data. (from *Introduction to Online Convex Optimization*)

Well, this is all just great.

My wife built something that can scrape financial data and analyze it in a very simple way, and I asked whether she could make it more useful by adding something beyond "asking ChatGPT if this stock is going to rise". We got stuck there, so I guess your suggestion is exactly the answer to her homework.

raynardj commented 1 year ago

@elasticsearcher u must be Andrew

tianyimasf commented 1 year ago

suggestion: MLP, ANN, Markov chains, reinforcement learning; also, if anyone knows probabilistic graphical models...

raynardj commented 1 year ago

> suggestion: MLP, ANN, Markov chains, reinforcement learning; also, if anyone knows probabilistic graphical models...

Good suggestions! Can you make them more specific?

e.g.

Create an MLP with well-defined back-propagation using numpy, etc.

and lead with 💡 so we can vote on it! 🌟

tianyimasf commented 1 year ago

💡 MLP with back-propagation and inference using numpy

tianyimasf commented 1 year ago

💡 2-Layer ANN with back-propagation and inference function using numpy
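A minimal sketch of the two ideas above, assuming a single hidden layer, sigmoid activations, and squared error on XOR (all hyperparameters here are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)          # XOR targets

# one hidden layer of 8 units, sigmoid everywhere
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
sig = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    # forward pass
    h = sig(X @ W1 + b1)
    out = sig(h @ W2 + b2)
    # backward pass: chain rule on the squared error
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient step
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0]
```

The forward pass doubles as the inference function once the weight matrices are trained; swapping the squared error for cross-entropy only changes the `d_out` line.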

tianyimasf commented 1 year ago

💡 A hidden Markov model with an adjustable number of hidden states.

Train it with the Expectation-Maximization algorithm, and empirically investigate applications using the Forward-Backward (sum-product) and Viterbi (max-product) algorithms. It will accept command-line arguments for the path to the training data, the number of hidden units to use, and the maximum number of EM iterations to apply. By default, it should simply “do EM on the dataset” and print the overall likelihood at initialization and again after each EM iteration. Evaluate accuracy when predicting “into the future”: you may calculate the accuracy when predicting the “next state”, averaged over all states in the training data, and explore how accuracy drops off when predicting t steps into the future. https://github.com/tianyimasf/sequence-hmm/blob/main/sequenceProject.pdf
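The likelihood the project prints after each EM iteration comes from the forward (sum-product) pass. A minimal scaled version in numpy, with a toy 2-state model standing in for real trained parameters:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Forward pass: log P(obs) for an HMM with initial distribution pi,
    transition matrix A, and emission matrix B. Normalizing alpha at each
    step keeps the recursion numerically stable on long sequences."""
    alpha = pi * B[:, obs[0]]
    logp = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        logp += np.log(alpha.sum())
        alpha /= alpha.sum()
    return logp

pi = np.array([0.6, 0.4])                 # toy 2-state model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_loglik([0, 1, 0], pi, A, B))
```

The backward pass mirrors this recursion from the other end; together they give the posteriors EM needs, and replacing the sums with maxes (plus back-pointers) turns the same loop into Viterbi.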

tianyimasf commented 1 year ago

💡 RL with Q-learning -- training & prediction using numpy
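A minimal tabular sketch of this one, assuming an ε-greedy agent on a toy 5-state chain environment (the environment and all constants are mine, just to make the update rule concrete):

```python
import numpy as np

# tiny deterministic chain: states 0..4, actions 0 = left, 1 = right,
# reward 1 only for reaching the terminal state 4
n_states, n_actions, goal = 5, 2, 4
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                       # episodes
    s = 0
    while s != goal:
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2 = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s2 == goal else 0.0
        # Q-learning update: bootstrap from the greedy value of s2
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1)[:goal])  # learned greedy policy per state
```

Prediction is just `Q[s].argmax()` once training ends; the SARSA idea below differs only in bootstrapping from the action actually taken in `s2` rather than the greedy max.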

tianyimasf commented 1 year ago

💡 RL with SARSA -- training & prediction using numpy

tianyimasf commented 1 year ago

> > suggestion: MLP, ANN, Markov chains, reinforcement learning; also, if anyone knows probabilistic graphical models...
>
> Good suggestions! Can you make them more specific?
>
> e.g.
>
> Create an MLP with well-defined back-propagation using numpy, etc.
>
> and lead with 💡 so we can vote on it! 🌟

idk anything about probabilistic graphical models, so I'll leave it to others to suggest the details.