igrigorik / decisiontree

ID3-based implementation of the ML Decision Tree algorithm

Question about real usage #36

Closed igorkasyanchuk closed 7 years ago

igorkasyanchuk commented 7 years ago

Hello, this is not an issue; it's more of a question about real-world usage. I would like to use this gem in my project. My training data looks roughly like this:

Header: Client, Worker, Technology, Status

Data with past Requests:
Client A, John, Ruby, Won
Client A, John, Ruby, Won
Client B, Bob, Java, Won
Client B, John, Ruby, Lost
Client C, John, HTML, Lost
Client C, Alice, HTML, Won
Client C, Bob, HTML, Lost
....

Now, when a new Request comes in, I want to advise who the best worker for it is. For example, if I assign "Bob" to a new Client, what is the chance that the status will be "Lost"?
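To make the question concrete, here is a minimal sketch of the simplest possible estimate: the fraction of a worker's past requests that ended with a given status. It uses only the seven rows listed in this comment (the elided rows are not included), and `status_rate` is a hypothetical helper, not part of the decisiontree gem. It is a plain frequency count, not a decision tree.

```ruby
# Only the seven rows shown in the question; the elided rows ("....") are omitted.
ROWS = [
  ['Client A', 'John',  'Ruby', 'Won'],
  ['Client A', 'John',  'Ruby', 'Won'],
  ['Client B', 'Bob',   'Java', 'Won'],
  ['Client B', 'John',  'Ruby', 'Lost'],
  ['Client C', 'John',  'HTML', 'Lost'],
  ['Client C', 'Alice', 'HTML', 'Won'],
  ['Client C', 'Bob',   'HTML', 'Lost']
].freeze

# Hypothetical helper: fraction of a worker's past requests with the given status.
# Returns nil when the worker has no past requests at all.
def status_rate(rows, worker, status)
  mine = rows.select { |_client, w, _tech, _status| w == worker }
  return nil if mine.empty?

  mine.count { |row| row.last == status }.to_f / mine.size
end

puts status_rate(ROWS, 'Bob', 'Lost') # 1 of Bob's 2 rows is "Lost" => 0.5
```

With this little data the estimate is very noisy, which is why the number of records matters.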

I hope you got the idea.

How many records do we need? What should we do if a Request has many technologies? Should we duplicate rows (with the same client and status)?

Thanks

igrigorik commented 7 years ago

Take a look at the intro blog post, that should cover the basics: https://www.igvita.com/2007/04/16/decision-tree-learning-in-ruby/

In terms of training, the more data the better. When you test your model, make sure to avoid overfitting; look into using cross-validation: https://en.wikipedia.org/wiki/Cross-validation_(statistics)
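As a rough illustration of the cross-validation idea, here is a minimal k-fold sketch in plain Ruby. `k_fold_accuracy` and the majority-class baseline are hypothetical helpers written for this example, not part of the gem. The sketch assumes each row is an array whose last element is the label, and that the block returns a callable mapping a row to a predicted label.

```ruby
# Hypothetical helper: split rows into k folds, train on k-1 folds,
# test on the held-out fold, and return overall accuracy.
# The block receives the training rows and must return a callable
# that maps a row to a predicted label.
def k_fold_accuracy(rows, k)
  folds = Array.new(k) { [] }
  rows.each_with_index { |row, i| folds[i % k] << row }

  correct = 0
  folds.each_with_index do |test_fold, i|
    # Training set = every fold except the one held out for testing.
    train = folds.each_with_index.reject { |_, j| j == i }.flat_map(&:first)
    predictor = yield(train)
    correct += test_fold.count { |row| predictor.call(row) == row.last }
  end
  correct.to_f / rows.size
end

# Baseline "model": always predict the most common label in the training rows.
majority = lambda do |train|
  label = train.map(&:last).group_by(&:itself).max_by { |_, v| v.size }.first
  ->(_row) { label }
end

sample = [
  ['Client A', 'John',  'Ruby', 'Won'],
  ['Client A', 'John',  'Ruby', 'Won'],
  ['Client B', 'Bob',   'Java', 'Won'],
  ['Client B', 'John',  'Ruby', 'Lost'],
  ['Client C', 'John',  'HTML', 'Lost'],
  ['Client C', 'Alice', 'HTML', 'Won'],
  ['Client C', 'Bob',   'HTML', 'Lost']
]

puts k_fold_accuracy(sample, 3, &majority)
```

To evaluate a decision tree instead of the baseline, the block would build and train a tree on `train` and return a lambda around its prediction call (see the gem's README for the exact usage). A tree that scores no better than the majority baseline on held-out folds is a sign of too little data or overfitting.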