Open vidhi-mody opened 4 years ago
Suggestion: When performing the train/test split, set a seed so that rerunning the code produces the same split. Also, check whether the function you use to split supports stratification; sklearn's `train_test_split` does, through its `stratify` parameter. Stratification matters for multiclass classification because a purely random split can put most samples of some class into one subset, leaving the other subset with too few samples of that class. This risk is highest for rare classes. Stratifying keeps the class proportions approximately the same in train and test. You can go through this blog for a detailed understanding: https://towardsdatascience.com/3-things-you-need-to-know-before-you-train-test-split-869dfabb7e50
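A minimal sketch of what this could look like with sklearn's `train_test_split`. The toy data, the `SEED` value, and the 20% test size are assumptions for illustration, not taken from this repository:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples across 3 imbalanced classes.
X = [[i] for i in range(100)]
y = [0] * 60 + [1] * 30 + [2] * 10

SEED = 42  # any fixed integer works; it only needs to stay constant between runs

# random_state fixes the shuffle, stratify=y preserves class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED, stratify=y
)

train_counts = Counter(y_train)
test_counts = Counter(y_test)
print("train:", sorted(train_counts.items()))
print("test: ", sorted(test_counts.items()))
```

With these toy labels, each class should keep its 60/30/10 proportion in both subsets, and rerunning with the same `SEED` should return the same rows.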
I would like to work on this issue!
@deepeshgarg09 sure!
An 80/20 train/test split would be a good ratio.