sauravraghuvanshi / Udacity-Computer-Vision-Nanodegree-Program

This repository contains all my exercises and projects from the Udacity Computer Vision Nanodegree Program
52 stars 24 forks

Split the data into training and testing #31

Open vidhi-mody opened 4 years ago

vidhi-mody commented 4 years ago

An 80/20 train/test split would be a good ratio.

ankurbhatia24 commented 4 years ago

Suggestion: when performing the train/test split, set a seed so that rerunning the code produces the same split. Also check whether the function you use for splitting supports stratification; sklearn's does. Stratification matters in multiclass classification because a purely random split can put most samples of some class into one subset (train or test), leaving the other subset without enough samples of that class. Stratifying keeps the class distribution approximately the same in both train and test. This blog post covers it in detail: https://towardsdatascience.com/3-things-you-need-to-know-before-you-train-test-split-869dfabb7e50
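
To make the suggestion concrete, here is a minimal standard-library sketch of a seeded, stratified 80/20 split. The function name `stratified_split`, the seed value 42, and the toy labels are illustrative assumptions, not code from this repository. In practice, sklearn's `train_test_split` covers the same ground through its `random_state` and `stratify` parameters.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) with similar class proportions."""
    rng = random.Random(seed)  # fixed seed -> identical split on every rerun

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    # Split each class separately so every class appears in both subsets.
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical imbalanced dataset: 80 "cat", 15 "dog", 5 "bird" labels.
labels = ["cat"] * 80 + ["dog"] * 15 + ["bird"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

Because each class is split on its own, even the rare "bird" class ends up in both subsets, which a plain random split does not guarantee. Calling the function again with the same seed returns the same indices.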

deepeshgarg09 commented 4 years ago

I would like to work on this issue!

vidhi-mody commented 4 years ago

@deepeshgarg09 sure!