tomerm / MLClassification

Classification using ML approach for English / Hebrew / Arabic data sets

For your notice #36

Open matanzuckerman opened 5 years ago

matanzuckerman commented 5 years ago

I added random.shuffle at line 101. Without the shuffling, we saw discrepancies between the Jupyter notebook and the script.

Thanks
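
For readers following the thread, here is a minimal sketch of shuffling before a train/test split. It is not the repository's code: the function name shuffled_split, the test_fraction and seed parameters, and the sample data are all hypothetical. It only illustrates why a split taken from an unshuffled, label-ordered dataset can differ from one taken after random.shuffle.

```python
import random

def shuffled_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of `samples`, then cut it into (train, test).

    Hypothetical helper for illustration; the original list is not modified.
    """
    rng = random.Random(seed)      # local generator, so the split is reproducible
    shuffled = list(samples)       # copy, so the caller's data keeps its order
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Made-up data, ordered by label, as a file-by-file loader might produce it.
docs = [("doc%d" % i, "label_a" if i < 8 else "label_b") for i in range(10)]
train, test = shuffled_split(docs, test_fraction=0.2, seed=0)
```

Without the shuffle, taking the first or last 20% of docs would yield a test set drawn from a single label, which is one way results can diverge between two otherwise identical pipelines.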

semion1956 commented 5 years ago

You added shuffling for the case where the test dataset is taken as a part of the full dataset. This case was not considered in the Jupyter notebooks. Note that in the case of cross-validation (which is more or less similar), shuffling is performed. Do you actually work with splitting the full dataset? I suggest using one-time cross-validation for this purpose.

matanzuckerman commented 5 years ago

@semion1956 Hi. There is a chance I will split the full dataset into train and test sets. Usually I will do cross-validation, but without the shuffling it won't work.

semion1956 commented 5 years ago

@matanzuckerman Hi. I only want to note that cross-validation starts by merging the train and test datasets and shuffling the resulting "full" dataset (without actually changing the original data, of course).
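
The procedure described in the last comment (merge train and test, shuffle the merged copy, then cross-validate) could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: k_fold_splits, k, and seed are hypothetical names, and the fold assignment by striding is just one simple choice.

```python
import random

def k_fold_splits(train, test, k=5, seed=0):
    """Yield (train_part, held_out) pairs for k-fold cross-validation.

    Hypothetical helper: merges both sets into a new list and shuffles
    that copy, so the original train and test lists stay unchanged.
    """
    full = list(train) + list(test)        # merged "full" dataset (a new list)
    random.Random(seed).shuffle(full)      # shuffle only the merged copy
    folds = [full[i::k] for i in range(k)] # every k-th item goes to the same fold
    for i in range(k):
        held_out = folds[i]
        train_part = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train_part, held_out

# Made-up data standing in for the two original sets.
train_set = list(range(8))
test_set = list(range(8, 10))
splits = list(k_fold_splits(train_set, test_set, k=5))
```

Using only the first pair from splits gives a single shuffled train/test split, which may be close to what "one-time cross-validation" means in this thread, although the thread does not define the term.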