midterm report review - Githubissues

Nice description of the variables! The writing makes clear which variables are independent and which is the dependent variable (Bags Sold). The descriptions you included for each independent variable are informative, which helps because not all of the variable names are self-explanatory. You have decided to use k-fold cross validation for your project. This is a great decision: evaluating on held-out folds gives a more reliable estimate of out-of-sample error and makes overfitting easier to detect. In Understanding the Data, you have included several useful observations, and I think they will become even more useful as you experiment further with your data.
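For reference, here is a minimal sketch of how standard k-fold cross validation partitions the data, written in plain Python. The function name `k_fold_indices` and the sample count of 10 are made up for illustration and are not taken from your report.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Hypothetical helper for illustration only.
    """
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if i < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        # Everything outside the test block is used for training.
        train = [j for j in range(n_samples) if j < start or j >= stop]
        folds.append((train, test))
        start = stop
    return folds


if __name__ == "__main__":
    # Assumed toy size: 10 samples, 5 folds.
    for fold_number, (train, test) in enumerate(k_fold_indices(10, 5), start=1):
        print(f"Fold {fold_number}: {len(train)} train, {len(test)} test")
```

With this kind of split, every sample is used for testing exactly once, and the training and test sizes are the same (or differ by at most one) across folds.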

Perhaps, along with the description of each variable, you could have written your hypothesis about how that variable affects the number of bags sold. You could then have run simple tests to check the correlation between each variable and the output.

I don't quite understand the chart that describes the 5-fold cross validation. In standard k-fold cross validation, every fold has the same training-set size and the same test-set size. However, it is not clear to me why Fold 1 has only 2009 as the training set, whereas Fold 5 has the years 2009-2013. I might be misunderstanding something, but to me it seems as if the size of the training set increases with k in Fold k. If this is intentional, for example because the data are ordered by year and you only want to train on earlier years, it would help to say so explicitly in the report.

In the section Understanding the Data, (3), (4), and (5) could have more detailed explanations. First, for (3) and (4), how did you arrive at these observations? Did you notice the trend by using a graph? If so, it would be very helpful to include that graph. For (5), why is this observation important to point out? What is the significance of class, grad and bags sold having -1 values? If these instances do not carry much meaning, should these rows be cleaned out of the data?
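To make the two data suggestions concrete, here is a rough sketch in plain Python of a simple correlation check and of filtering out rows that contain -1. Everything in it is an assumption on my part: the column names `price` and `bags_sold`, the helper names, and the four toy rows are placeholders, not values from your dataset, and I am assuming that -1 marks a missing entry.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (spread_x * spread_y)


def drop_sentinel_rows(rows, columns, sentinel=-1):
    """Keep only rows where none of the given columns equals the sentinel."""
    return [row for row in rows if all(row[c] != sentinel for c in columns)]


if __name__ == "__main__":
    # Placeholder rows for illustration only; not real project data.
    rows = [
        {"price": 10, "bags_sold": 100},
        {"price": 12, "bags_sold": 90},
        {"price": -1, "bags_sold": -1},  # assumed to be a missing record
        {"price": 14, "bags_sold": 80},
    ]
    clean = drop_sentinel_rows(rows, ["price", "bags_sold"])
    print(f"dropped {len(rows) - len(clean)} of {len(rows)} rows")
    r = pearson([row["price"] for row in clean],
                [row["bags_sold"] for row in clean])
    print(f"correlation(price, bags_sold) = {r:.3f}")
```

Reporting how many rows were dropped, and the correlation before and after dropping them, would show whether the -1 rows actually matter for the analysis.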

wangzilongri / SoybeanProject4741

midterm report review #9