Implement different train-test splits (seasonal variation and old data bias)

linaskerath commented 1 year ago

Another bias that we might be imposing on the model is the seasonal variation of melt data. In this project, we only look at two months’ worth of data, so the model should not be used to predict data outside those months. Performance already degrades on data that is one month beyond the training data. In appendix A.1.3, we see two prediction plots: one with predictions for 2019-07-15 and one with predictions for 2022-07-31. The RMSE of the two days differs substantially. The latter day performs worse because increased temperatures towards the end of the month caused more ice melt. The model cannot capture the extended melt because the training data did not contain such examples. In future work, as we begin to work with more data, we must be careful not to bias the model with seasonally unbalanced data by overrepresenting certain months.

We also have to take care not to use data that is too old, as the ice sheet is constantly changing and can change significantly over the years.
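A minimal sketch of how both concerns could be handled when selecting training days, assuming the data is indexed by calendar date. The function name `balance_by_month` and the `min_year` cutoff are hypothetical and not part of the project code; downsampling to the smallest month is only one possible way to balance.

```python
import random
from collections import defaultdict
from datetime import date


def balance_by_month(dates, min_year, seed=0):
    """Drop dates before min_year, then downsample so that every month
    still present contributes the same number of dates.

    Hypothetical helper: min_year is an assumed cutoff for "too old" data.
    """
    # Old-data bias: keep only sufficiently recent observations.
    recent = [d for d in dates if d.year >= min_year]

    # Seasonal bias: group the remaining dates by calendar month.
    by_month = defaultdict(list)
    for d in recent:
        by_month[d.month].append(d)
    if not by_month:
        return []

    # Downsample every month to the size of the smallest month.
    n = min(len(group) for group in by_month.values())
    rng = random.Random(seed)
    balanced = []
    for month in sorted(by_month):
        balanced.extend(rng.sample(by_month[month], n))
    return sorted(balanced)
```

Downsampling discards data from the better-represented months; weighting samples during training would be an alternative that keeps all of it.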

linaskerath commented 1 year ago

Need to experiment with different train-test splits and samples. Discussed in meeting #6.

linaskerath commented 1 year ago

Instead, just have one leave-out validation set.
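A single leave-out validation set could be drawn as in the rough sketch below, assuming whole days are held out and that each (year, month) group should be represented in validation. The name `holdout_days` and the `val_fraction` parameter are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict
from datetime import date


def holdout_days(days, val_fraction=0.2, seed=0):
    """Split unique days into (train, val), holding out a fraction of
    the days from every (year, month) group.

    Hypothetical helper: val_fraction is an assumed setting, not a
    value taken from the project.
    """
    unique_days = sorted(set(days))

    # Group days so that every month of every year is sampled separately.
    groups = defaultdict(list)
    for d in unique_days:
        groups[(d.year, d.month)].append(d)

    rng = random.Random(seed)
    val = set()
    for key in sorted(groups):
        members = groups[key]
        # Hold out at least one day per group.
        k = max(1, round(len(members) * val_fraction))
        val.update(rng.sample(members, k))

    train = [d for d in unique_days if d not in val]
    return train, sorted(val)
```

Note that a group containing a single day ends up entirely in the validation set, so very sparse months would need separate handling.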

linaskerath / RP_Greenland

Implement different train-test splits (seasonal variation and old data bias) #36