timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0
4.91k stars, 622 forks

Split plot refinement (accurate labeling) #876

Open tbohne opened 5 months ago

tbohne commented 5 months ago

I called get_splits(train_labels, valid_size=.2, stratify=True, random_state=23, shuffle=True). In this case, I would expect the second label in the split plot to be "Valid", not "Test". I am not specifying a test split, which defaults to zero, but I am specifying valid_size, so the labels should be "Train" and "Valid". Instead, the plot shows "Train" and "Test": [image]

I made a small change to plot_splits() to get the behavior I need. Previously, it assumed that a single split, i.e., two index lists, always means (train, test). Since validation data is not optional in the split generation function, I treat it as mandatory, which means a plot with only "Train" and "Test" can no longer occur.

Behavior now:

In my opinion, this is reasonable labeling behavior, given the assumption that validation data is mandatory. I also gave the new plot_splits() parameter a default value so that existing callers are unaffected.
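To make the labeling rule above concrete, here is a minimal standalone sketch. It is not the actual diff from this PR: the helper name split_labels and the parameter two_split_label are hypothetical, and the real parameter name and default in plot_splits() may differ. It only illustrates the idea that two index lists are labeled ("Train", "Valid") by default, while the old ("Train", "Test") labeling stays reachable through an optional argument.

```python
from typing import List, Sequence


def split_labels(
    splits: Sequence[Sequence[int]],
    two_split_label: str = "Valid",  # hypothetical parameter, not tsai's API
) -> List[str]:
    """Return one legend label per index list in `splits`.

    Assumption: validation data is mandatory, so two lists mean
    (train, valid) unless the caller overrides `two_split_label`.
    """
    n = len(splits)
    if n == 1:
        return ["Train"]
    if n == 2:
        return ["Train", two_split_label]
    if n == 3:
        return ["Train", "Valid", "Test"]
    raise ValueError(f"expected 1 to 3 index lists, got {n}")


# Two lists, as produced by a call with valid_size=.2 and no test split.
train_idx, valid_idx = [0, 1, 2, 3], [4]
print(split_labels((train_idx, valid_idx)))
```

Because the new argument has a default, callers that pass only the splits keep working, and a caller who wants the previous labels can pass two_split_label="Test".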
