timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

learn.save_all(path) error for TST in latest version #443

Closed strakehyr closed 2 years ago

strakehyr commented 2 years ago

Hi all, I ran into a new error when trying to save a fairly big TST model. I suspect the issue might be memory-related, since I have successfully saved models on this version before. My X shape is (18372, 69, 192) and my y shape is (18372, 28, 192).

learn.save_all(save_path)

   Traceback (most recent call last):

     File "C:\Users\user\Anaconda3\lib\site-packages\torch\serialization.py", line 379, in save
       _save(obj, opened_zipfile, pickle_module, pickle_protocol)

     File "C:\Users\user\Anaconda3\lib\site-packages\torch\serialization.py", line 486, in _save
       zip_file.write_record('data.pkl', data_value, len(data_value))

   OSError: [Errno 22] Invalid argument

   During handling of the above exception, another exception occurred:

   Traceback (most recent call last):

     File "C:\Users\user\AppData\Local\Temp/ipykernel_22812/3108202325.py", line 1, in <module>
       learn.save_all(save_path)

     File "C:\Users\user\Anaconda3\lib\site-packages\tsai\learner.py", line 60, in save_all
       torch.save(dl, path/f'{dls_fname}_{i}.pth')

     File "C:\Users\user\Anaconda3\lib\site-packages\torch\serialization.py", line 380, in save
       return

     File "C:\Users\user\Anaconda3\lib\site-packages\torch\serialization.py", line 259, in __exit__
       self.file_like.write_end_of_file()

   RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:300] . unexpected pos 64 vs 0
oguiza commented 2 years ago

Hi @strakehyr, you are right. The save_all and load_all methods were designed for small datasets/models only. For larger ones, you should use learner.save and load_learner. I've updated the documentation to reflect this point.
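For reference, the alternative workflow could be sketched as below. This is a hedged sketch, not tsai's exact API: in fastai (which tsai builds on), `learn.export(fname)` pickles the model and its transforms without the training data, and is reloaded with `load_learner(fname)`, while `learn.save`/`learn.load` checkpoint model weights. The `persist_learner` helper name is hypothetical.

```python
from pathlib import Path

def persist_learner(learn, fname="learner.pkl"):
    """Hypothetical helper: export the whole Learner instead of using
    save_all, which also serializes the DataLoaders and can fail on
    very large datasets (as in the traceback above)."""
    fname = Path(fname)
    # make sure the target directory exists before writing
    fname.parent.mkdir(parents=True, exist_ok=True)
    # fastai's Learner.export pickles model + transforms (not the data);
    # the resulting file is reloaded later with load_learner(fname)
    learn.export(fname)
    return fname
```

With a trained tsai Learner, usage would look like `persist_learner(learn, "models/tst.pkl")` followed later by `learn = load_learner("models/tst.pkl")`.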