microsoft / BatteryML

MIT License

Can't get access to public datasets #1

Closed: alexanderkell closed this issue 1 year ago

alexanderkell commented 1 year ago

When I run cell 2 of the /baseline.ipynb notebook, where the Pipeline is set up, I get the following AssertionError:

File ~/Documents/GitHub/BatteryML/src/train_test_split/base.py:25, in BaseTrainTestSplitter.__init__(self, cell_data_path)
     23 for path in cell_data_path:
     24     path = Path(path)
---> 25     assert path.exists(), path
     27     if path.is_dir():
     28         self._file_list += list(path.glob('*.pkl'))

AssertionError: data/processed/MATR

I guess this is because the public datasets haven't been uploaded to GitHub. Is there a download script I am missing somewhere?
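For anyone hitting the same assertion, a quick check along these lines shows which configured paths are missing before the Pipeline is built. It is a minimal sketch that mirrors the loop in the traceback above; `describe_data_paths` is a hypothetical helper and not part of BatteryML.

```python
from pathlib import Path


def describe_data_paths(cell_data_path):
    """Report, for each configured path, whether it exists and what it holds.

    Hypothetical helper: mirrors the exists()/is_dir()/glob('*.pkl') logic
    shown in the traceback, but reports instead of asserting.
    """
    report = {}
    for path in map(Path, cell_data_path):
        if not path.exists():
            report[str(path)] = "missing"
        elif path.is_dir():
            n_files = len(list(path.glob("*.pkl")))
            report[str(path)] = f"{n_files} .pkl file(s)"
        else:
            report[str(path)] = "single file"
    return report


if __name__ == "__main__":
    # The path from the AssertionError; adjust to match your config.
    for path, status in describe_data_paths(["data/processed/MATR"]).items():
        print(f"{path}: {status}")
```

A path reported as "missing" means the preprocessed data has not been generated yet.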

fingertap commented 1 year ago

Download the MATR datasets from their website, place them under data/raw, and run scripts/preprocess.py. I think the preprocess script currently assumes that all datasets have been downloaded. We may add a file-existence check to skip any missing datasets.
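A rough sketch of what such a check could look like, assuming the raw files sit in one subdirectory per dataset (for example data/raw/MATR). The layout and the `available_datasets` name are assumptions for illustration, not what scripts/preprocess.py does today.

```python
import warnings
from pathlib import Path


def available_datasets(raw_root, dataset_names):
    """Return the dataset names whose raw directory exists and is non-empty.

    Hypothetical helper: assumes a <raw_root>/<dataset_name>/ layout.
    Datasets without raw files are skipped with a warning instead of
    failing the whole preprocessing run.
    """
    raw_root = Path(raw_root)
    found = []
    for name in dataset_names:
        raw_dir = raw_root / name
        if raw_dir.is_dir() and any(raw_dir.iterdir()):
            found.append(name)
        else:
            warnings.warn(f"Skipping {name}: no raw files under {raw_dir}")
    return found


if __name__ == "__main__":
    # Only the datasets that were actually downloaded would be preprocessed.
    for name in available_datasets("data/raw", ["MATR"]):
        print(f"Would preprocess {name}")
```

Warning rather than raising keeps the run going when only some of the public datasets have been downloaded.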

alexanderkell commented 1 year ago

Excellent, thanks, that worked!

However, I had to comment out the following line because I couldn't find a download link for the 2019-01-24_batchdata_updated_struct_errorcorrect.mat file: https://github.com/microsoft/BatteryML/blob/e0ba39c899da892180d1751971aba19dd8a3bf99/scripts/preprocess_scripts/preprocess_MATR.py#L20

fingertap commented 1 year ago

The last batch can be found here.

alexanderkell commented 1 year ago

Thank you!