pbenner / equitrain

Generic training script for Equiformer

Implement large file support #1

Closed pbenner closed 1 month ago

pbenner commented 5 months ago

Converting large .xyz files to HDF5 currently fails because the entire file is loaded into memory. A fix for this exists in this branch: https://github.com/chiang-yuan/mace/tree/dev/large-dataset

pbenner commented 5 months ago

The above-mentioned solution may need some refinement. The line

https://github.com/chiang-yuan/mace/blob/cf562584874b7c277edd05da9b6b54e56b9d8932/mace/cli/preprocess_data_mpi.py#L168

probably reads the full .xyz file into memory, which should be prevented.
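
To illustrate what avoiding the full read could look like, here is a minimal sketch that streams a plain .xyz file one frame at a time and groups the frames into small batches. It is not the code from the linked branch. The names `iter_xyz_frames` and `iter_batches` are hypothetical, and the sketch assumes the basic layout (atom count, comment line, then one `symbol x y z` line per atom); extended XYZ properties in the comment line are passed through unparsed.

```python
from typing import Iterable, Iterator, List, Tuple

# One frame: (comment line, element symbols, Cartesian positions)
Frame = Tuple[str, List[str], List[Tuple[float, float, float]]]


def iter_xyz_frames(path: str) -> Iterator[Frame]:
    """Yield one frame at a time so only a single frame is held in memory."""
    with open(path, "r") as handle:
        while True:
            header = handle.readline()
            if not header:
                return  # end of file
            if not header.strip():
                continue  # tolerate blank lines between frames
            n_atoms = int(header)
            comment = handle.readline().rstrip("\n")
            symbols: List[str] = []
            positions: List[Tuple[float, float, float]] = []
            for _ in range(n_atoms):
                fields = handle.readline().split()
                if len(fields) < 4:
                    raise ValueError("truncated or malformed atom line")
                symbols.append(fields[0])
                positions.append(
                    (float(fields[1]), float(fields[2]), float(fields[3]))
                )
            yield comment, symbols, positions


def iter_batches(frames: Iterable[Frame], batch_size: int) -> Iterator[List[Frame]]:
    """Group frames into lists of at most batch_size (hypothetical helper)."""
    batch: List[Frame] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch
```

Each batch could then be written to the HDF5 file before the next one is read, so peak memory depends on the batch size and not on the size of the input file.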

carlosmada22 commented 2 months ago

Possible solution implemented in my fork: https://github.com/carlosmada22/equitrain

Files changed: equitrain/preprocess.py equitrain/mace/data/utils.py

Generated files after running tests/test_preprocess.py on the test file tests/data.xyz: statistics.json, train.h5.txt, valid.h5.txt. (The last two are h5 files; that file type is not supported here, so just remove the '.txt' from the name.)

Further analysis is needed, as I am not sure whether the results are good or not.