Closed by pbenner 1 month ago
The above-mentioned solution may need some refinement. The line
probably reads the full .xyz file into memory, which should be avoided.
Possible solution implemented in my fork: https://github.com/carlosmada22/equitrain
Files changed: equitrain/preprocess.py, equitrain/mace/data/utils.py
Files generated by running tests/test_preprocess.py on the test file tests/data.xyz: statistics.json, train.h5.txt, valid.h5.txt (.h5 attachments are not supported here, so just remove the '.txt' suffix from the names to get the h5 files).
Further analysis is needed, as I am not sure whether the results are good.
Converting large .xyz files to hdf5 currently fails because the entire file is loaded into memory. There is a fix for this in this branch: https://github.com/chiang-yuan/mace/tree/dev/large-dataset
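For illustration, the general idea of avoiding the full in-memory load can be sketched as a generator that yields one frame at a time. This is only a minimal standard-library sketch, not the code from either branch linked in this thread: the name `iter_xyz_frames` is hypothetical, and it assumes the plain XYZ layout (atom-count line, comment line, then one line per atom). A real conversion would write each frame to the HDF5 file as it is yielded rather than collecting the frames in a list.

```python
import io
from typing import Iterator, List, TextIO, Tuple

# (comment line, atom lines) for a single frame
Frame = Tuple[str, List[str]]


def iter_xyz_frames(handle: TextIO) -> Iterator[Frame]:
    """Yield frames one by one so only a single frame is held in memory.

    Hypothetical helper; assumes each frame is: atom count, comment, atoms.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file reached
        if not header.strip():
            continue  # tolerate blank lines between frames
        n_atoms = int(header)
        comment = handle.readline().rstrip("\n")
        atoms = [handle.readline().rstrip("\n") for _ in range(n_atoms)]
        if n_atoms and not atoms[-1]:
            raise ValueError("truncated frame in XYZ input")
        yield comment, atoms


# Small in-memory stand-in for a large .xyz file
demo = io.StringIO("2\nframe 0\nH 0 0 0\nH 0 0 0.74\n1\nframe 1\nHe 0 0 0\n")
n_frames = sum(1 for _ in iter_xyz_frames(demo))
```

With an actual file, the same generator would be driven by `with open(path) as fh: for comment, atoms in iter_xyz_frames(fh): ...`, so memory use depends on the largest frame and not on the file size.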