materialsvirtuallab / m3gnet

Materials graph network with 3-body interactions featuring a DFT surrogate crystal relaxer and a state-of-the-art property predictor.
BSD 3-Clause "New" or "Revised" License
231 stars · 59 forks

Training is slower without stress? #28

Open VvVLzy opened 2 years ago

VvVLzy commented 2 years ago

I have been using two datasets to fine-tune the pre-trained model. They are similar in size; one has stress labels and the other does not.

I notice that, using the same device configuration, the model trains much slower on the dataset without stress. It even runs out of memory after 2 epochs when using batch_size=32. I have to decrease the batch size to 16 to continue training.

The training speed for the dataset with stress is ~130 ms/step with a batch size of 32. The training speed for the dataset without stress is ~270 ms/step with a batch size of 16.

I wonder what might be causing this roughly fourfold slowdown per sample?
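For reference, the per-sample cost implied by the reported step times can be worked out directly (a quick back-of-the-envelope check, not part of the original report):

```python
# Per-sample training cost implied by the reported step times.
with_stress_ms = 130 / 32      # ~4.06 ms/sample at batch_size=32
without_stress_ms = 270 / 16   # ~16.88 ms/sample at batch_size=16

slowdown = without_stress_ms / with_stress_ms
print(f"with stress:    {with_stress_ms:.2f} ms/sample")
print(f"without stress: {without_stress_ms:.2f} ms/sample")
print(f"slowdown:       {slowdown:.1f}x")  # ~4.2x per sample
```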

chc273 commented 2 years ago

Could you show a minimally reproducible script with some dummy data? @VvVLzy
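For anyone putting together such a reproduction, a minimal sketch of dummy training records might look like the following. The field names (`positions`, `energy`, `forces`, `stress`) are illustrative assumptions for a generic energy/force/stress dataset, not the m3gnet trainer's actual input API:

```python
import random

def make_dummy_record(n_atoms: int, with_stress: bool) -> dict:
    """Build one illustrative training example: random positions, a scalar
    energy, per-atom forces, and an optional 3x3 stress tensor.
    NOTE: field names are hypothetical, chosen only for this sketch."""
    record = {
        "positions": [[random.random() for _ in range(3)] for _ in range(n_atoms)],
        "energy": random.uniform(-10.0, 0.0),
        "forces": [[random.gauss(0, 1) for _ in range(3)] for _ in range(n_atoms)],
    }
    if with_stress:
        record["stress"] = [[random.gauss(0, 1) for _ in range(3)] for _ in range(3)]
    return record

# Two small dummy sets mirroring the two training scenarios in this issue.
data_with_stress = [make_dummy_record(8, with_stress=True) for _ in range(10)]
data_without_stress = [make_dummy_record(8, with_stress=False) for _ in range(10)]
```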

VvVLzy commented 2 years ago

Just to clarify: are you asking for the script I used for training as well as the two sets of training data?

chc273 commented 2 years ago

Yes, that would be helpful for checking where the problem is. It does not happen on my machines.

VvVLzy commented 1 year ago

Here are the scripts and dummy data for the slow and fast training (each in the corresponding folder). The slower training (without stress) uses monty to load/parse the data file, so the data file format is slightly different. However, the parsed data fed into the trainer has the same format, so that should not affect training speed...

Both data files consist of 1000 training examples and 100 validation examples.
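Since the claim above is that the parsed records have an identical format in both folders, a small consistency check can rule out a data-format artifact before profiling the trainer itself. This is a hedged sketch with made-up field names and toy records, not the actual contents of the attached files:

```python
def check_split(records, n_expected, required=("positions", "energy", "forces")):
    """Verify the example count and that every record carries the same fields,
    so any speed difference is not caused by inconsistent parsed data."""
    assert len(records) == n_expected, f"expected {n_expected}, got {len(records)}"
    field_sets = {frozenset(r) for r in records}
    assert len(field_sets) == 1, f"inconsistent record fields: {field_sets}"
    for field in required:
        assert field in records[0], f"missing field: {field}"

# Toy splits matching the reported sizes (1000 train / 100 validation).
train = [{"positions": [[0.0] * 3], "energy": -1.0, "forces": [[0.0] * 3]}
         for _ in range(1000)]
val = [{"positions": [[0.0] * 3], "energy": -1.0, "forces": [[0.0] * 3]}
       for _ in range(100)]
check_split(train, 1000)
check_split(val, 100)
print("both splits consistent")
```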

Thanks.

slow.zip fast.zip