usnistgov / alignn

Atomistic Line Graph Neural Network https://scholar.google.com/citations?user=9Q-tNnwAAAAJ&hl=en https://www.youtube.com/watch?v=WYePjZMzx3M
https://jarvis.nist.gov/jalignn/

warning message #163

Open LTJer opened 3 months ago

LTJer commented 3 months ago

fatal: not a git repository (or any parent up to mount point /gpfs)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
world_size 0
root_dir ./
id_prop_csvfile exists True
len dataset 351
Using LMDB dataset.
MAX val: 206.9750005
MIN val: 11.04244288
MAD: 9.425622709805278
Baseline MAE: 8.186923942137753
data range 206.9750005 11.04244288
100%|██████████| 280/280 [04:02<00:00, 1.15it/s]
data range 27.18234673 12.03331602
100%|██████████| 35/35 [00:12<00:00, 2.70it/s]
data range 89.94074813 14.72680096
100%|██████████| 35/35 [00:18<00:00, 1.85it/s]
**/u/au/sa/liutsungwei/scratch/conda/envs/my_alignn/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
/u/au/sa/liutsungwei/scratch/conda/envs/my_alignn/lib/python3.10/site-packages/torch/nn/modules/loss.py:101: UserWarning: Using a target size (torch.Size([1])) that is different to the input size (torch.Size([])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.l1_loss(input, target, reduction=self.reduction)**

The bold section is the warning message. Do I need to worry about that, and how can I fix it?

Thanks a lot.

bdecost commented 2 months ago

There are two UserWarnings here. The first one, about the learning rate scheduler, is I think a holdover from an earlier version that used PyTorch Ignite for the training loop. It's probably worth us fixing (swap this line and the one that follows: https://github.com/usnistgov/alignn/blob/main/alignn/train.py#L467), but it's not the most worrisome thing, since it only affects the initial optimization step.
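For reference, the ordering that PyTorch expects looks roughly like the sketch below. This is a toy loop, not the ALIGNN training code: the linear model, the random data, and the `StepLR` scheduler are all stand-ins chosen for illustration, and the actual scheduler used in `train.py` may differ.

```python
import warnings

import torch

# Toy stand-ins (assumptions for illustration, not ALIGNN objects).
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for _ in range(3):
        optimizer.zero_grad()
        loss = torch.nn.functional.l1_loss(model(x), y)
        loss.backward()
        optimizer.step()   # update the weights first ...
        scheduler.step()   # ... then advance the learning rate schedule

# Collect any scheduler-ordering warnings raised during the loop.
order_warnings = [w for w in caught if "lr_scheduler.step()" in str(w.message)]
final_lr = scheduler.get_last_lr()[0]
print(len(order_warnings), final_lr)
```

With `optimizer.step()` called before `scheduler.step()`, the ordering warning should not appear, and the schedule starts from its first value instead of skipping it.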

The second UserWarning is probably more concerning and could be a correctness issue. It looks like you have a batch size of 1 (?) and a scalar regression task? Can you test whether you still get this warning with a batch size of 2 or more? It could be that the model has a call to torch.squeeze that removes the batch dimension by mistake when the batch size is 1...
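A minimal sketch of how that could happen, assuming a model head that produces an output of shape `(1, 1)` for a single-sample batch. The tensors here are made up for illustration and are not taken from the ALIGNN model. Calling `torch.squeeze` with no `dim` argument drops every size-1 dimension, including the batch dimension, which reproduces the size mismatch from the warning; squeezing only the last dimension keeps the batch dimension.

```python
import warnings

import torch
import torch.nn.functional as F

# Hypothetical model output for a batch of one sample: shape (1, 1).
raw_output = torch.tensor([[2.0]])
target = torch.tensor([3.0])              # shape (1,)

squeezed_all = torch.squeeze(raw_output)  # shape (): batch dimension is gone too
squeezed_last = raw_output.squeeze(-1)    # shape (1,): batch dimension is kept

# Mismatched sizes: input torch.Size([]) vs target torch.Size([1]).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loss_mismatch = F.l1_loss(squeezed_all, target)
size_warnings = [w for w in caught if "target size" in str(w.message)]

# Matching sizes: input and target are both torch.Size([1]).
with warnings.catch_warnings(record=True) as caught_fixed:
    warnings.simplefilter("always")
    loss_fixed = F.l1_loss(squeezed_last, target)
fixed_warnings = [w for w in caught_fixed if "target size" in str(w.message)]

print(squeezed_all.shape, squeezed_last.shape)
print(len(size_warnings), len(fixed_warnings))
```

In this one-element toy case the broadcast result happens to give the same loss value either way, so the warning alone does not prove the numbers are wrong. Running with a batch size of 2 or more is still the quickest way to check whether the shapes line up in general.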