usnistgov / alignn

Atomistic Line Graph Neural Network

warning message #163

Open LTJer opened 3 months ago

LTJer commented 3 months ago

fatal: not a git repository (or any parent up to mount point /gpfs)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
world_size 0
root_dir ./
id_prop_csvfile exists True
len dataset 351
Using LMDB dataset.
MAX val: 206.9750005
MIN val: 11.04244288
MAD: 9.425622709805278
Baseline MAE: 8.186923942137753
data range 206.9750005 11.04244288
100%|██████████| 280/280 [04:02<00:00, 1.15it/s]
data range 27.18234673 12.03331602
100%|██████████| 35/35 [00:12<00:00, 2.70it/s]
data range 89.94074813 14.72680096
100%|██████████| 35/35 [00:18<00:00, 1.85it/s]
**/u/au/sa/liutsungwei/scratch/conda/envs/my_alignn/lib/python3.10/site-packages/torch/optim/ UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
/u/au/sa/liutsungwei/scratch/conda/envs/my_alignn/lib/python3.10/site-packages/torch/nn/modules/ UserWarning: Using a target size (torch.Size([1])) that is different to the input size (torch.Size([])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.l1_loss(input, target, reduction=self.reduction)**

The bold section is the warning message. Do I need to worry about that, and how can I fix it?

Thanks a lot.

bdecost commented 3 months ago

There are two UserWarnings here. The first one, about the learning rate scheduler, is I think a holdover from the behavior we had in an earlier version that used PyTorch Ignite for the training loop. It's probably worth us fixing (swap this line and the one that follows), but it's not the most worrisome thing, since it only affects the initial optimization step.
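For reference, the ordering PyTorch asks for looks roughly like the sketch below. This is not the ALIGNN training loop: the linear model, the random data, and the `StepLR` scheduler are stand-ins chosen only to make the example self-contained. The point is just that `optimizer.step()` runs before `scheduler.step()`, so the first learning rate in the schedule is actually used.

```python
import torch

torch.manual_seed(0)

# Stand-in model and data (assumptions for illustration, not ALIGNN code).
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# StepLR halves the learning rate after every scheduler.step() call here.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

lrs = []
for step in range(3):
    optimizer.zero_grad()
    loss = torch.nn.functional.l1_loss(model(x), y)
    loss.backward()
    # Record the learning rate that this update will actually use.
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # update the weights first ...
    scheduler.step()   # ... then advance the schedule

print(lrs)
```

With this order the first update uses the initial learning rate of 0.1. If the two calls are swapped, the schedule advances before the first update, which is the skipped first value that the warning describes.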

The second UserWarning is probably more concerning and could be a correctness issue. It looks like you have a batch size of 1 (?) and a scalar regression task? Are you able to test whether you still get this warning with a batch size of 2 or more? It could be that the model has a call to torch.squeeze that removes the batch dimension by mistake when the batch size is 1...
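A small sketch of the suspected mechanism, using made-up tensors rather than actual ALIGNN outputs. Whether ALIGNN really squeezes this way is only the hypothesis above, not something confirmed here. The first part shows how `squeeze()` with no argument drops the batch dimension when the batch size is 1; the second part shows why mismatched shapes in `F.l1_loss` can matter, since broadcasting silently compares every prediction with every target.

```python
import warnings

import torch
import torch.nn.functional as F

# Part 1: a batch of one scalar prediction, shape (1, 1).
pred_one = torch.tensor([[2.5]])
target_one = torch.tensor([2.0])           # shape (1,), as in the warning

squeezed_all = pred_one.squeeze()          # shape (): batch dimension is gone
squeezed_last = pred_one.squeeze(-1)       # shape (1,): batch dimension kept

print(squeezed_all.shape, squeezed_last.shape)

# Part 2: shapes (3, 1) vs (3,) broadcast to (3, 3) inside l1_loss.
pred = torch.tensor([[1.0], [2.0], [3.0]])
target = torch.tensor([1.0, 2.0, 3.0])

with warnings.catch_warnings():
    # Silence the same UserWarning that appears in the log above.
    warnings.simplefilter("ignore")
    broadcast_loss = F.l1_loss(pred, target)

# Matching the shapes gives the intended element-wise loss.
matched_loss = F.l1_loss(pred.squeeze(-1), target)

print(broadcast_loss.item(), matched_loss.item())
```

Here the predictions equal the targets, so the matched loss is 0, while the broadcast version averages over all nine pairs and reports a nonzero value. In the batch-size-1 case from the log, the scalar and the length-1 target broadcast to a single pair, so the number itself would likely be unaffected, but squeezing only the last dimension avoids the ambiguity either way.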