usnistgov / alignn

Atomistic Line Graph Neural Network https://scholar.google.com/citations?user=9Q-tNnwAAAAJ&hl=en
https://jarvis.nist.gov/jalignn/

RuntimeError: The size of tensor a (536) must match the size of tensor b (300) at non-singleton dimension 1 #162

Open yuuhaixia opened 1 month ago

yuuhaixia commented 1 month ago

My regression output has 536 values. How can I change the code to fix the mismatch?

image

bdecost commented 1 month ago

Hi, could you please share your configuration file?

If you have a 536-dimensional vector-valued prediction task, you should set output_features to 536. I'm guessing the 300 comes from the edos_pdos example?
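A quick way to confirm which value a run will actually see is to read the JSON config and look up output_features. The sketch below searches the whole nested config for that key rather than assuming where it lives. The helper name find_key and the inline sample config are hypothetical, for illustration only.

```python
import json


def find_key(node, key):
    """Return every value stored under `key` anywhere in a nested JSON structure."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: the key may also appear deeper in the tree.
            found.extend(find_key(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key(item, key))
    return found


# Hypothetical stand-in for a config file (layout assumed for this example).
# In practice: with open(path) as f: config = json.load(f)
sample_config = json.loads('{"batch_size": 2, "model": {"output_features": 536}}')

print(find_key(sample_config, "output_features"))
```

Running the same lookup on both the file passed to --config and the config.json written to the output directory would show whether the two agree.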

yuuhaixia commented 1 month ago

Here are my run command and config file. The config has 'output_features': 536, so where am I going wrong, @bdecost?

train_alignn.py --root_dir "alignn/examples/gtj" --config "alignn/examples/sample_data/config_example.json" --output_dir=temp88

image image

bdecost commented 1 month ago

ok, my assumption about the example configuration seems to have been wrong, sorry.

I have a couple of follow-up questions. Your command passes --config "alignn/examples/sample_data/config_example.json", but you've shared the contents of temp88/config.json. Is temp88/config.json generated automatically by the training script, and have you also modified the example configuration file to set output_features=536?

Second (and maybe more relevant) question: your debugging output shows dats[2].shape == target.shape == [1, 300]. Are input and target the arguments to the PyTorch loss function, so that input.shape == [1, 536] is the output of your model? If so, you should double-check that your dataloader is loading the correct targets. The output dimension of the model appears to be set correctly, but the target dimension does not match it.
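One way to run that check before training is to count the target values stored for each sample and compare the count with output_features. This is a minimal sketch, assuming the targets sit in a CSV file whose rows are an identifier followed by the target values. The function name, the file layout, and the inline sample data are assumptions for illustration, not ALIGNN's actual loader.

```python
import csv
import io


def mismatched_rows(csv_text, expected_width):
    """Return (identifier, width) for rows whose target count differs from expected_width."""
    bad = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        identifier, targets = row[0], row[1:]
        if len(targets) != expected_width:
            bad.append((identifier, len(targets)))
    return bad


# Hypothetical data: one sample with 3 target values and one with 2.
sample = "a.vasp,0.1,0.2,0.3\nb.vasp,0.4,0.5\n"

print(mismatched_rows(sample, expected_width=3))
```

With expected_width set to the model's output_features (536 here), any row this reports would likely produce the same size mismatch once it reaches the loss function.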