rainmaker22 / SMART

[NeurIPS 2024] SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction
Apache License 2.0

Error in validation phase in training #5

Closed zachytong closed 2 months ago

zachytong commented 2 months ago

Hi rainmaker22,

I came across the following error during the validation phase of training:

lightning_fabric.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='val_cls_acc') could not find the monitored key in the returned metrics: ['train_loss', 'train_loss_step', 'cls_loss', 'cls_loss_step', 'val_loss', 'train_loss_epoch', 'cls_loss_epoch', 'epoch', 'step']. HINT: Did you call log('val_cls_acc', value) in the LightningModule?

It seems this is caused by val_cls_acc not being logged when inference_token is set to False in smart.py. Setting it to True, or monitoring another metric or loss that is actually logged (val_loss, for example), works fine. I'm not sure which option is better, so maybe you can help :)
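
For anyone hitting the same error, here is a minimal sketch of the "monitor another metric" workaround. It is plain Python and not code from the SMART repository: the helper name pick_checkpoint_monitor and the choice of val_loss as the fallback are assumptions made for illustration. The idea is only to pick a monitor key that is really present in the logged metrics before handing it to the checkpoint callback.

```python
def pick_checkpoint_monitor(logged_keys, preferred="val_cls_acc", fallback="val_loss"):
    """Hypothetical helper: choose a metric key that was actually logged.

    Returns (monitor, mode). Accuracy should be maximized and loss
    minimized, so the mode depends on which key is picked.
    """
    if preferred in logged_keys:
        return preferred, "max"
    if fallback in logged_keys:
        return fallback, "min"
    raise KeyError(
        f"Neither {preferred!r} nor {fallback!r} was logged; got {sorted(logged_keys)}"
    )


# Metric keys copied from the error message above.
logged = [
    "train_loss", "train_loss_step", "cls_loss", "cls_loss_step",
    "val_loss", "train_loss_epoch", "cls_loss_epoch", "epoch", "step",
]

monitor, mode = pick_checkpoint_monitor(logged)
print(monitor, mode)  # prints: val_loss min

# The result would then be passed on to the callback, e.g.
# ModelCheckpoint(monitor=monitor, mode=mode)
```

Note that this only sidesteps the exception. If checkpoints should be selected by classification accuracy, val_cls_acc still has to be logged during validation.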

rainmaker22 commented 2 months ago

Thank you for bringing up this issue. I've submitted a commit to fix the error. Apologies for mistakenly removing the val_cls_acc log earlier. Due to time constraints, the code might not have been thoroughly tested. If you encounter any further issues, feel free to let me know.