median-research-group / LibMTL

A PyTorch Library for Multi-Task Learning
MIT License
1.94k stars, 181 forks

When running the example code for QM9, the program seems to enter an infinite loop (QM9 example training code is unresponsive) #54

Closed Qianqian-Yang closed 1 year ago

Qianqian-Yang commented 1 year ago

When running train_qm9.py file, the program prints the following information and then becomes unresponsive, as if it has entered an infinite loop. What could be the possible reasons for this issue?


The printed output is as follows:

    General Configuration:
        Wighting: EW
        Architecture: HPS
        Rep_Grad: False
        Multi_Input: False
        Seed: 0
        Save Path: None
        Load Path: None
        Device: cuda:0
    Optimizer Configuration:
        optim: adam
        lr: 0.0001
        weight_decay: 1e-05
    Total Params: 617675
    Trainable Params: 617675
    Non-trainable Params: 0
    LOG FORMAT | 0_LOSS MAE | 1_LOSS MAE | 2_LOSS MAE | 3_LOSS MAE | 5_LOSS MAE | 6_LOSS MAE | 12_LOSS MAE | 13_LOSS MAE | 14_LOSS MAE | 15_LOSS MAE | 11_LOSS MAE | TIME

Baijiong-Lin commented 1 year ago

It runs normally on my end. [image]

Baijiong-Lin commented 1 year ago

Closed as no further updates.

Qianqian-Yang commented 1 year ago

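I just found that if I set num_workers to 0 in the DataLoader, so that no multiprocessing is used, it runs. With multiprocessing enabled it never gets going.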

Baijiong-Lin commented 1 year ago

It will run very slowly that way.

Qianqian-Yang commented 1 year ago

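With multiprocessing enabled it never produces any output, as if it were stuck in an infinite loop. The DataLoader seems to be the problem: when debugging, it raises TypeError: cannot pickle 'generator' object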
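The TypeError above is what Python's pickle module raises when an object holding a generator has to be serialized, which is what happens to a dataset when worker processes are started by spawning. The thread does not say which object holds the generator, so the following is only a minimal sketch using the standard library: `ToyDataset`, `FixedToyDataset`, and `first_unpicklable_attr` are hypothetical names for illustration, not LibMTL or PyTorch code.

```python
import pickle


def first_unpicklable_attr(obj):
    """Return the name of the first instance attribute that fails to pickle, or None."""
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except (TypeError, AttributeError, pickle.PicklingError):
            return name
    return None


class ToyDataset:
    """Hypothetical stand-in for a dataset that keeps a generator as state."""

    def __init__(self):
        self.items = [1, 2, 3]
        # A generator object cannot be pickled, so this attribute breaks
        # any attempt to send the dataset to a spawned worker process.
        self.stream = (x * 2 for x in self.items)


class FixedToyDataset:
    """Same data, but materialized into a list, which pickles fine."""

    def __init__(self):
        self.items = [1, 2, 3]
        self.stream = [x * 2 for x in self.items]


if __name__ == "__main__":
    print(first_unpicklable_attr(ToyDataset()))       # name of the offending attribute
    print(first_unpicklable_attr(FixedToyDataset()))  # None when everything pickles
```

Running a check like this on the dataset object in the main process, before any workers are started, may help locate the attribute responsible without waiting on a hung run.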

Baijiong-Lin commented 1 year ago

It runs normally on my end. [image]

It works with multiprocessing for me.

Qianqian-Yang commented 1 year ago

Are you on Linux? I am on Windows, and multiprocessing seems to be prone to problems on Windows.
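A likely reason for the platform difference: on Windows, Python's multiprocessing can only start processes with the "spawn" method, which pickles the objects handed to each worker and re-imports the main module. A minimal sketch of the workaround found in this thread follows; `pick_num_workers` is a hypothetical helper, not part of LibMTL or PyTorch, and the policy of falling back to 0 workers is an assumption rather than a recommended fix.

```python
import multiprocessing as mp


def pick_num_workers(requested: int, start_method: str) -> int:
    """Hypothetical helper: use 0 workers when processes are spawned."""
    if requested < 0:
        raise ValueError("requested must be non-negative")
    # Under "spawn" the dataset is pickled for each worker, so a dataset
    # that cannot be pickled fails there; 0 keeps loading in the main process.
    return 0 if start_method == "spawn" else requested


def main() -> None:
    method = mp.get_start_method()
    workers = pick_num_workers(2, method)
    print(f"start method: {method}, num_workers: {workers}")
    # `workers` would then be passed on as DataLoader(..., num_workers=workers).


if __name__ == "__main__":
    # Needed on Windows: spawned workers re-import this module, and
    # unguarded top-level code would run again in each of them.
    main()
```

Loading in the main process is slower, as noted earlier in the thread, so this only trades speed for a run that starts. Making the dataset picklable would keep the workers.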

Baijiong-Lin commented 1 year ago

Yes, Linux.