Closed liuyueChang closed 1 year ago
Oh, there is a very important phenomenon during my training: the GPU utilization rate increases to 20%, then decreases, and then stays at 0 for a span. This phenomenon keeps recurring throughout training.
Hey :)
based on your description, my guess is that you use online preprocessing. Is this the case?
Julian
Thank you for your answer! I have run the preprocessing script according to your README, and the preprocessing result has been saved in a pkl file. I am very confused about this phenomenon.
Are you also using the preprocessed file during training, i.e. with `--use_preprocessed=True`?
Yes, I use the `--use_preprocessed=True` option.
And when I debugged train.py, it steps into:

```python
if args.use_preprocessed:
    with open(input_preprocessed, 'rb') as f:
        self.data = pickle.load(f)
```
I have solved this problem. The `num_workers` parameter should be set to 0! In my code this parameter was set to 4, which makes the training very slow. Although I still haven't figured out the reason, I can continue my own work!
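For anyone hitting the same stalls: since the preprocessed dataset is pickled entirely into memory, a plausible explanation is that each `DataLoader` worker process has to receive its own copy of that in-memory data, so the worker startup cost dominates and the GPU sits idle between bursts. A minimal sketch of the working setup (names like `PreprocessedDataset` and `preprocessed.pkl` are hypothetical stand-ins for the repo's actual classes and paths):

```python
import pickle
import torch
from torch.utils.data import Dataset, DataLoader

# Hypothetical stand-in for the repo's preprocessed .pkl file.
samples = [torch.randn(4) for _ in range(16)]
with open('preprocessed.pkl', 'wb') as f:
    pickle.dump(samples, f)

class PreprocessedDataset(Dataset):
    """Loads the whole preprocessed dataset into memory, as in the issue above."""
    def __init__(self, path):
        with open(path, 'rb') as f:
            self.data = pickle.load(f)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

# num_workers=0 keeps loading in the main process. With the data already
# in RAM there is little for worker processes to parallelize, and the cost
# of spawning workers (each receiving a copy of the dataset) can dominate,
# which would match the 0%-utilization gaps described above.
loader = DataLoader(PreprocessedDataset('preprocessed.pkl'),
                    batch_size=8, num_workers=0)
for batch in loader:
    pass  # training step would go here
```

If multiple workers are really needed, `persistent_workers=True` (available in this torch version) at least avoids re-spawning them every epoch, but for a fully in-memory dataset `num_workers=0` is usually the simpler fix.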
Awesome! Thanks for letting me know :D
Thank you for your paper and code! I am training the model on my machine; the GPU is an A6000. When I set the batch_size to 128 or 256, there is no improvement in training speed. In total it will take about 1 day to finish the 72 epochs. Can you give me some advice or a possible way to solve this problem?
Here is my environment. If you want me to give more information, please tell me! Thank you very much!
```
pytorch-lightning        1.5.10
pytz                     2022.1
PyYAML                   6.0
requests                 2.27.1
requests-oauthlib        1.3.1
rsa                      4.8
scikit-learn             1.0.2
scipy                    1.8.0
setuptools               59.5.0
shapely                  2.0.1
six                      1.16.0
sklearn                  0.0.post1
tensorboard              2.8.0
tensorboard-data-server  0.6.1
tensorboard-plugin-wit   1.8.1
termcolor                2.2.0
threadpoolctl            3.1.0
torch                    1.11.0
torch-geometric          2.0.4
torch-scatter            2.0.9
torch-sparse             0.6.13
torchaudio               0.11.0
torchcde                 0.2.5
torchdiffeq              0.2.3
torchmetrics             0.8.0
torchsde                 0.2.5
torchvision              0.12.0
tqdm                     4.64.0
trampoline               0.1.2
typing_extensions        4.1.1
urllib3                  1.26.8
Werkzeug                 2.1.1
wheel                    0.37.1
yarl                     1.7.2
zipp                     3.8.0
```