Closed: Daniel-Jiang358 closed this issue 2 years ago
It has been stuck in this state for a very long time.
Other code runs normally on the same machine, but Autoformer training does not even start.
Are you using your own dataset? It would be very helpful for us to check the code if you could provide a subset of it.
I have found the reason. But could you please explain why training on a single A40 (NVLink) is much faster than on multiple A40s? I tried other GPUs, such as the A5000 and 3090, with the same result. When I test on the ECL dataset on an A5000, each iteration takes about 0.04 s on one card, but about 3 s on 8 cards. I find this abnormal.
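For anyone hitting the same slowdown, the quickest workaround is to pin the run to a single GPU. A minimal sketch below, assuming a plain PyTorch entry script; `CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable, and any launch flags beyond that are whatever `run.py`'s argparse actually exposes, so check that file.

```python
# Pin the process to one GPU. The environment variable must be set
# before the first CUDA call, i.e. before torch initializes CUDA.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to this process

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Visible GPUs: {torch.cuda.device_count()}, using {device}")
# model.to(device); then proceed with the normal single-GPU training loop.
```

Exporting the variable in the shell before launching the training script has the same effect.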
I think time series forecasting is a light task, so multi-GPU training can add extra communication costs that outweigh the compute savings.
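To make the communication-cost point concrete, here is a rough benchmark sketch (not from the Autoformer codebase): it times one optimizer step of a deliberately small model on a single GPU versus `nn.DataParallel` over all visible GPUs. For small models, the per-step replicate/scatter/gather work that DataParallel does can easily exceed the compute itself, which matches the 0.04 s vs ~3 s observation above.

```python
import time
import torch
import torch.nn as nn

def time_steps(model, device, batch, steps=50):
    """Average wall-clock time per optimizer step."""
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters())
    loss_fn = nn.MSELoss()
    x, y = batch[0].to(device), batch[1].to(device)
    for _ in range(5):  # warm-up
        opt.zero_grad(); loss_fn(model(x), y).backward(); opt.step()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(steps):
        opt.zero_grad(); loss_fn(model(x), y).backward(); opt.step()
    torch.cuda.synchronize()
    return (time.time() - start) / steps

if __name__ == "__main__":
    assert torch.cuda.is_available(), "needs at least one GPU"
    # A small MLP stands in for a light forecasting model.
    batch = (torch.randn(32, 512), torch.randn(32, 96))
    single = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 96))
    print("1 GPU :", time_steps(single, "cuda:0", batch))
    if torch.cuda.device_count() > 1:
        multi = nn.DataParallel(nn.Sequential(
            nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 96)))
        print(f"{torch.cuda.device_count()} GPUs:", time_steps(multi, "cuda:0", batch))
```

On a model this small, the single-GPU number is typically much lower than the DataParallel one, because each step is dominated by inter-GPU transfers rather than by the matrix multiplies.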
Hello, I am running this code on my server; however, the training process won't start, even though GPU memory is being allocated normally.
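If the job allocates GPU memory but never logs an iteration, it helps to check whether the DataLoader ever produces a batch, since worker deadlocks are a common cause. A minimal, hypothetical probe (not taken from the Autoformer code); pass it whichever training loader the experiment builds:

```python
import time
import torch

def probe_first_batches(train_loader, n=3):
    """Print how long the first few batches take to appear, to locate a hang."""
    t0 = time.time()
    seen = 0
    for i, batch in enumerate(train_loader):
        batch = batch if isinstance(batch, (list, tuple)) else [batch]
        shapes = [tuple(b.shape) for b in batch if torch.is_tensor(b)]
        print(f"[{time.time() - t0:6.2f}s] batch {i} ready, tensor shapes: {shapes}")
        seen += 1
        if seen >= n:
            break
    if seen == 0:
        print("DataLoader yielded nothing: check the dataset path and split sizes.")
```

If batches appear quickly, the hang is in the model or GPU setup instead; if nothing prints for minutes, retry with `num_workers=0` to rule out DataLoader worker issues.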