googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0

Disconnection and Speed Issues with Colab Pro #4430

Closed. amohim closed this issue 6 months ago.

amohim commented 6 months ago

I am a Colab Pro subscriber, and here are two issues I've been encountering:

Disconnection:

I'm using a Tesla T4 runtime on Colab Pro. Recently, my runtime has been disconnecting after approximately 2 hours each day. Previously, with the free version of Colab, I was able to run for longer periods, exceeding 3 hours. In fact, a few months ago, I could even achieve close to 12 hours of daily connection on Colab Pro.

Speed:

I'm currently training a model that takes around 17 minutes per epoch. Interestingly, two months ago, a similar model (with slightly fewer parameters) trained in only 6 minutes per epoch using exactly the same settings. I would be grateful if anyone could help me with these issues. The frequent disconnections disrupt my workflow, and the significant slowdown in training speed is delaying my results.

Imm0ney commented 6 months ago

I'm having the same problem. I'm a Colab Pro user, and it keeps saying "cannot connect to GPU backend". It's been more than 24 hours, and it still says the same thing.

EvanWiederspan commented 6 months ago

For the speed issue, can you provide a repro notebook that shows the code you're running?
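A repro for a speed report is most useful when it includes per-epoch wall-clock timings. The sketch below is one way to collect them; `time_epochs` and `train_one_epoch` are hypothetical names, and the dummy workload only stands in for the real training loop (for example, one YOLOv7 epoch).

```python
import time
from typing import Callable, List


def time_epochs(train_one_epoch: Callable[[int], None], num_epochs: int) -> List[float]:
    """Run a (hypothetical) per-epoch training function and return seconds per epoch."""
    durations = []
    for epoch in range(num_epochs):
        start = time.perf_counter()
        train_one_epoch(epoch)
        elapsed = time.perf_counter() - start
        durations.append(elapsed)
        print(f"epoch {epoch}: {elapsed / 60:.2f} min")
    return durations


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere; replace with the real training step.
    def dummy_epoch(epoch: int) -> None:
        sum(i * i for i in range(200_000))

    times = time_epochs(dummy_epoch, num_epochs=3)
    print(f"mean: {sum(times) / len(times):.4f} s/epoch")
```

If the training loop runs PyTorch on a GPU, CUDA kernels execute asynchronously, so calling `torch.cuda.synchronize()` at the end of each epoch, before reading the clock, gives a more faithful number.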

colaboratory-team commented 6 months ago

@amohim : For the disconnection issue, once a subscriber's purchased compute units are exhausted, their ongoing usage is treated and restricted the same way as that of free-of-charge users, as stated in the Runtime - View Resources pane ("You currently have zero compute units available. Resources offered free of charge are not guaranteed. Purchase more units here."). As stated there, resources offered free of charge are not guaranteed, and runtimes may be terminated or unavailable, as you experienced, with no guarantee of how long such a free-of-charge T4 runtime will remain available.

@Imm0ney : If you can't get a T4 GPU backend even after 24 hours of non-usage, please file in-product feedback by clicking Help > Send feedback in order for us to investigate for your specific usage history.

amohim commented 6 months ago

Hi Colab Team,

Thank you for your reply and for looking into my concerns.

For the speed issue, I'm training a YOLOv7 object detection model on the WIDERFACE dataset. With the "High RAM" setting on Colab Pro, each epoch used to take only 5-6 minutes to train. Now it takes approximately 17 minutes per epoch with exactly the same training configuration. Without the "High RAM" setting, each epoch takes around 20 minutes. So, in terms of speed, there is now not much difference between the free version and the paid Colab Pro version.
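As a quick check on the figures above, the slowdown ratios work out as follows. This is plain arithmetic on the numbers reported in this comment, not a new measurement; `slowdown` is just an illustrative helper.

```python
def slowdown(before_min: float, after_min: float) -> float:
    """Ratio of new to old epoch time; a value above 1 means training got slower."""
    if before_min <= 0:
        raise ValueError("before_min must be positive")
    return after_min / before_min


# "High RAM" then vs. now, using both ends of the reported 5-6 minute range.
print(f"High RAM: {slowdown(6, 17):.2f}x to {slowdown(5, 17):.2f}x slower")
# prints: High RAM: 2.83x to 3.40x slower

# Standard RAM vs. "High RAM" today (20 vs. 17 minutes per epoch).
print(f"Standard vs High RAM: {slowdown(17, 20):.2f}x")
# prints: Standard vs High RAM: 1.18x
```

The gap between the two RAM settings today (about 1.18x) is much smaller than the roughly 3x change over time, so the RAM setting alone does not account for the slowdown.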

Regarding the disconnection issue, I understand the limitations of free-tier resources. However, I was a Colab Pro subscriber for about a year; I canceled the subscription a month ago and subscribed again this month. Previously, my runtime on Colab Pro would consistently last 6-12 hours daily.

This month, however, the runtime duration has been steadily decreasing. It started at around 3 hours and has now dropped to less than two hours daily. This frequent disconnection significantly disrupts my workflow.

Could you please clarify if there have been any recent changes to Colab Pro policies regarding runtime duration?

Thank you for your time and assistance.

colaboratory-team commented 6 months ago

> Could you please clarify if there have been any recent changes to Colab Pro policies regarding runtime duration?

"Colab Pro policies regarding runtime duration" apply only while the subscriber's (Pro or Pro+) paid compute unit balance is positive. When that balance runs out, their runtimes are offered as free-of-charge usage and limited the same way as those of free-of-charge users. As stated in the Runtime - View Resources pane, "resources offered free of charge are not guaranteed," so changes in runtime duration like the ones you observed can happen at any time once your paid compute unit balance reaches zero. The runtime durations and disconnects you describe imply that your paid compute balance (as a Pro subscriber, you receive 100 compute units every month) ran out. If your paid compute balance had not run out at the time of these disconnects, please submit feedback (Help - Send feedback) so that we can look at your specific usage history. If you need guaranteed runtime durations beyond the free-of-charge allowance after your paid compute unit balance is exhausted, you can purchase more Pay-As-You-Go compute units by choosing either 100 or 500 Compute Units in the leftmost box on the Colab Signup page.
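The rule described in this reply can be written down as a toy model. It is purely illustrative: `Account`, `runtime_treatment`, and `paid_hours_remaining` are hypothetical names, not part of any Colab API, and the consumption rate in the example is an assumed value, since this thread does not state the actual T4 rate.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical model of a subscriber account; not a Colab API object.
    plan: str                  # e.g. "Pro" or "Pro+"
    paid_compute_units: float  # remaining paid balance (Pro receives 100 per month)


def runtime_treatment(account: Account) -> str:
    """Toy encoding of the rule: plan policies apply only while the paid balance is positive."""
    if account.paid_compute_units > 0:
        return f"{account.plan} policies apply"
    # Balance exhausted: handled like any free-of-charge user.
    return "free of charge (resources not guaranteed)"


def paid_hours_remaining(account: Account, units_per_hour: float) -> float:
    """Hours of paid usage left at an assumed consumption rate."""
    if units_per_hour <= 0:
        raise ValueError("units_per_hour must be positive")
    return max(account.paid_compute_units, 0.0) / units_per_hour


if __name__ == "__main__":
    acct = Account(plan="Pro", paid_compute_units=100.0)
    rate = 2.0  # hypothetical units/hour, NOT Colab's real rate
    print(runtime_treatment(acct))           # Pro policies apply
    print(paid_hours_remaining(acct, rate))  # 50.0
    acct.paid_compute_units = 0.0
    print(runtime_treatment(acct))           # free of charge (resources not guaranteed)
```

The point of the model is the branch in `runtime_treatment`: once the balance hits zero, runtime duration is no longer governed by the Pro plan at all, which matches the reply's explanation of the shrinking daily sessions.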

CL-VO commented 6 months ago

The speed issue has been reported thousands of times, but they can't fix it. Too many idiots on their team.

Don't buy credits, it's a scam.

And they are proud to have the same business model as the worst crack dealers, LOL.

amohim commented 6 months ago

Thanks, Colab Team, for your support. There was a bug in the code that caused the speed issue. I will close the issue now.