googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0

Issue with Google Colab Pro+: Frequent Disconnection with Google Drive mount during Code Execution #3785

Open JonFillip opened 1 year ago

JonFillip commented 1 year ago

Describe the current behavior

I am frequently encountering an error with Google Colab Pro+ while my code is executing. Specifically, Google Drive appears to disconnect at random, which interrupts the run. This happens across multiple training folds when running my machine learning model. The traceback points to the inspect module in Python, but I believe the underlying cause is the Google Drive disconnection. Here's the error message:

ERROR:root:Internal Python error in the inspect module. Below is the traceback from this internal error. ... OSError: [Errno 107] Transport endpoint is not connected
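For anyone trying to confirm that the failure really is the Drive mount dropping rather than a bug in the training code, here is a minimal sketch of how that check might look. It assumes the `OSError` reaches your own code with its `errno` intact; the helper names `is_drive_disconnect` and `mount_is_alive` are hypothetical and not part of any Colab API.

```python
import errno
import os


def is_drive_disconnect(exc: BaseException) -> bool:
    """Return True if exc looks like 'Transport endpoint is not connected'."""
    # errno.ENOTCONN is the symbolic name for that message (107 on Linux).
    return isinstance(exc, OSError) and exc.errno == errno.ENOTCONN


def mount_is_alive(mount_point: str) -> bool:
    """Probe a mount by listing it; any OSError is treated as 'not usable'."""
    try:
        os.listdir(mount_point)
        return True
    except OSError:
        return False
```

Calling `mount_is_alive("/content/drive")` before each fold or checkpoint would be one way to log when the mount goes away; that path is the usual Colab mount point and is an assumption here, so adjust it to wherever Drive is mounted in your notebook.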

Describe the expected behavior

I expected seamless, uninterrupted integration between Google Colab Pro+ and Google Drive. The code should execute without unexpected disconnections, especially during long-running tasks such as model training in machine learning projects, which is what I thought I would be getting when I paid for Colab Pro+.

What web browser you are using

I've been using Safari to access Google Colab Pro+.

Additional context

Here is a link to a minimal, public, self-contained notebook that reproduces this issue. The error can be seen in the Train Model section of the notebook; I have titled the specific cell See Error Here. This issue has been causing substantial delays in my work and leading to unnecessary expenses, as I am paying for the Pro+ subscription. I kindly request a prompt resolution to this problem. Thank you.

Hugomer commented 1 year ago

Hey JonFillip, I am experiencing the same issue. Were you able to fix the problem, or do you know of a workaround? Thanks in advance. Best regards, Hugo

cperry-goog commented 1 year ago

Are you hitting https://research.google.com/colaboratory/faq.html#drive-timeout ? I assume you have many thousands of files in Drive?

olaviinha commented 1 year ago

I am also facing this issue. As with others, the Drive disconnection occurs after approximately 4 hours of an active Colab notebook session. This has been happening for weeks. It also happens when accessing only Drive folders that contain just 5 files.

cperry-goog commented 1 year ago

Consolidating other related bugs into this one - we've been struggling to reproduce. If anyone has a minimal reproducible notebook that triggers this error and disconnection, that would be a big help.

josem7 commented 1 year ago

For me, it occurs every time I disconnect from the runtime but leave it running; then, when my model reaches a checkpoint, I get this error.
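A possible mitigation for the checkpoint case, sketched under assumptions: write each checkpoint to the runtime's local disk first and only then try to mirror it to Drive, so a dropped mount does not abort the training loop. The function name `save_checkpoint` and its parameters are hypothetical; this does not fix the disconnection itself.

```python
import os
import shutil


def save_checkpoint(data: bytes, name: str, local_dir: str, drive_dir: str) -> bool:
    """Write a checkpoint locally, then try to mirror it to Drive.

    Returns True if the Drive copy succeeded, False if only the local copy exists.
    """
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, name)
    with open(local_path, "wb") as f:
        f.write(data)
    try:
        os.makedirs(drive_dir, exist_ok=True)
        shutil.copyfile(local_path, os.path.join(drive_dir, name))
        return True
    except OSError:
        # For example [Errno 107] when the mount has dropped; the local copy survives.
        return False
```

A `False` return could be logged and the copy retried at the next checkpoint, which would also give a timestamp for when the mount went away.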

sainisatish commented 1 year ago

I have also hit the same issue while training yolov5: in the middle of training, a file-not-found error occurs.