googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0

Unable to read file from a large folder (Input/output error) #450

Closed: Edouard2laire closed this issue 5 years ago.

Edouard2laire commented 5 years ago

Hello. I have a folder in my Google Drive that contains 123k images from the MS-COCO dataset. Unfortunately, I am not able to read them because an Input/output error occurs:

!ls "Data/"  # annotations  unlabeled2017  unlabeled2017_resized.zip  unlabled_list.txt
!ls "Data/unlabeled2017/" # cannot open directory 'Data/unlabeled2017/': Input/output error
open("Data/unlabeled2017/000000374412.jpg") # cannot access 'Data/unlabeled2017/000000374412.jpg': Input/output error

Screenshot from my Google Drive showing that the file exists: [image: screenshot from 2019-03-01 11-38-22]

Thanks for your help.

Prasanna1991 commented 5 years ago

I am facing the same situation.

I even tried os.path.exists(filePath), and it returns False whenever filePath is in such a directory. When I move the file to another directory that contains only a small number of files, it works fine.
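A False from os.path.exists() does not necessarily mean the file is missing: the function calls os.stat() and returns False for any OSError, so an Input/output error (EIO) from the Drive mount looks exactly like a missing file. The sketch below calls os.stat() directly to tell the cases apart; the helper name diagnose_path and its return strings are hypothetical.

```python
import errno
import os


def diagnose_path(path):
    """Report why a path is unreadable instead of collapsing every error to False."""
    try:
        os.stat(path)
    except FileNotFoundError:
        # The path genuinely does not exist.
        return "missing"
    except OSError as exc:
        # EIO is the "Input/output error" reported in this issue.
        if exc.errno == errno.EIO:
            return "io-error"
        # Any other OS-level failure, reported by its symbolic errno name.
        return f"os-error: {errno.errorcode.get(exc.errno, exc.errno)}"
    return "ok"
```

Running diagnose_path() on a file inside the large folder would show whether the failure is "io-error" (the Drive timeout) or "missing".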

Since this is closed, is the issue solved?

Thanks.

colaboratory-team commented 5 years ago

Sadly, this is a known shortcoming of the integration between Colab and Drive: https://research.google.com/colaboratory/faq.html#drive-timeout

sawyermade commented 4 years ago

Hey guys, I figured out how to get the whole COCO-2017 dataset into Colab through Google Drive. Basically, I broke train2017 and test2017 down into subdirectories of at most 5000 files each (I noticed Colab could only read somewhere around 15k files from a single directory, so 5000 seemed a safe bet). Here is the code for that: https://github.com/sawyermade/detectron2_pkgs/tree/master/dataset_download
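The splitting step can be sketched as follows. This is not the code from the linked repository; it is a minimal version assuming it runs on a local copy of the images before upload (not on the Drive mount, where listing the big folder is exactly what fails). The function name shard_directory and the zero-padded subfolder names "0000", "0001", … are hypothetical choices.

```python
import os
import shutil


def shard_directory(src_dir, max_files=5000):
    """Move the files in src_dir into numbered subdirectories of at most max_files each.

    Returns the list of subdirectory paths that were created.
    """
    # Only regular files are sharded; existing subdirectories are left alone.
    names = sorted(
        n for n in os.listdir(src_dir) if os.path.isfile(os.path.join(src_dir, n))
    )
    shards = []
    for start in range(0, len(names), max_files):
        shard = os.path.join(src_dir, f"{start // max_files:04d}")
        os.makedirs(shard, exist_ok=True)
        for name in names[start:start + max_files]:
            shutil.move(os.path.join(src_dir, name), os.path.join(shard, name))
        shards.append(shard)
    return shards
```

For example, shard_directory("train2017") on the 118,287 training images would produce 24 subfolders, the last one holding the remaining 3,287 files.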

Then I used rclone to upload the whole damn dataset to Google Drive and shared it so that anyone with the link can view it: https://drive.google.com/drive/folders/1EVsLBRwT2njNWOrmBAhDHvvB8qrd9pXT?usp=sharing
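The exact rclone invocation isn't given here. A plausible one uses rclone's copy subcommand, which copies a local directory to a path on a configured remote. The sketch below builds that command from Python; the remote name "gdrive" and the paths are hypothetical, and the remote must already exist via `rclone config`. --transfers sets the number of parallel file transfers and --progress prints live progress.

```python
import subprocess


def rclone_copy(local_dir, remote, remote_path, transfers=8, run=False):
    """Build (and optionally run) an `rclone copy` command for one directory tree."""
    cmd = [
        "rclone", "copy", local_dir, f"{remote}:{remote_path}",
        "--transfers", str(transfers), "--progress",
    ]
    if run:
        # check=True raises CalledProcessError if rclone exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling rclone_copy("coco/train2017", "gdrive", "coco/train2017", run=True) would upload the already-sharded folder with its subfolder layout intact.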

Once the shared folder is in your Google Drive, create a shortcut to it so that Colab can access it. Then I create symbolic links in a local directory: 118,287 for train and 40,670 for test. So far, it is working like a charm. I also save all my output to Google Drive so that training can be resumed after Colab disconnects the runtime at the 12-hour limit. Here is the notebook for that: https://colab.research.google.com/drive/1OVStblo4Q3rz49Pe9-CJcUGkkCDLcMqP
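The symlink step can be sketched as follows, without assuming it matches the linked notebook. It assumes Drive is mounted (for example with google.colab's drive.mount("/content/drive")) and that the shortcut exposes the numbered subfolders under a path such as /content/drive/MyDrive/coco/train2017. That path and the helper name link_flat are hypothetical. Each Drive subfolder is listed separately, so no single Drive listing exceeds the shard size. The flat directory lives on the local disk, which would explain why the full listing works there.

```python
import os


def link_flat(shard_root, dest_dir):
    """Create one symlink in dest_dir for every file found in the subdirectories of shard_root.

    Returns the number of links created; existing links are skipped, so reruns are cheap.
    """
    # Absolute targets keep the links valid regardless of where dest_dir is.
    shard_root = os.path.abspath(shard_root)
    os.makedirs(dest_dir, exist_ok=True)
    count = 0
    for shard in sorted(os.listdir(shard_root)):
        shard_path = os.path.join(shard_root, shard)
        if not os.path.isdir(shard_path):
            continue
        # Each listing covers one small subfolder, never the whole dataset.
        for name in os.listdir(shard_path):
            link = os.path.join(dest_dir, name)
            if not os.path.lexists(link):
                os.symlink(os.path.join(shard_path, name), link)
                count += 1
    return count
```

Training code can then point at the flat local directory (for example /content/coco/train2017) as if the images were stored there directly.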

I am training a Mask R-CNN now and will report results when it finishes, but it's looking pretty damn good so far.