googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0

Reading file from Drive giving OSError #559

Open furkanyildiz opened 5 years ago

furkanyildiz commented 5 years ago

I cannot read my files on Drive. It sometimes works, but mostly it gives an OSError.

OSError: Can't read data (file read failed: time = Mon May 20 00:34:07 2019
, filename = '/content/drive/My Drive/train/trainX_file1', file descriptor = 83, errno = 5, error message = 'Input/output error', buf = 0xc71d3864, total read size = 42145, bytes this sub-read = 42145, bytes actually read = 18446744073709551615, offset = 119840768)

Creating a file also gives an OSError.

OSError: Unable to create file (unable to open file: name = '/content/drive/My Drive/train/model.hdf5', errno = 5, error message = 'Input/output error', flags = 13, o_flags = 242)

You can check my notebook to see the error: https://colab.research.google.com/drive/1MHJhYtR1PGyb5HKUY8-hrFPn-SBtOige

Note: https://research.google.com/colaboratory/faq.html#drive-timeout did not help me.

colaboratory-team commented 5 years ago

Sorry for the trouble. Does this reproduce reliably for you? Can you share details of your Drive file layout? e.g., how many files are in this directory, and how large is the trainX_file1 file?

colaboratory-team commented 5 years ago

b/133228148

furkanyildiz commented 5 years ago

In my training process I'm using train and validation directories on my Drive. The train directory has 280 X files of 600 MB each and 280 y files of 9 KB each. The validation directory has 140 X files of 100 MB each and 140 y files of 6 KB each. All files are in HDF5 (h5py) format.

In each epoch I read all of the files once.

When I start training, it gives this error after a while. After waiting 24 hours the problem went away. Now I've run my code again and the first two epochs finished smoothly; however, in the third epoch it gave the same error.

Once I get the OSError for any file, I also get it for other files that I had not touched before the error occurred.

Edit: I also tried training my network by reading one big (62 GB) h5 file. After reading the first 21k arrays, it gives the same OSError.
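
For reference, the per-epoch read loop looks roughly like the sketch below; the paths, file naming, and dataset names are placeholders, not the exact code:

import h5py
import numpy as np

train_dir = '/content/drive/My Drive/train'  # placeholder path

for i in range(1, 281):
    # Each X file is ~600 MB, each y file is ~9 KB (placeholder names).
    with h5py.File(f'{train_dir}/trainX_file{i}', 'r') as fx, \
         h5py.File(f'{train_dir}/trainY_file{i}', 'r') as fy:
        X = np.array(fx['X'])
        y = np.array(fy['y'])
        # ... train on (X, y) ...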

glenn-jocher commented 5 years ago

I have the same problem. When copying a 20GB file from a mounted Google Drive folder:

!cp 'drive/My Drive/cloud/data/coco_colab2.zip' . && unzip -q coco_colab2.zip
cp: error reading 'drive/My Drive/cloud/data/coco_colab2.zip': Input/output error
AristotelisPap commented 5 years ago

I also have the same problem. I am using Google Colab and I am trying to access a .bin file which is approximately 230 GB. I am always getting the same OSError. Is there a way to fix that?

glenn-jocher commented 5 years ago

UPDATE: I've had success skipping the embedded Google Drive client in Colab and running a curl request directly to the file (confirming download for very large files):

fileid="1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO123"
filename="coco_gdrive.zip"
curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null
curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename}
gilgarad commented 5 years ago

Still not fixed?

IAmSuyogJadhav commented 5 years ago

A workaround that sometimes works for me is making a copy of that file in Drive and then copying that file into Colab.

napsta32 commented 4 years ago

I am constantly getting this error after uploading a file with Colab:

[screenshot]

I installed gdrive in Colab to see if I could download my file to the VM, but I can't:

[screenshot]

Then I tried to download it from the website, and it won't let me:

[screenshot]

This file is not shared with anyone; only I can see it. I tried uploading the file again from scratch [in Colab], but the download keeps failing. Is this all related?

I also tried making a copy via the website, and Drive doesn't let me download the copy either.

The only workaround I found to download the uploaded file was sharing it with another Gmail account. That is not very useful, since a free Gmail account only gives me 15 GB of storage.

invincible-akshay commented 4 years ago

Yes, it appears to be related. I had come across this thread where some download quotas are mentioned - https://support.google.com/drive/thread/2035857?hl=en

OlehOnyshchak commented 4 years ago

A workaround that sometimes works for me is making a copy of that file in Drive and then copying that file into Colab.

That seems to work for me as well. The original problematic file was added to my Drive via the "Add to my Drive" button; that is, after following a link to a public dataset, I copied it to my Drive to work with it on Colab. There might be some issue with how such a file is cloned to my Drive.

ricardoboss commented 4 years ago

I also have this problem. Copying the file into colab first results in

cp: error reading '/content/drive/My Drive/data.tar.gz': Input/output error

like in this related issue: #510

The file I'm trying to copy is 17GB.

robinhad commented 4 years ago

Seems like Google Drive has limits on downloading files. My current workaround is to upload the file to Dropbox and wget it to the runtime's local storage. You can wget the file by following these simple steps:

  1. Get share link to file on Dropbox, which should look like this: https://www.dropbox.com/s/<fileid>/<filename>?dl=0
  2. Change ?dl=0 to ?dl=1; this gives a direct link to the file.
  3. Your wget command will look like this: !wget 'https://www.dropbox.com/s/<fileid>/<filename>?dl=1' -O "<filename>".

I tried this on a 10 GB file.

gaceladri commented 4 years ago

Same error here trying to load a pretrained model from Google Drive. First I mount the Drive on Colab, and then when I try to load the checkpoint, it fails with this error. Yesterday there was no issue loading it.


File "/content/fairseq/fairseq/checkpoint_utils.py", line 168, in load_checkpoint_to_cpu
    f, map_location=lambda s, l: default_restore_location(s, "cpu")
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 526, in load
    if _is_zipfile(opened_file):
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 59, in _is_zipfile
    byte = f.read(1)
OSError: [Errno 5] Input/output error
lucalgbm commented 4 years ago

I also have the same problem. I am using Google Colab and I am trying to access a .bin file which is approximately 230 GB. I am always getting the same OSError. Is there a way to fix that?

Since I have the same issue, did you find a solution? Thanks!

lucalgbm commented 4 years ago

In my training process I'm using train and validation directories on my Drive. The train directory has 280 X files of 600 MB each and 280 y files of 9 KB each. The validation directory has 140 X files of 100 MB each and 140 y files of 6 KB each. All files are in HDF5 (h5py) format.

In each epoch I read all of the files once.

When I start training, it gives this error after a while. After waiting 24 hours the problem went away. Now I've run my code again and the first two epochs finished smoothly; however, in the third epoch it gave the same error.

Once I get the OSError for any file, I also get it for other files that I had not touched before the error occurred.

Edit: I also tried training my network by reading one big (62 GB) h5 file. After reading the first 21k arrays, it gives the same OSError.

Hi, did you find a solution? I have the same issue...

Zappytoes commented 4 years ago

This is an unacceptable flaw in Colab and, in my view, it completely delegitimizes it as a platform for machine learning. The ability of a computing platform to handle large amounts of data is absolutely essential in this field, and I think it's just plain crooked of Google to tell people they have a product designed for deep learning and ML. I paid for a Pro account and have tried every workaround, and "OSError: [Errno 5] Input/output error" will always show up again eventually and stop you dead in your tracks. This is not just a "bug"; this is the reason you should not use Colab if you have other options.

lucalgbm commented 4 years ago

This is an unacceptable flaw in Colab and, in my view, it completely delegitimizes it as a platform for machine learning. The ability of a computing platform to handle large amounts of data is absolutely essential in this field, and I think it's just plain crooked of Google to tell people they have a product designed for deep learning and ML. I paid for a Pro account and have tried every workaround, and "OSError: [Errno 5] Input/output error" will always show up again eventually and stop you dead in your tracks. This is not just a "bug"; this is the reason you should not use Colab if you have other options.

This is very frustrating, since at the moment I have no other options: I have to finish a research project and I have no time to switch to another platform. So you are telling me that there is no solution (even a temporary one) for the OSError?? If that is true, I'm done! My features file is about 20 GB, and I have 50 GB on Google Drive. I don't think that is so huge, which is why I can't believe I cannot proceed to train my network. Yesterday everything was OK; today the OSError started with my 20 GB HDF5 file, and the same error now occurs with all my other files (even files of 10 MB!). What can I do?

glenn-jocher commented 4 years ago

The only workaround I know of is to host the data on a paid google drive business plan with their gsuite services. It's like the mafia shaking you down at every turn, pay to play etc.

Zappytoes commented 4 years ago

@lucalgbm I feel your pain my friend. I don't have the answers for you, but it sounds like it really comes down to transfer limits in Google Drive. From what I've gathered, you need to 1) give it some "time", 2) figure out how to do your work in smaller batches of data, 3) keep trying other suggestions you find, like what @IAmSuyogJadhav posted, or 4) look into GCP or another cloud storage service that scales for big data. Good luck!

Zappytoes commented 4 years ago

@lucalgbm I will also mention that I've had improved performance with the Pro account by making sure both the GPU and High-RAM options are activated in the runtime options: https://stackoverflow.com/questions/54973331/input-output-error-while-using-google-colab-with-google-drive/61388687#61388687

I think problems similar to the OSError can even happen without warning. I suspect that data files can simply get dropped from a process without any warning or error, leading to other errors further down the workflow (e.g., when building a tfrecord from images/annotations).

glenn-jocher commented 4 years ago

@Zappytoes curious about the pro option. What are the RAM and CPU count improvements? The free-side GPUs are very useful, as occasionally you can get a T4 or P100, but the 2 cpu count severely hobbles the system for me, as 2 dataloader workers is not optimal for my cnn training.

glenn-jocher commented 4 years ago

@Zappytoes @lucalgbm also to be clear, this is not a colab issue, this is a google drive issue.

Zappytoes commented 4 years ago

The only workaround I know of is to host the data on a paid google drive business plan with their gsuite services. It's like the mafia shaking you down at every turn, pay to play etc.

@glenn-jocher where did you learn about this workaround? I see they mention unlimited storage on the website, but are we just assuming this also includes increased Google Drive file-transfer quota limits? I already pay for additional storage (200 GB), but I'm not sure I gained any increase in download/transfer limits. I'm willing to try.

glenn-jocher commented 4 years ago

@Zappytoes saw it somewhere and verified it for myself. I'm using a G Suite Google Drive folder to host training data for https://github.com/ultralytics/yolov3, where hundreds (thousands?) of people are downloading it without issue.

My paid personal Drive transfers crash due to the hidden quota. My paid G Suite Drive transfers always work fine, i.e. 20 GB transfers into Colab etc.; I've never had an error since switching the files from the personal drive to the G Suite drive.

Zappytoes commented 4 years ago

@Zappytoes curious about the pro option. What are the RAM and CPU count improvements? The free-side GPUs are very useful, as occasionally you can get a T4 or P100, but the 2 cpu count severely hobbles the system for me, as 2 dataloader workers is not optimal for my cnn training.

@glenn-jocher Colab Pro with the GPU accelerator and High-Ram options:

RAM = 25.51 GB

GPU Info

[screenshot]
Zappytoes commented 4 years ago

@colaboratory-team, if @glenn-jocher's workaround is the way to go, it would be great if there were some official Colab documentation on mounting your drive with Google Cloud G Suite, so users could scale their computing environments appropriately using the paid service. Thank you!

glenn-jocher commented 4 years ago

@Zappytoes thanks bud. Do you know what the cpu count is?

import os
os.cpu_count()
lucalgbm commented 4 years ago

A workaround that sometimes works for me is making a copy of that file in Drive and then copying that file into Colab.

Hi, can you please explain how to do that?

lucalgbm commented 4 years ago

@colaboratory-team, if @glenn-jocher's workaround is the way to go, it would be great if there were some official Colab documentation on mounting your drive with Google Cloud G Suite, so users could scale their computing environments appropriately using the paid service. Thank you!

In my case G Suite is free and we have unlimited Drive space, since G Suite with unlimited Drive storage is provided free to universities. Hence I don't understand how paying, let's say for 200 GB, would improve anything in such a case...

IAmSuyogJadhav commented 4 years ago

A workaround that sometimes works for me is making a copy of that file in Drive and then copying that file into Colab.

Hi, can you please explain how to do that?

@lucalgbm, go to your Google Drive in a web browser, right-click on the file and create a copy of it. The newly created file will have a different file ID and a refreshed quota limit. Now try copying this new file into your Colab notebook environment.
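
Once the copy exists, pulling it into the Colab VM is just a cp from the mounted Drive. A rough sketch (the path is a placeholder; Drive usually names the duplicate "Copy of ..."):

from google.colab import drive
drive.mount('/content/drive')

# Copy the freshly duplicated file (new file ID, fresh quota) onto the VM's local disk.
!cp "/content/drive/My Drive/Copy of data.tar.gz" /content/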

IAmSuyogJadhav commented 4 years ago

For everyone, in case you are able to download the file by opening Google Drive in a web browser but not in the Colab environment: a common workaround that I recently discovered is a Chrome extension called CurlWget. The steps are as follows:

1) Download the CurlWget Chrome extension.
2) Go to your Google Drive and click to download the file. As soon as the download starts, click on the CurlWget extension icon; it will show you a long command. You may now cancel the download you started moments ago.
3) Copy the command into your Colab environment to download the file.

Zappytoes commented 4 years ago

@glenn-jocher sorry

[screenshot]
glenn-jocher commented 4 years ago

@Zappytoes ah thanks! That's double the free version then, much more useful.

Zappytoes commented 4 years ago

@glenn-jocher Update on using Google Drive through G Suite for Business as a workaround for OSErrors and Google Drive timeouts: I signed up for a G Suite account and migrated all my data/code to the new Google Drive hosted on G Suite. During an os.listdir call on a folder with lots of files in it, I got "OSError: [Errno 5] Input/output error" and a window popped up saying a Google Drive timeout had occurred. So unfortunately, this is not a solution...

[screenshot]
glenn-jocher commented 4 years ago

@Zappytoes ah, I'm sorry, my explanation was incomplete then. My use case is that I need many people all around the world to be able to download the contents of my Google Drive folder for our AI tutorials, so a Google Drive sync is not an option for me.

Instead, I've made my folder on the G Suite Google Drive open-access so that "anyone with the link" can download it, and I've created a Python download function that accepts the file ID. You can see this in practice here: https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data#reproduce-our-results

git clone https://github.com/ultralytics/yolov3
python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1h0Id-7GUyuAmyc9Pwo2c3IZ17uExPvOA','coco2017demos.zip')"  # datasets (20 Mb)
Zappytoes commented 4 years ago

The best, but non-free, solution to this issue is to host your data in a cloud bucket, such as a Google Cloud Platform (GCP) bucket. It's free to set up, but charges you as you go. I've been training with ~13 GB of imagery data almost non-stop for 14 days and it has cost me about $7 so far.

1) Create a Google Cloud Storage project. Go to the Resource Manager and create a new project. https://console.cloud.google.com/cloud-resource-manager

[screenshot]

2) Enable billing for the project: https://cloud.google.com/billing/docs/how-to/modify-project

3) After the project is created (and you need to have billing enabled, as the storage will cost you a few cents per month), click on the menu in the upper right corner and select Storage (somewhere way down the menu). Next you need to create a bucket for the data (the name of the bucket must be globally unique, not only within your account but across all accounts).

[screenshot]

4) Once your bucket is set up (and you've uploaded your data to it), you can connect Colab to GCS using the Google Auth API and gcsfuse. Run the following commands in Colab:

## Authenticate ##

from google.colab import auth
auth.authenticate_user()

## Use this to install gcsfuse on Colab. Cloud Storage FUSE is an open source FUSE adapter that allows you to mount Cloud Storage buckets as file systems on Colab, Linux or macOS systems. ##

!echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt -qq update
!apt -qq install gcsfuse

## Make a directory name for your bucket in Colab and mount the bucket at that directory in Colab ##

!mkdir name_of_bucket_on_Colab
!gcsfuse --implicit-dirs name_of_bucket_on_GCP name_of_bucket_on_Colab

5) You can continue to also have other storage mounted, such as your Google Drive:

from google.colab import drive
drive.mount('/content/drive')
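
Once the bucket is mounted, you read your data as if it were on local disk. A minimal sketch (the bucket name and file path are placeholders):

import h5py

# The gcsfuse mount point behaves like a normal directory on the Colab VM.
path = '/content/name_of_bucket_on_Colab/train/trainX_file1'

with h5py.File(path, 'r') as f:
    print(list(f.keys()))  # list the datasets stored in the file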

Further reading:

https://medium.com/@philipplies/transferring-data-from-google-drive-to-google-cloud-storage-using-google-colab-96e088a8c041

https://gist.github.com/korakot/f3600576720206363c734eca5f302e38

https://cloud.google.com/storage/docs/gcs-fuse

https://stackoverflow.com/questions/51715268/how-to-import-data-from-google-cloud-storage-to-google-colab

https://stackoverflow.com/questions/61600439/how-to-mount-gcp-bucket-in-google-colab/61615097#61615097

Santosh-Gupta commented 4 years ago

The only workaround I know of is to host the data on a paid google drive business plan with their gsuite services. It's like the mafia shaking you down at every turn, pay to play etc.

I have a paid google drive account and I still have this issue.

glenn-jocher commented 4 years ago

@Santosh-Gupta great, good for you. I said a business plan.

glenn-jocher commented 4 years ago

@Santosh-Gupta you have to sign up for one of the G Suite plans. It's about $6 a month, and you get 30 GB of Drive storage included. The personal paid plans have the same problem as the personal free plans. https://gsuite.google.com/

Santosh-Gupta commented 4 years ago

@glenn-jocher Update on using Google Drive through G Suite for Business as a workaround for OSErrors and Google Drive timeouts: I signed up for a G Suite account and migrated all my data/code to the new Google Drive hosted on G Suite. During an os.listdir call on a folder with lots of files in it, I got "OSError: [Errno 5] Input/output error" and a window popped up saying a Google Drive timeout had occurred. So unfortunately, this is not a solution...

[screenshot]

Could you just download the data from your g-suite into your colab notebook each time?

glenn-jocher commented 4 years ago

@Santosh-Gupta I would just download the data directly.

Santosh-Gupta commented 4 years ago

I just tried from a friend's g-suite account. I was able to download a 27 gig file twice, but after that it stopped letting me download it.

Santosh-Gupta commented 4 years ago

I tried to contact Google One support to see if they could reset the transfer restrictions so I could at least use colab drive mount as normal, but they said they don't have any restrictions. They said this would be a colab error, not a google drive error.

This is confusing to me because the Google drive integration with colab stopped working only after I reached a transfer limit, and I got an error message about some sort of limit after downloading the large file, but they were as perplexed as I was.

My Google Drive mounting seems to be mostly back to normal, but I still can't read my numpy memmap too many times before my Colab instance crashes.

Here's what my runtime logs look like

May 17, 2020, 12:51:18 PM   WARNING WARNING:root:kernel 2a456c0f-9129-4e09-8933-95f3c24831b9 restarted
May 17, 2020, 12:51:18 PM   INFO    KernelRestarter: restarting kernel (1/5), keep random ports
May 17, 2020, 12:48:03 PM   WARNING tcmalloc: large alloc 1342185472 bytes == 0xff5a6000 @ 0x7f93593971e7 0x5ab685 0x569c94 0x56e153 0x7f93572f7382 0x7f93572f9e7e 0x5a9cbc 0x50a5c3 0x50bfb4 0x507d64 0x509a90 0x50a48d 0x50cd96 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x588e5c 0x59fc4e 0x50d356 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x516345 0x50a2bf 0x50bfb4 0x507d64 0x509a90
May 17, 2020, 12:48:01 PM   WARNING tcmalloc: large alloc 1342185472 bytes == 0xaf5a4000 @ 0x7f93593971e7 0x5ab685 0x569c94 0x56e153 0x7f93572f6ed6 0x7f93572f9e7e 0x5a9cbc 0x50a5c3 0x50bfb4 0x507d64 0x509a90 0x50a48d 0x50cd96 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x588e5c 0x59fc4e 0x50d356 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x516345 0x50a2bf 0x50bfb4 0x507d64 0x509a90
May 17, 2020, 12:47:32 PM   WARNING 2020-05-17 19:47:32.285288: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
May 17, 2020, 12:47:06 PM   WARNING WARNING:root:kernel 2a456c0f-9129-4e09-8933-95f3c24831b9 restarted
May 17, 2020, 12:47:06 PM   INFO    KernelRestarter: restarting kernel (1/5), keep random ports
May 17, 2020, 12:44:49 PM   WARNING tcmalloc: large alloc 1342185472 bytes == 0x10a34c000 @ 0x7f34292261e7 0x5ab685 0x569c94 0x56e153 0x7f3427186382 0x7f3427188e7e 0x5a9cbc 0x50a5c3 0x50bfb4 0x507d64 0x509a90 0x50a48d 0x50cd96 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x588e5c 0x59fc4e 0x50d356 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x516345 0x50a2bf 0x50bfb4 0x507d64 0x509a90
May 17, 2020, 12:44:46 PM   WARNING tcmalloc: large alloc 1342185472 bytes == 0xba34a000 @ 0x7f34292261e7 0x5ab685 0x569c94 0x56e153 0x7f3427185ed6 0x7f3427188e7e 0x5a9cbc 0x50a5c3 0x50bfb4 0x507d64 0x509a90 0x50a48d 0x50cd96 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x588e5c 0x59fc4e 0x50d356 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x516345 0x50a2bf 0x50bfb4 0x507d64 0x509a90
May 17, 2020, 12:44:11 PM   WARNING 2020-05-17 19:44:11.812583: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1

My notebook was working just fine 2 days ago. Reading from the memmap also works just fine when the file is downloaded to local storage.

Here is my notebook if anyone wants to take a look

https://colab.research.google.com/drive/1XZDLgJ4PDquZ1kj4eRTHQ0F_Q6aVp_V3?usp=sharing

Santosh-Gupta commented 4 years ago

And again I am getting I/O errors, even when trying to read a small 13 MB file, even though it's been a day and a half since I ran into the quota limit.

I am starting to worry that my colab won't ever go back to normal.

Zappytoes commented 4 years ago

@Santosh-Gupta for smaller files, you might try just uploading the data directly into the Colab runtime's local storage each day (e.g., right into the '/content/' directory), as sketched below. But this will be deleted each day when your session terminates.
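
Something like this rough sketch does it (files.upload() writes into the current working directory, /content by default):

from google.colab import files

# Opens a browser file picker; the chosen files are saved to the current
# working directory and are lost when the session terminates.
uploaded = files.upload()
print(list(uploaded.keys()))  # names of the uploaded files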

Other than that, try the GCP bucket solution I posted...

ngoanpv commented 4 years ago

@colaboratory-team

Sorry for the trouble. Does this reproduce reliably for you? Can you share details of your Drive file layout? e.g., how many files are in this directory, and how large is the trainX_file1 file?

So how many files or sub-directories per directory is acceptable? I had the same error with a directory that contains more than 10,000 sub-directories; I think that might be the problem.

dimitry-ishenko commented 4 years ago

Another work-around using wget:

FILEID="<your-gdrive-file-id>"
FILENAME="/path/to/saved/file"
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id={FILEID}' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id={FILEID}" -O {FILENAME} && rm -rf /tmp/cookies.txt

Found it here.

imenebak commented 4 years ago

I had the same problem; for the last two days I couldn't unzip or even copy data from my Drive to Colab's environment. I managed to fix it by cancelling the sharing of the file with other users. I think it's due to overuse of the same resource by several users.

nickagee commented 4 years ago

When I run the following command:

!cp "/content/drive/My Drive/DL_Class/face-generation/data/processed-celeba-small.zip" /content/

I just received the below error:

cp: error reading '/content/drive/My Drive/DL_Class/face-generation/problem_unittests.py': Input/output error

The file is a 2KB Python file.
