furkanyildiz opened this issue 5 years ago (status: Open)
Sorry for the trouble. Does this reproduce reliably for you? Can you share details of your Drive file layout? e.g., how many files are in this directory, and how large is the trainX_file1 file?
b/133228148
In my training process, I'm using train and validation directories in my Drive. The train directory has 280 X files, each 600 MB, and 280 y files, each 9 KB. The validation directory has 140 X files, each 100 MB, and 140 y files, each 6 KB. All files are in HDF5 (h5py) format.
In each epoch I read all of the files once.
When I start training, it gives this error after a while. After waiting 24 hours the problem went away. I've now run my code again and the first two epochs finished smoothly; however, in the third epoch it gave the same error.
After I get the OSError for any file, I also get this error for other files that I had not touched before the error occurred.
Edit: I also tried to train my network by reading one big (62 GB) h5 file. After reading the first 21k arrays, it gives the same OSError.
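Since the error appears to be transient (it clears after waiting), one stopgap is to wrap each file read in a retry loop with backoff. This is only a sketch, not from the thread: `read_fn` stands in for whatever loader you use (e.g. an h5py read), and it does not lift the underlying Drive quota.

```python
import time

def read_with_retry(read_fn, retries=5, delay=10.0):
    """Call read_fn(), retrying on OSError (e.g. Errno 5) with exponential backoff.

    read_fn is any zero-argument callable, e.g. lambda: load_h5(path).
    """
    for attempt in range(retries):
        try:
            return read_fn()
        except OSError:
            if attempt == retries - 1:
                raise  # give up after the final attempt
            time.sleep(delay * (2 ** attempt))

# hypothetical usage:
# X = read_with_retry(lambda: h5py.File(path, "r")["X"][:])
```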
I have the same problem. When copying a 20GB file from a mounted Google Drive folder:
!cp 'drive/My Drive/cloud/data/coco_colab2.zip' . && unzip -q coco_colab2.zip
cp: error reading 'drive/My Drive/cloud/data/coco_colab2.zip': Input/output error
I also have the same problem. I am using Google Colab and I am trying to access a .bin file which is approximately 230 GB. I keep getting the same OSError. Is there a way to fix that?
UPDATE: I've had success skipping the embedded Google Drive client in Colab and running a curl request directly to the file (confirming download for very large files):
fileid="1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO123"
filename="coco_gdrive.zip"
curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null
curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename}
Still not fixed?
A workaround that sometimes works for me is making a copy of that file in Drive and then copying that file into Colab.
I am constantly getting this error after uploading a file with Colab:
I installed gdrive in Colab to see if I could download my file to the VM, but I can't:
Then I tried to download from the website, and it won't let me:
This file is not shared with anyone; only I can see it. I tried uploading the file again from scratch [in Colab] but the download keeps failing. Is this all related?
I also tried making a copy via the website, and Drive doesn't let me download the copy.
I found just one workaround to download the uploaded file: sharing it with another Gmail account. This is of limited use since Gmail only gives me the free 15 GB of storage.
Yes, it appears to be related. I had come across this thread where some download quotas are mentioned - https://support.google.com/drive/thread/2035857?hl=en
A workaround that sometimes works for me is making a copy of that file in Drive and then copying that file into Colab.
That seems to work for me as well. The original problematic file was added to Drive via the "Add to my Drive" button. That is, after following a link to a public dataset, I copied it to my Drive to work with it on Colab. There might be some errors with cloning the file to disk.
I also have this problem. Copying the file into colab first results in
cp: error reading '/content/drive/My Drive/data.tar.gz': Input/output error
like in this related issue: #510
The file I'm trying to copy is 17GB.
Seems like Google Drive has limits on downloading files. My current workaround is to upload the file to Dropbox and wget it to the runtime's local storage. You can wget the file by following these simple steps:
1) Get the file's share link, e.g. https://www.dropbox.com/s/<fileid>/<filename>?dl=0
2) Change ?dl=0 to ?dl=1; this gives a direct link to the file.
3) !wget 'https://www.dropbox.com/s/<fileid>/<filename>?dl=1' -O "<filename>"
I tried this on a 10GB file.
Same error here trying to load a pretrained model from Google Drive. First I mount the Drive disk on Colab, and then when I try to load the checkpoint it fails with this error. Yesterday there was no issue loading it.
File "/content/fairseq/fairseq/checkpoint_utils.py", line 168, in load_checkpoint_to_cpu
f, map_location=lambda s, l: default_restore_location(s, "cpu")
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 526, in load
if _is_zipfile(opened_file):
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 59, in _is_zipfile
byte = f.read(1)
OSError: [Errno 5] Input/output error
I also have the same problem. I am using Google Colab and I am trying to access a .bin file which is approximately 230 GB. I keep getting the same OSError. Is there a way to fix that?
Since I have the same issue, did you find a solution? Thanks!
In my training process, I'm using train and validation directories in my Drive. The train directory has 280 X files, each 600 MB, and 280 y files, each 9 KB. The validation directory has 140 X files, each 100 MB, and 140 y files, each 6 KB. All files are in HDF5 (h5py) format.
In each epoch I read all of the files once.
When I start training, it gives this error after a while. After waiting 24 hours the problem went away. I've now run my code again and the first two epochs finished smoothly; however, in the third epoch it gave the same error.
After I get the OSError for any file, I also get this error for other files that I had not touched before the error occurred.
Edit: I also tried to train my network by reading one big (62 GB) h5 file. After reading the first 21k arrays, it gives the same OSError.
Hi, did you find a solution? I have the same issue...
This is an unacceptable flaw in Colab and, in my view, completely delegitimizes it as a platform for machine learning. The ability of a computing platform to handle large amounts of data is absolutely essential in this field, and I think it's just plain crooked of Google to tell people they have a product designed for deep learning and ML. I paid for a Pro account and have tried every workaround, and "OSError: [Errno 5] Input/output error" will always show up again eventually and stop you dead in your tracks. This is not just a "bug"; this is the reason you should not use Colab if you have other options.
This is an unacceptable flaw in Colab and, in my view, completely delegitimizes it as a platform for machine learning. The ability of a computing platform to handle large amounts of data is absolutely essential in this field, and I think it's just plain crooked of Google to tell people they have a product designed for deep learning and ML. I paid for a Pro account and have tried every workaround, and "OSError: [Errno 5] Input/output error" will always show up again eventually and stop you dead in your tracks. This is not just a "bug"; this is the reason you should not use Colab if you have other options.
This is very frustrating since at the moment I have no other options; I have to finish my research project and have no time to find another platform. So you're telling me there is no solution (even a temporary one) to fix the OSError?? If this is true, I'm done! My features file is about 20GB, and I have 50GB in Google Drive. I don't think this is so huge, so I can't believe I can't proceed to train my network. Yesterday everything was OK; today the OSError starts with my 20GB HDF5 file, and the same error occurs even with all other files (even with files of 10MB!!). What can I do?
The only workaround I know of is to host the data on a paid google drive business plan with their gsuite services. It's like the mafia shaking you down at every turn, pay to play etc.
@lucalgbm I feel your pain my friend. I don't have the answers for you but what it sounds like is that it really comes down to transfer limits in google drive. So from what I've gathered, you need to 1) give it some "time" 2) figure out how to do your work in smaller batches of data 3) keep trying other suggestions you find like what @IAmSuyogJadhav posted, or 4) look into GCP or another cloud storage service that scales for big data. Good luck!
@lucalgbm I will also mention that I've had improved performance with the Pro account by making sure both the GPU and High-Ram options are activated in the runtime options: https://stackoverflow.com/questions/54973331/input-output-error-while-using-google-colab-with-google-drive/61388687#61388687
I think problems similar to the OSError can even happen without warning. I suspect that data files can just get dropped from a process without warning or error, leading to other errors down the workflow (e.g., building a tfrecord from images/annotations).
@Zappytoes curious about the pro option. What are the RAM and CPU count improvements? The free-side GPUs are very useful, as occasionally you can get a T4 or P100, but the 2 cpu count severely hobbles the system for me, as 2 dataloader workers is not optimal for my cnn training.
@Zappytoes @lucalgbm also to be clear, this is not a colab issue, this is a google drive issue.
The only workaround I know of is to host the data on a paid google drive business plan with their gsuite services. It's like the mafia shaking you down at every turn, pay to play etc.
@glenn-jocher where did you learn about this workaround? I see on the website they mention unlimited storage, but are we just assuming this also includes increased Google Drive file-transfer quota limits? I already pay for additional storage (200GB), but I'm not sure I gained any download/transfer headroom. I'm willing to try.
@Zappytoes saw it somewhere and verified it for myself. I'm using a gsuite google drive folder to host training data for https://github.com/ultralytics/yolov3, where hundreds (thousands?) of people are downloading them without issue.
My paid personal gdrive transfers crash due to the hidden quota. My paid gsuite gdrive transfers always work fine, i.e. 20Gb transfers into colab etc, never had an error after switching the files from personal to gsuite drive.
@Zappytoes curious about the pro option. What are the RAM and CPU count improvements? The free-side GPUs are very useful, as occasionally you can get a T4 or P100, but the 2 cpu count severely hobbles the system for me, as 2 dataloader workers is not optimal for my cnn training.
@glenn-jocher Colab Pro with the GPU accelerator and High-Ram options:
RAM = 25.51 GB
GPU Info
@colaboratory-team, if @glenn-jocher's workaround is the way to go, it would be great if there were some official Colab documentation on mounting your Drive with Google Cloud G Suite so users could scale their computing environments appropriately using the paid service. Thank you!
@Zappytoes thanks bud. Do you know what the cpu count is?
import os
os.cpu_count()
A workaround that sometimes works for me is making a copy of that file in Drive and then copying that file into Colab.
Hi, can you please explain how to do that?
@colaboratory-team, if @glenn-jocher's workaround is the way to go, it would be great if there were some official Colab documentation on mounting your Drive with Google Cloud G Suite so users could scale their computing environments appropriately using the paid service. Thank you!
In my case G Suite is free and we have unlimited Drive space, since universities are provided G Suite and unlimited Drive storage at no cost. Hence I don't understand how paying for, say, 200GB could improve anything in such a case...
A workaround that sometimes works for me is making a copy of that file in Drive and then copying that file into Colab.
Hi, can you please explain how to do that?
@lucalgbm , Go to your Google drive from a web browser, right-click on the file and create a copy of that file. This newly created file will now have a different file ID and refreshed quota limit. Now, try copying this newly created file in your colab notebook environment.
For everyone, In case you are able to download the file by opening Google Drive in a web browser but not in the Colab environment: A common workaround that I recently discovered is using a chrome extension called CurlWget. The steps are as follows:
1) Download the CurlWget chrome extension. 2) Go to your Google drive, and click to download the file. As soon as the downloading starts, click on the CurlWget extension icon. It will show you a long command. You may now cancel the download that you started moments ago. 3) Copy the command to your Colab environment to download the file.
@glenn-jocher sorry
@Zappytoes ah thanks! That's double the free version then, much more useful.
@glenn-jocher Update on using Google Drive through G Suite for Business as a workaround for OS errors and Google Drive timeouts. I signed up for a G Suite account and migrated all my data and code to the new Google Drive hosted on G Suite. During an os.listdir request on a folder with lots of files in it, I got "OSError: [Errno 5] Input/output error:" and a window popped up saying a Google Drive timeout had occurred. So unfortunately, this is not a solution...
@Zappytoes ah, I'm sorry, my explanation was incomplete then. My use case is I need many people all around the world to be able to download contents from my google drive folder for our AI tutorials, so a google drive sync is not an option for me.
Instead I've open-accessed my folder on gsuite google drive so "anyone with the link" can download, then I've created a python download function that accepts the file ID. You can see this in practice here: https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data#reproduce-our-results
git clone https://github.com/ultralytics/yolov3
python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1h0Id-7GUyuAmyc9Pwo2c3IZ17uExPvOA','coco2017demos.zip')" # datasets (20 Mb)
The best, but non-free, solution to this issue is to host your data in a cloud bucket, such as a Google Cloud Platform (GCP) bucket. It's free to set up, but charges you as you go. I've been training with ~13GB of imagery data almost non-stop for 14 days and it's cost me about $7 so far.
1) Create a Google Cloud Storage project. Go to the Resource Manager and create a new project. https://console.cloud.google.com/cloud-resource-manager
2) Enable billing for the project: https://cloud.google.com/billing/docs/how-to/modify-project
3) After the project is created (and you need to have billing enabled, as the storage will cost you a few cents per month) click on the menu in the upper right corner and select Storage (somewhere way down the menu). Next you need to create a bucket for the data (The name of the bucket must be globally unique, not only for your account but for all accounts).
4) Once your bucket is set up (and you've uploaded your data to the bucket), you can connect Colab to GCS using Google Auth API and gcfuse. Run the following commands in Colab:
## Authenticate ##
from google.colab import auth
auth.authenticate_user()
## Use this to install gcsfuse on colab. Cloud Storage FUSE is an open source FUSE adapter that allows you to mount Cloud Storage buckets as file systems on Colab, Linux or macOS systems. ####
!echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt -qq update
!apt -qq install gcsfuse
## Make a directory name for your bucket in Colab and mount the bucket at that directory in Colab ##
!mkdir name_of_bucket_on_Colab
!gcsfuse --implicit-dirs name_of_bucket_on_GCP name_of_bucket_on_Colab
5) You can continue to also have other storage mounted, such as your Google Drive:
from google.colab import drive
drive.mount('/content/drive')
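Once mounted, the bucket behaves like a local directory, so ordinary file I/O works against it. A small sketch (the mount-point name below is the placeholder from step 4; the h5 path is hypothetical):

```python
import os

# Placeholder path matching the gcsfuse mount point from step 4.
BUCKET_DIR = "/content/name_of_bucket_on_Colab"

def list_bucket(path):
    """List the files under the mounted bucket directory, like any local dir."""
    return sorted(os.listdir(path))

# e.g., read a training file straight from the bucket:
# with open(os.path.join(BUCKET_DIR, "train/X_000.h5"), "rb") as f:
#     header = f.read(8)
```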
Further reading:
https://gist.github.com/korakot/f3600576720206363c734eca5f302e38
The only workaround I know of is to host the data on a paid google drive business plan with their gsuite services. It's like the mafia shaking you down at every turn, pay to play etc.
I have a paid google drive account and I still have this issue.
@Santosh-Gupta great, good for you. I said a business plan.
@Santosh-Gupta you have to sign up for one of the gsuite plans. It's about $6 a month, you get 30gb drive included. The personal paid plans have the same problem as the personal free plans. https://gsuite.google.com/
@glenn-jocher Update on using Google Drive through G Suite for Business as a workaround for OS errors and Google Drive timeouts. I signed up for a G Suite account and migrated all my data and code to the new Google Drive hosted on G Suite. During an os.listdir request on a folder with lots of files in it, I got "OSError: [Errno 5] Input/output error:" and a window popped up saying a Google Drive timeout had occurred. So unfortunately, this is not a solution...
Could you just download the data from your g-suite into your colab notebook each time?
@Santosh-Gupta I would just download the data directly.
I just tried from a friend's g-suite account. I was able to download a 27 gig file twice, but after that it stopped letting me download it.
I tried to contact Google One support to see if they could reset the transfer restrictions so I could at least use colab drive mount as normal, but they said they don't have any restrictions. They said this would be a colab error, not a google drive error.
This is confusing to me because the Google drive integration with colab stopped working only after I reached a transfer limit, and I got an error message about some sort of limit after downloading the large file, but they were as perplexed as I was.
My Google Drive mounting seems to be mostly back to normal, but I still can't read my numpy memmap too many times before my Colab instance crashes.
Here's what my runtime logs look like
May 17, 2020, 12:51:18 PM WARNING WARNING:root:kernel 2a456c0f-9129-4e09-8933-95f3c24831b9 restarted
May 17, 2020, 12:51:18 PM INFO KernelRestarter: restarting kernel (1/5), keep random ports
May 17, 2020, 12:48:03 PM WARNING tcmalloc: large alloc 1342185472 bytes == 0xff5a6000 @ 0x7f93593971e7 0x5ab685 0x569c94 0x56e153 0x7f93572f7382 0x7f93572f9e7e 0x5a9cbc 0x50a5c3 0x50bfb4 0x507d64 0x509a90 0x50a48d 0x50cd96 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x588e5c 0x59fc4e 0x50d356 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x516345 0x50a2bf 0x50bfb4 0x507d64 0x509a90
May 17, 2020, 12:48:01 PM WARNING tcmalloc: large alloc 1342185472 bytes == 0xaf5a4000 @ 0x7f93593971e7 0x5ab685 0x569c94 0x56e153 0x7f93572f6ed6 0x7f93572f9e7e 0x5a9cbc 0x50a5c3 0x50bfb4 0x507d64 0x509a90 0x50a48d 0x50cd96 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x588e5c 0x59fc4e 0x50d356 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x516345 0x50a2bf 0x50bfb4 0x507d64 0x509a90
May 17, 2020, 12:47:32 PM WARNING 2020-05-17 19:47:32.285288: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
May 17, 2020, 12:47:06 PM WARNING WARNING:root:kernel 2a456c0f-9129-4e09-8933-95f3c24831b9 restarted
May 17, 2020, 12:47:06 PM INFO KernelRestarter: restarting kernel (1/5), keep random ports
May 17, 2020, 12:44:49 PM WARNING tcmalloc: large alloc 1342185472 bytes == 0x10a34c000 @ 0x7f34292261e7 0x5ab685 0x569c94 0x56e153 0x7f3427186382 0x7f3427188e7e 0x5a9cbc 0x50a5c3 0x50bfb4 0x507d64 0x509a90 0x50a48d 0x50cd96 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x588e5c 0x59fc4e 0x50d356 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x516345 0x50a2bf 0x50bfb4 0x507d64 0x509a90
May 17, 2020, 12:44:46 PM WARNING tcmalloc: large alloc 1342185472 bytes == 0xba34a000 @ 0x7f34292261e7 0x5ab685 0x569c94 0x56e153 0x7f3427185ed6 0x7f3427188e7e 0x5a9cbc 0x50a5c3 0x50bfb4 0x507d64 0x509a90 0x50a48d 0x50cd96 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x588e5c 0x59fc4e 0x50d356 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x516345 0x50a2bf 0x50bfb4 0x507d64 0x509a90
May 17, 2020, 12:44:11 PM WARNING 2020-05-17 19:44:11.812583: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
My notebook was working just fine 2 days ago. Reading from the memmap also works just fine when it's downloaded locally.
Here is my notebook if anyone wants to take a look
https://colab.research.google.com/drive/1XZDLgJ4PDquZ1kj4eRTHQ0F_Q6aVp_V3?usp=sharing
And again I am getting I/O errors, even trying to read a small 13 MB file, even though it's been a day and a half since I ran into the quota limit.
I am starting to worry that my colab won't ever go back to normal.
@Santosh-Gupta for smaller files, you might try just uploading the data directly to the Colab VM each day (e.g., right into the '/content/' directory). But this will be deleted when your session terminates.
Other than that, try the GCP bucket solution I posted...
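For the upload-into-'/content/' route, Colab's files helper can be used. A minimal guarded sketch (google.colab only exists inside Colab, so the import is wrapped):

```python
def upload_to_vm():
    """Upload local files into the Colab VM via the browser file picker.

    Returns the list of uploaded filenames, or an empty list when not
    running inside Colab (where google.colab is unavailable).
    """
    try:
        from google.colab import files
    except ImportError:
        return []                  # not running inside Colab
    uploaded = files.upload()      # opens a browser file picker
    return list(uploaded)          # dict keys are the uploaded filenames
```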
@colaboratory-team
Sorry for the trouble. Does this reproduce reliably for you? Can you share details of your Drive file layout? e.g., how many files are in this directory, and how large is the trainX_file1 file?
So how many files or sub-directories per directory is acceptable? I had the same error with a directory containing >10000 sub-directories; I think that might be the problem.
Another work-around using wget:
FILEID="<your-gdrive-file-id>"
FILENAME="/path/to/saved/file"
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id={FILEID}' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id={FILEID}" -O {FILENAME} && rm -rf /tmp/cookies.txt
Found it here.
I had the same problem; I couldn't unzip or even copy data from my Drive to the Colab environment these past two days. I managed to fix it by canceling the sharing of the file with other users. I think it's due to overuse of the same resource by several users.
When I run the following command:
!cp "/content/drive/My Drive/DL_Class/face-generation/data/processed-celeba-small.zip" /content/
I just received the below error:
cp: error reading '/content/drive/My Drive/DL_Class/face-generation/problem_unittests.py': Input/output error
The file is a 2KB Python file.
I cannot read my files on Drive. It sometimes works, but mostly I get an OSError.
Creating a file also gives the OSError.
OSError: Unable to create file (unable to open file: name = '/content/drive/My Drive/train/model.hdf5', errno = 5, error message = 'Input/output error', flags = 13, o_flags = 242)
You can check my nootebook to see the error. https://colab.research.google.com/drive/1MHJhYtR1PGyb5HKUY8-hrFPn-SBtOige
Note: https://research.google.com/colaboratory/faq.html#drive-timeout did not help me.