salmannauman6 closed this issue 5 years ago
No, it does not. I have just one folder in my root folder which contains this one CSV file I am reading.
Thanks for confirming. Can you share a minimal self-contained repro notebook, either publicly or just with colaboratory-team@google.com? (It would be helpful to see precisely how you're reading the data.)
Does the problem go away if you first
!cp path/to/data.csv local.csv
and then read from the local path?
Similar issue here. I get
gzip: stdin: Input/output error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
when running
!tar -zxvf /content/gdrive/My\ Drive/data.tgz -C ./ > /dev/null
with a large data.tgz file (~10 GB).
I've had no issue accessing 20 GB files.
What causes this issue for me is having many files in the folder (or parent folders) I'm accessing. Instead of having path/to/data/data_x_of_1000files_in_folder.csv, I restructured the files into path/to/data/20folders/data_x_of_50files_in_folder.csv.
Try making sure there are no more than 50 files in the folder the file is in, or in any of its parent folders.
When I was only accessing a single file, or accessing files sequentially, I could also just try loading the file again; that worked because the context had already been loaded. This didn't work for random access.
Works for me; hope this helps you too.
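That restructuring can be sketched as a small shell function (the 50-file cutoff and the example path are assumptions based on what worked for me, not anything documented):

```shell
# shard_folder DIR: move DIR's files into DIR/part_0, DIR/part_1, ...
# so that no subfolder holds more than 50 files.
shard_folder() {
  dir=$1
  i=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue           # skip subdirectories
    part="$dir/part_$((i / 50))"      # 50 files per shard
    mkdir -p "$part"
    mv "$f" "$part/"
    i=$((i + 1))
  done
}

# Hypothetical usage against a mounted Drive folder:
# shard_folder "/content/gdrive/My Drive/data"
```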
Similarly, things were working without a problem until today; now the untar won't finish anymore with a large file:
tar: /content/gdrive/My Drive/bigfile.tar: Cannot read: Operation not permitted
tar: /content/gdrive/My Drive/bigfile.tar: Cannot read: Input/output error
tar: Too many errors, quitting
tar: Error is not recoverable: exiting now
It could successfully untar all the files (a 31 GB tar with 10,000 files) even yesterday, multiple times.
The command I'm using:
!tar -C features -xf /content/gdrive/My Drive/bigfile.tar
Trying to copy the whole tar into the runtime first also times out:
cp: error reading '/content/gdrive/My Drive/bigfile.tar': Input/output error
I have the same problem. I can't read my files on Drive. It sometimes works but mostly gives an OSError:
OSError: Can't read data (file read failed: time = Mon May 20 00:34:07 2019
, filename = '/content/drive/My Drive/train/trainX_file1', file descriptor = 83, errno = 5, error message = 'Input/output error', buf = 0xc71d3864, total read size = 42145, bytes this sub-read = 42145, bytes actually read = 18446744073709551615, offset = 119840768)
Creating a file also gives an OSError:
OSError: Unable to create file (unable to open file: name = '/content/drive/My Drive/train/model.hdf5', errno = 5, error message = 'Input/output error', flags = 13, o_flags = 242)
The advice at https://research.google.com/colaboratory/faq.html#drive-timeout did not help me.
I have the same problem too. I can't load my data, which is not very large. I can load it with num_workers = 1 (using the PyTorch DataLoader), but I can't get all my files. I have about 40,000 files. I have tried io.imread and cv2.imread; they both work fine on my own computer, and I am sure my files are in the right place. I couldn't figure it out for days, so I guess it's not my problem. I will try to build the image matrices on my own computer and upload them in CSV format; if that works out, I will report back.
This link offers a method, though my files are already in subfolders; maybe it can help you: https://stackoverflow.com/questions/54973331/input-output-error-while-using-google-colab-with-google-drive
Duplicate of #559
I have the same issue too. Today I was working on a voice conversion program in Google Colaboratory. It worked yesterday, but it hasn't been working since this morning (Japan time).
I have the same issue. I can't access an HDF5 file of 42 GB. At some point in my processing pipeline an OSError occurs, as @furkanyildiz commented. I access each element sequentially and then immediately store it in another .tfrecords file.
I have the same problem. This issue should not be closed. When copying a 20GB file from a mounted Google Drive folder:
!cp 'drive/My Drive/cloud/data/coco_colab2.zip' . && unzip -q coco_colab2.zip
cp: error reading 'drive/My Drive/cloud/data/coco_colab2.zip': Input/output error
I have the same problem. I thought the file was corrupted at first, but when I downloaded it and opened it on my local computer it worked fine. Then I uploaded it to my brother's account and it worked there as well, so it is not a problem with the file. I can load other files, just not that CSV file.
Same problem. Working perfectly and then suddenly stops with no changes implemented.
Thanks for confirming. Can you share a minimal self-contained repro notebook, either publicly or just with colaboratory-team@google.com? (It would be helpful to see precisely how you're reading the data.)
Does the problem go away if you first
!cp path/to/data.csv local.csv
and then read from the local path?
I tried this and am getting
cp: error reading '/content/drive/My Drive/DSF/file_name.csv': Input/output error
Same problem here as well; please reopen the issue.
I made an observation but have not tested it: it seems that large files on Google Drive are subject to daily download limits. Could reading from Colab also count as a download? If so, that would explain why it suddenly stops working.
I have no issue downloading to a local machine.
Same problem. I can download to a local machine fine, but downloading to Colab from Google Drive is a nightmare; it takes 5 or 6 tries before it completes successfully.
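One blunt workaround for the flakiness (just a sketch; the retry count and pause are guesses, and the paths in the usage comment are placeholders) is to wrap the copy in a retry loop:

```shell
# retry_cp SRC DST: retry a flaky Drive copy up to 5 times,
# pausing between attempts; returns non-zero if every attempt fails.
retry_cp() {
  src=$1; dst=$2
  for try in 1 2 3 4 5; do
    cp "$src" "$dst" && return 0
    echo "attempt $try failed, retrying..." >&2
    sleep 5
  done
  return 1
}

# Hypothetical usage:
# retry_cp "/content/gdrive/My Drive/bigfile.tar" /content/bigfile.tar
```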
I think it's actually a quota problem; I can't actually download to a local machine either.
@deqncho2 you can test this by creating a copy of the file in your Drive and trying to read the new copy. That worked for me, so I didn't investigate further, but I came across this later: https://support.google.com/drive/thread/2035857?hl=en
Same issue. Trying to read a folder with >40k files from Google Drive into Colab.
@MittalShruti , maybe you could try this- https://github.com/googlecolab/colabtools/issues/510#issuecomment-552294940 ?
Or check this thread for details: https://support.google.com/drive/thread/2035857?hl=en
Similar issue. Reading a CSV file through pandas was working fine, then suddenly later that day I couldn't get it into RAM. First I got this error: ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'. Then, after using engine='python', I got this: OSError: [Errno 5] Input/output error
Thanks for confirming. Can you share a minimal self-contained repro notebook, either publicly or just with colaboratory-team@google.com? (It would be helpful to see precisely how you're reading the data.)
Does the problem go away if you first
!cp path/to/data.csv local.csv
and then read from the local path?
No, it didn't work.
Has anyone found a solution? I moved the files into subfolders so that each subfolder has one file, but I am still getting this error.
Same problem in Colab reading >100k files from Google Drive.
I noticed there seems to be some sort of limit imposed by Google: if you access data from Drive too many times, this issue occurs. Take a break of 24 hours and the issue goes away.
I am facing the same issue here. Any solution?
Same issue. I have four .npy files: two around 10 GB and two around 6 GB.
ss = np.load('train_abnormal.npy')
I can't open any of those four files. Files of smaller size can be opened, though.
I have the same problem too, so I am sure this affects Colab Pro as well.
I have the same issue when copying a folder of images (around 2,000 JPGs).
Found a fix for the error when copying a lot of files. Use:
%cp -av fromfolder tofolder
Works for me
It just happened to me, apparently after doing various file operations (cp, tar, etc.) involving I/O between my Drive and the local Colab VM. After that, I got random Python OSError / Input/output errors, sometimes even when importing a Python module. At other times my Colab notebook crashed entirely (during a read of a >1 GB feather file) and the log showed nothing meaningful.
I hope this is just a case of the Drive daily quota issue someone mentioned. Has anyone confirmed this? I will wait for a day to pass and retry.
Today it happened to me as well, when trying to unrar a 30 GB file in Colab. I'm getting an Input/output error: "Read error in the file".
I also got the same error, OSError: [Errno 5] Input/output error, when trying to import a 14 GB file from Drive. The error occurred suddenly; just several minutes earlier I had done the same operation and everything was fine. When I tried importing a much smaller file from the same folder, it worked normally. It seems like Colab has some limit on importing large files? Why is this issue closed? By the way, I have a Colab Pro subscription!
Duplicate of #559
I am also getting the Input/output error. I downloaded a dataset from Kaggle into my Drive; it has 50 zip files, each with 2,000 images. I successfully extracted 2 of the zip files, but then the error started. Any solutions?
Same error. I am trying to load 194,082 image files from Drive. It worked once, the first time I extracted the data and tried to load it, and hasn't worked since. Even after I upgraded to Colab Pro it doesn't work. Frustrating.
Same here as well. How can I deal with it?
Got the same error:
Could not read file
[Errno 5] Input/output error: .........
It was working fine until some time back. I had loaded the files from a mounted Drive.
For anyone having this problem with Colab + Drive, the most likely cause is excessive I/O due to large files, or merely running "ls -l" on a folder with too many files. The latter case is more harmless (as long as you don't do it again; I find using glob much better behaved). In the former case you have most likely violated some Google quota. In my experience, the trigger is either sheer size (or an extremely large number of small files whose total is big) or throughput (i.e. size/time).
The limit for me for a single file seemed to be around 10 GB, though mileage seemed to vary. So don't copy huge files between Drive and Colab. Note that it counts as an "upload" if you access a Drive file in Colab via a mount.
The best solution is to use the Linux "split" command to break your huge files into 500 MB to 1 GB pieces, and then upload them to Drive one by one. When you need the data in Colab, copy the fragments onto your Colab VM's local disk and reassemble them with "cat". This way, no giant file is ever moved from Drive in one piece. The downside is that you have to repeat this for every new Colab session.
It is a pain, but this whole thing isn't designed for huge datasets. Note that if you violate the quota and hit an I/O error, you have to wait a day for it to go away; I would try not to touch your Drive at all for at least 24 hours to let it recover.
Hope this helps.
@kechan The answer would be perfect if you could provide an example of how to do this split with the command line. Thanks anyway for your answer.
I have the same problem too. OSError: [Errno 5] Input/output error: '/content/drive/My Drive/COVID-Net/rsna-pneumonia-detection-challenge/stage_2_train_images/003d8fa0-6bf1-40ed-b54c-ac657f8495c5.dcm'
Will i get the same error with Colab Pro?
I tried using Colab Pro and the error does not go away. I then split my data from all images in one folder to 50 images per folder. Now it runs; however, it does not return all the image files, only 165,034 out of 194,082. Maybe I need to keep even fewer images per folder. This is really frustrating. I like Paperspace better now.
Same issue here. It worked yesterday but not now, with the same code. How?
Update: got it working by copying the files into a different directory and then importing the copies.
This is an unacceptable flaw in Colab and, in my view, completely delegitimizes it as a platform for machine learning. The ability of a computing platform to handle large amounts of data is absolutely essential in this field, and I think it's just plain crooked of Google to tell people they have a product designed for deep learning and ML. I paid for a Pro account and have tried every workaround, and "OSError: [Errno 5] Input/output error" will always show up again eventually and stop you dead in your tracks. This is not just a "bug"; this is the reason you should not use Colab if you have other options.
I feel the same. Even after Colab Pro, I had to split my data into various folders and it would only partially work. I was so frustrated because I couldn't focus on the project; all my time went into trying to make Colab work.
It's free mate. Take a breath.
@Zappytoes That error is highly likely to do with the Google Drive quota limit rather than with Colab itself. I have used Colab for almost 2 years and found it an excellent platform to experiment with DL on smaller datasets (by modern standards). You are right that you shouldn't use Colab if you have other options (i.e. lots of $$). If you routinely work with 10 GB or more, you should use GCP or AWS and pay the fair price. Pro is only $10? It is the best deal around for the sort of GPU and TPU you get.
@kechan The answer would be perfect if you could provide an example of how to do this split with the command line. Thanks anyway for your answer.
Using the Linux "split" command to shard a huge file is an old trick; you can google around and find it explained far better than I can. Shipping big files around has been an issue for as long as the internet has existed; only what we mean by "big" has changed.
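For what it's worth, here is a minimal sketch of the round trip (the Drive paths in the comments are hypothetical; the runnable part below just exercises split and cat on a scratch file):

```shell
# Before uploading (on your own machine): split the archive into pieces.
#   split -b 500m bigfile.tar bigfile.tar.part_
# In Colab: copy the pieces from Drive to the VM's local disk and rejoin.
#   cp "/content/gdrive/My Drive/"bigfile.tar.part_* /content/
#   cat /content/bigfile.tar.part_* > /content/bigfile.tar

# Self-contained demo of the split/cat round trip on a scratch file:
workdir=$(mktemp -d)
head -c 1000000 /dev/urandom > "$workdir/bigfile.tar"   # stand-in "archive"
split -b 300k "$workdir/bigfile.tar" "$workdir/piece_"  # shard into ~300 KB pieces
cat "$workdir"/piece_* > "$workdir/rejoined.tar"        # reassemble in glob order
cmp "$workdir/bigfile.tar" "$workdir/rejoined.tar" && echo "identical"
```

Because split names its pieces in lexicographic order (piece_aa, piece_ab, ...), a plain glob hands them back to cat in the right order.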
Bug report for Colab: http://colab.research.google.com/.