googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0
Missing files/folders in "Shared drives" when mounting Google Drive to Google Colab #1494

Closed camkhanhdao closed 1 year ago

camkhanhdao commented 4 years ago

When mounting Google Drive to Google Colab using this code:

from google.colab import drive
drive.mount('/content/drive')

the contents of "/content/drive/Shared drives" are not fully present: only a few folders and sub-folders are available, and a different, seemingly random set of folders appears each time the drive is remounted.

ValentinMoullet commented 4 years ago

I have the same issue, also specific to the Shared Drives.

colaboratory-team commented 4 years ago

Thanks for the report. Can you share the number of files and folders visible under Shared drives on https://drive.google.com?

ValentinMoullet commented 4 years ago

I have 16 Shared Drives, and I have quite some folders and files in those Shared Drives, maybe 100s of folders and 1000s of files? Not sure how I could be more precise... I can see that my total storage on Google Drive is less than 10G at least if that helps.

colaboratory-team commented 4 years ago

If you can reliably make files/folders disappear, perhaps this will let you quantify the effect:

from google.colab import drive
drive.mount('/gdrive')
!find /gdrive/Shared\ drives/  -ls |sed -e 's/^ *[0-9]* *[0-9]* *//'  | sort > /tmp/b1494.before

then do whatever makes files/folders disappear, then run

!find /gdrive/Shared\ drives/  -ls |sed -e 's/^ *[0-9]* *[0-9]* *//'  | sort > /tmp/b1494.after
!diff /tmp/b1494.before /tmp/b1494.after

(changes to the number of blocks used for some entries are expected, but we're looking for lines appearing in the first find invocation that are entirely missing in the second)

Note of course you can restrict the above find command to only some subdir(s) of the "Shared drives" folder for quicker execution and a more minimal repro.
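The same before/after comparison could also be sketched in Python, which avoids parsing `find` output. This is only a minimal sketch, not part of the Colab tooling: `snapshot` and `missing_entries` are hypothetical helper names, and the `/gdrive/Shared drives` path in the usage comment simply mirrors the mount point used above.

```python
import os

def snapshot(root):
    """Return the set of paths (relative to root) currently visible under root."""
    seen = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            seen.add(os.path.relpath(os.path.join(dirpath, name), root))
    return seen

def missing_entries(before, after):
    """Paths present in the first snapshot but absent from the second, sorted."""
    return sorted(before - after)

# Hypothetical usage in a Colab cell, after drive.mount('/gdrive'):
#   before = snapshot('/gdrive/Shared drives')
#   ... do whatever makes files/folders disappear ...
#   after = snapshot('/gdrive/Shared drives')
#   print('\n'.join(missing_entries(before, after)))
```

Because only path names are compared, changes in block counts or timestamps are ignored; the result lists just the entries that were visible in the first walk and are entirely missing from the second.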

jamesafranke commented 4 years ago

I am experiencing the same issue specific only to shared drives.

If it helps at all, some more details on my end:

colaboratory-team commented 4 years ago

@jamesafranke thanks for the detail. That's still not repro'ing for us. Can you create a minimal reproduction notebook demonstrating the issue? (if yes, please remember to either Share it publicly viewable or Share with Viewer permission to colaboratory-team@google.com)

jamesafranke commented 4 years ago

@colaboratory-team - after some more digging it appears it has to do with the permissions. This issue only appears to happen if there is a single user with 'viewer' permissions added to the shared drive, even if you have manager permissions. When all members are managers, the issue does not arise. I have added colaboratory-team@google.com to a shared drive used below for reproducibility. Right now you are a viewer... but I can change you to a manager if that helps.

Working in the file_test.ipynb file (in Chrome on Mac with manager permissions in the shared drive):

After mounting the drive (run cell 1):

[Screenshot: Screen Shot 2020-09-11 at 11 22 36 AM]

I can see the /content/drive/Shared drives/AWOL_File_Test/Test_Folder/IA_Corn_Harvest_Dists.csv file in the tree. But as soon as I touch the folder, it disappears and returns a not found error (note, sometimes you are able to access the file 1 time, then it disappears, so you may need to run cell 2 twice).

[Screenshot: Screen Shot 2020-09-11 at 11 23 25 AM]

The file /content/drive/Shared drives/AWOL_File_Test/Test_Folder/file_created_by_james.csv is still accessible to me in that subfolder because I created it. Another user created the IA_Corn_Harvest_Dists.csv so it disappears.

The user who created this shared drive (and the other files), does not appear to have this same issue because he created the sub-folder. He is able to access files that he did not create. The issue appears to only happen for sub folders in the drive, and those files in the main directory are stable.

Hopefully this helps.
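For anyone trying to capture the "listed, then not found" behaviour described above, a small probe along these lines might help. It is a sketch under assumptions: `listed_but_inaccessible` is a hypothetical helper name, and whether a plain `os.stat` call triggers the same disappearance on a mounted shared drive is untested here.

```python
import os

def listed_but_inaccessible(directory):
    """Return names that os.listdir reports but that cannot be stat'ed afterwards."""
    vanished = []
    for name in sorted(os.listdir(directory)):
        try:
            os.stat(os.path.join(directory, name))
        except OSError:
            # The entry appeared in the listing but could not be accessed.
            vanished.append(name)
    return vanished

# Hypothetical usage, with the folder from the report above:
#   listed_but_inaccessible('/content/drive/Shared drives/AWOL_File_Test/Test_Folder')
```

Since the report notes that a file is sometimes accessible once before disappearing, running the probe twice on the same folder may be needed.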

colaboratory-team commented 4 years ago

@jamesafranke still not repro'ing... Just to make sure:

Is that right?

jamesafranke commented 4 years ago

Correct. Except:

The file does not disappear from the VM if U3 is not in the picture.

Thanks! Jim

colaboratory-team commented 4 years ago

Still no joy. Let's try a different tack (as U2):

Hopefully your logs will help shed some light on this mystery.

jamesafranke commented 4 years ago

Done. File shared with colaboratory-team@google.com via drive as requested.

Thanks! Jim

colaboratory-team commented 4 years ago

Got it, need more :) Again, starting from a factory-reset VM, repro the issue, then

!cd $(dirname /root/.config/Google/DriveFS/*/metadata_sqlite_db) && tar cvzf /tmp/msd.tar.gz metadata_sqlite_db*

download the resulting /tmp/msd.tar.gz file, and upload to drive & share to colaboratory-team@google.com.

jamesafranke commented 4 years ago

Done.

colaboratory-team commented 4 years ago

Thanks and sorry for the run-around. Looks like there are inter-run variances requiring both pieces of data above from a single invocation. Can you: factory reset, reproduce, then:

!cd /root/.config/Google/DriveFS/ && tar cvzf b1494-both-before.tar.gz */metadata_sqlite_db* Logs/
drive.flush_and_unmount()
!cd /root/.config/Google/DriveFS/ && tar cvzf b1494-both-after.tar.gz */metadata_sqlite_db* Logs/

and share both resulting tar.gz files again?

jamesafranke commented 4 years ago

done.

colaboratory-team commented 4 years ago

Still mysterious. Can you add brian@demodemo.org as a Manager on the folder? (to confirm: the non-owner Manager U2 sees this repro, but non-owner Viewer U3 does not, and neither does the owner U1; is that correct?)

colaboratory-team commented 4 years ago

Another shot in the dark: after factory-resetting the VM, execute this cell before your repro:

!sed -i -e 's/enforce_single_parent:true/enforce_single_parent:true,metadata_cache_reset_counter:4/' /usr/local/lib/python3.6/dist-packages/google/colab/drive.py
from google.colab import drive
import importlib
_ = importlib.reload(drive)

And see whether the bug still reproduces.
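For readers unfamiliar with the pattern in that cell (edit an installed module's source file, then reload the module so the change takes effect in the running kernel), here is a rough pure-Python equivalent. It is a sketch only: `patch_and_reload` is a hypothetical helper, not a Colab API, and unlike the `sed` command it replaces every occurrence of the text and raises an error if the text is not found.

```python
import importlib
import pathlib

def patch_and_reload(module, old, new):
    """Replace `old` with `new` in the module's source file, then reload it."""
    path = pathlib.Path(module.__file__)
    source = path.read_text()
    if old not in source:
        raise ValueError(f"{old!r} not found in {path}")
    path.write_text(source.replace(old, new))
    importlib.invalidate_caches()
    return importlib.reload(module)

# Hypothetical usage mirroring the cell above:
#   from google.colab import drive
#   drive = patch_and_reload(
#       drive,
#       'enforce_single_parent:true',
#       'enforce_single_parent:true,metadata_cache_reset_counter:4')
```

Failing loudly when the text is absent makes it obvious if the installed `drive.py` no longer contains the expected flag, whereas `sed -i` would silently leave the file unchanged.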

colaboratory-team commented 4 years ago

@jamesafranke please see last two comments above.

s22chan commented 4 years ago

Re: the metadata_cache_reset_counter cell above: ~This fixed things for me, thanks!~ Spoke too soon: this reduced the number of missing files, but some are still missing.

colaboratory-team commented 4 years ago

@s22chan if you have the same symptomology as @jamesafranke described in https://github.com/googlecolab/colabtools/issues/1494#issuecomment-691480456 can you also follow https://github.com/googlecolab/colabtools/issues/1494#issuecomment-693765023? Thanks.

camkhanhdao commented 4 years ago

Thank you all so much for the comments, especially @jamesafranke. Due to data policy I cannot share anything from my side.
I ran the metadata_cache_reset_counter cell above after a factory reset of the runtime and before mounting the shared drives. It mapped 90% of the shared drives; since my shared drives are huge, 90% is far better than before. Thank you for the great support, @colaboratory-team.

colaboratory-team commented 4 years ago

@camkhanhdao when you say "90% of the shared drives" do you mean that only 90% of one shared drive's contents were present, or that only 90% of the total number of shared drives you have were present? (the former is this issue, the latter is a distinct issue; if that's the case, please file a new issue with any details you can share, esp. specific numbers; thanks)

s22chan commented 4 years ago

I can't share my files either, but I can give you a bisect via !wget -O /usr/.../drive.py https://raw.githubusercontent.com/.../drive.py:

(good*) https://github.com/googlecolab/colabtools/commit/fe964e0e046c12394bae732eaaeda478bc5fa350#diff-dd84f19ca467960c385b44a00328c6c8
(fails to mount) https://github.com/googlecolab/colabtools/commit/a2ee1f23f48817bb895b8c769ad4388d095345b4#diff-dd84f19ca467960c385b44a00328c6c8
(bad) https://github.com/googlecolab/colabtools/commit/9805f77e03cef1664b139c1e857a6cbdcadf9624#diff-dd84f19ca467960c385b44a00328c6c8

Update: the Feb commit still failed, only later (similar to the metadata refresh change), and now it seems I cannot mount with any old revision (even fe964e0) anymore. Might've been a fluke.

colaboratory-team commented 4 years ago

@s22chan mounting with old versions of drive.py is unlikely to work, in general. Another idea to try: after factory-resetting the VM, execute this cell before your repro:

!sed -i -e "s/'enforce_single_parent:true',/#'enforce_single_parent:true',/" /usr/local/lib/python3.6/dist-packages/google/colab/drive.py
from google.colab import drive
import importlib
_ = importlib.reload(drive)

s22chan commented 4 years ago

!sed -i -e "s/'enforce_single_parent:true',/#'enforce_single_parent:true',/" /usr/local/lib/python3.6/dist-packages/google/colab/drive.py

Nope, that missed about the same number of files as the baseline.

colaboratory-team commented 4 years ago

@s22chan thanks for checking. Are you able to create a new Shared drive in which the problem manifests and which you could share with us, or does this only manifest in existing drives for you?

s22chan commented 4 years ago

Unfortunately, I tried just creating a site-packages with just tensorflow to mount, and I can't reproduce the issue, even after adding additional viewers and content managers.

wildintellect commented 4 years ago

I'm having the same issue: sometimes viewers, sometimes managers lose access to files. We have a large group (10+ people) and it does not happen to everyone. It seems to be happening more today (it might be a coincidence that we're using GPU instances). I shared the drive with colaboratory-team@google.com.

colaboratory-team commented 4 years ago

@wildintellect please also share with brian@demodemo.org. You mention viewers and managers intermittently losing access to files; does the owner ever lose access?

wildintellect commented 4 years ago

I'm the owner and have never experienced the issue. One user found that moving the drive mounting until after all pip and import commands seemed to help. Added Brian, here's the code we're using, probably pushing a new version in a few minutes with the pip commands moved lower but the repo is public so we can tag the "bad" version.

colaboratory-team commented 4 years ago

@wildintellect can you describe more explicitly what you meant by "lose access to files"?

Your shared drive has (at least) one directory with 20k files in it, so a timeout reading that directory would not be surprising (sorry, this is a shortcoming of the existing Drive integration in colab). Does restructuring that directory to split it up into further subdirectories each of which doesn't have more than, say, 1k files in it, make the problem go away?
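One way to do that restructuring is sketched below. This is an illustration under assumptions, not an official tool: `shard_directory` and the `part_NNNN` naming are hypothetical, the 1k limit is just the figure suggested above, and it may be safer to run this on a local copy than directly against a mounted drive.

```python
import os
import shutil

def shard_directory(directory, max_files=1000):
    """Move the files directly inside `directory` into numbered subdirectories
    (part_0000, part_0001, ...) holding at most `max_files` files each.
    Returns the number of files moved."""
    files = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    for index, name in enumerate(files):
        shard = os.path.join(directory, f"part_{index // max_files:04d}")
        os.makedirs(shard, exist_ok=True)
        shutil.move(os.path.join(directory, name), os.path.join(shard, name))
    return len(files)
```

Existing subdirectories are left untouched; only files sitting directly in `directory` are moved, so any code that reads them by path would need updating to look inside the new `part_NNNN` folders.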

wildintellect commented 4 years ago

Some files at higher levels just vanish. Looking at the file browser one level down (data), there are around 11 folders and maybe 20 files. Some users see 2 folders and no files; sometimes they see everything (it's all visible in Google Drive directly). I completely get the issue with the 20k file directory and will look into reducing that complexity and report back.

wildintellect commented 4 years ago

I stand corrected: it can happen to an owner; it just happened to me. Here's a before and after of the drive folder view; you can see that some files just vanish. Usually I notice in the code because I get a "No such file" error when trying to read a file.

[Screenshot from 2020-10-06 15-24-34]

After the error, some files and folders vanish from view/access.

[Screenshot from 2020-10-06 15-22-57]

I tried factory runtime resets, opening copies of the notebook, and restarting the browser. Sometimes it seems to drop in the middle of reading files. I wonder if the I/O operations cause it.

colaboratory-team commented 4 years ago

For anyone still encountering this, please:

(obviously, replacing YOURGITHUBUSERNAME with your github username so we can follow up if needed, if you're comfortable with that)

dmusican commented 4 years ago

I've just experienced this bug, so I've followed the above instructions. I've shared my b1494-dmusican.tar.gz file with colaboratory-team@google.com. Followup would be great; I'm watching this issue, so I will hopefully see posts here.

colaboratory-team commented 4 years ago

We've deployed an update to a system component that might help with this bug. If you've been affected by this bug, please Runtime -> Factory reset runtime, attempt a repro, and click the reaction emoji corresponding to the result: 🎉 if problems are all gone, 👍 if things appear to be better (fewer missing items but still some missing), 👎 if things are the same.

lytzV commented 3 years ago

I am still having this issue and shared the b1494-lytzV.tar.gz with colaboratory-team@google.com. Hopefully will hear back from the team soon :)

colaboratory-team commented 3 years ago

@lytzV this bug is for missing entries under "Shared drives", not "My Drive".

lytzV commented 3 years ago

I just added the shared folder as a shortcut to my drive.

cperry-goog commented 1 year ago

I haven't been able to reproduce the underlying issue, but I believe adding a shortcut is a reliable workaround.