Closed camkhanhdao closed 1 year ago
I have the same issue, also specific to the Shared Drives.
Thanks for the report. Can you share the number of files and folders visible under Shared drives on https://drive.google.com?
I have 16 Shared Drives, and I have quite some folders and files in those Shared Drives, maybe 100s of folders and 1000s of files? Not sure how I could be more precise... I can see that my total storage on Google Drive is less than 10G at least if that helps.
If you can reliably make files/folders disappear, perhaps this will let you quantify the effect:
from google.colab import drive
drive.mount('/gdrive')
!find /gdrive/Shared\ drives/ -ls |sed -e 's/^ *[0-9]* *[0-9]* *//' | sort > /tmp/b1494.before
then do whatever makes files/folders disappear, then run
!find /gdrive/Shared\ drives/ -ls |sed -e 's/^ *[0-9]* *[0-9]* *//' | sort > /tmp/b1494.after
!diff /tmp/b1494.before /tmp/b1494.after
(changes to the number of blocks used for some entries are expected, but we're looking for lines appearing in the first find invocation that are entirely missing in the second)
Note of course you can restrict the above find command to only some subdir(s) of the "Shared drives" folder for quicker execution and a more minimal repro.
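For anyone who would rather stay in Python, the before/after comparison above can be sketched roughly as follows. This is only an illustration and not something the Colab team provided: snapshot and missing_entries are hypothetical helper names, and the sketch compares path names only, ignoring sizes, block counts and timestamps.

```python
import os


def snapshot(root):
    """Return the set of paths (relative to root) of every file and folder under root."""
    seen = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            seen.add(os.path.relpath(os.path.join(dirpath, name), root))
    return seen


def missing_entries(before, after):
    """Return, sorted, the entries present in the first snapshot but absent from the second."""
    return sorted(before - after)


# Usage in a notebook (paths as in the commands above):
# before = snapshot('/gdrive/Shared drives')
# ... do whatever makes files/folders disappear ...
# after = snapshot('/gdrive/Shared drives')
# print('\n'.join(missing_entries(before, after)))
```

As with the find command, root can be narrowed to a single subdirectory for a quicker run.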
I am experiencing the same issue specific only to shared drives.
If it helps at all, some more details on my end:
@jamesafranke thanks for the detail. That's still not repro'ing for us. Can you create a minimal reproduction notebook demonstrating the issue? (if yes, please remember to either Share it publicly viewable or Share with Viewer permission to colaboratory-team@google.com)
@colaboratory-team - after some more digging it appears it has to do with the permissions. This issue only appears to happen if there is a single user with 'viewer' permissions added to the shared drive, even if you have manager permissions. When all members are managers, the issue does not arise. I have added colaboratory-team@google.com to a shared drive used below for reproducibility. Right now you are a viewer... but I can change you to a manager if that helps.
Working in the file_test.ipynb file (in Chrome on Mac with manager permissions in the shared drive):
After mounting the drive (run cell 1):
I can see the /content/drive/Shared drives/AWOL_File_Test/Test_Folder/IA_Corn_Harvest_Dists.csv file in the tree. But as soon as I touch the folder, it disappears and returns a not found error (note: sometimes you can access the file once before it disappears, so you may need to run cell 2 twice).
The file /content/drive/Shared drives/AWOL_File_Test/Test_Folder/file_created_by_james.csv is still accessible to me in that subfolder because I created it. Another user created IA_Corn_Harvest_Dists.csv, so it disappears.
The user who created this shared drive (and the other files) does not appear to have this issue because he created the sub-folder. He is able to access files that he did not create. The issue appears to happen only for sub-folders in the drive; files in the main directory are stable.
Hopefully this helps.
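In case it helps others compare notes, here is a rough way to script the check described above: list a folder, try to open every file in it, then list it again and report what went missing. probe_folder is a hypothetical helper written for this thread, not part of google.colab, and the path in the usage comment is simply the one from this repro.

```python
import os


def probe_folder(folder):
    """List folder, try to read one byte of each file, then list it again.

    Returns (unreadable, vanished): names whose open() raised OSError, and
    names that were in the first listing but not in the second.
    """
    first = sorted(os.listdir(folder))
    unreadable = []
    for name in first:
        path = os.path.join(folder, name)
        if os.path.isdir(path):
            continue  # only files are opened; sub-folders are just listed
        try:
            with open(path, 'rb') as f:
                f.read(1)
        except OSError:  # FileNotFoundError is a subclass of OSError
            unreadable.append(name)
    second = set(os.listdir(folder))
    vanished = [name for name in first if name not in second]
    return unreadable, vanished


# Usage in a notebook, after drive.mount('/content/drive'):
# print(probe_folder('/content/drive/Shared drives/AWOL_File_Test/Test_Folder'))
```

Since the file is sometimes readable once before it disappears, calling the function twice in a row may be needed to see a difference.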
@jamesafranke still not repro'ing... Just to make sure:
drive.mount(), observes & reads the file on the VM, and then the file disappears from the VM. Is that right?
Correct. Except:
drive.mount(), observes & reads the file on the VM 1 time, and then the file disappears from the VM.
The file does not disappear from the VM if U3 is not in the picture.
Thanks! Jim
Still no joy. Let's try a different tack (as U2):
!tar cvzf b1494-logs.tar.gz /root/.config/Google/DriveFS/Logs
Hopefully your logs will help shed some light on this mystery.
Done. File shared with colaboratory-team@google.com via drive as requested.
Thanks! Jim
Got it, need more :) Again, starting from a factory-reset VM, repro the issue, then
!cd $(dirname /root/.config/Google/DriveFS/*/metadata_sqlite_db) && tar cvzf /tmp/msd.tar.gz metadata_sqlite_db*
download the resulting /tmp/msd.tar.gz file, and upload to drive & share to colaboratory-team@google.com.
Done.
Thanks and sorry for the run-around. Looks like there are inter-run variances requiring both pieces of data above from a single invocation. Can you: factory reset, reproduce, then:
!cd /root/.config/Google/DriveFS/ && tar cvzf b1494-both-before.tar.gz */metadata_sqlite_db* Logs/
drive.flush_and_unmount()
!cd /root/.config/Google/DriveFS/ && tar cvzf b1494-both-after.tar.gz */metadata_sqlite_db* Logs/
and share both resulting tar.gz files again?
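For reference, the two tar invocations above could be approximated from Python roughly like this. It is a sketch under assumptions: archive_drivefs_state is a hypothetical helper name, the default base path is the one used in the commands above, and unlike those commands the archive is written wherever out_path points rather than inside the DriveFS directory.

```python
import glob
import os
import tarfile


def archive_drivefs_state(out_path, base='/root/.config/Google/DriveFS'):
    """Write a gzipped tar of */metadata_sqlite_db* and Logs/ found under base.

    Returns the archive member paths (relative to base) that were added.
    """
    members = sorted(glob.glob(os.path.join(base, '*', 'metadata_sqlite_db*')))
    logs = os.path.join(base, 'Logs')
    if os.path.isdir(logs):
        members.append(logs)
    added = []
    with tarfile.open(out_path, 'w:gz') as tar:
        for path in members:
            arcname = os.path.relpath(path, base)
            tar.add(path, arcname=arcname)  # directories are added recursively
            added.append(arcname)
    return added


# Usage in a notebook, after reproducing the issue:
# archive_drivefs_state('/tmp/b1494-both-before.tar.gz')
# drive.flush_and_unmount()
# archive_drivefs_state('/tmp/b1494-both-after.tar.gz')
```

An empty return value means nothing matched under base, which is worth checking before sharing the archive.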
done.
Still mysterious. Can you add brian@demodemo.org as a Manager on the folder? (to confirm: the non-owner Manager U2 sees this repro, but non-owner Viewer U3 does not, and neither does the owner U1; is that correct?)
Another shot in the dark: after factory-resetting the VM, execute this cell before your repro:
!sed -i -e 's/enforce_single_parent:true/enforce_single_parent:true,metadata_cache_reset_counter:4/' /usr/local/lib/python3.6/dist-packages/google/colab/drive.py
from google.colab import drive
import importlib
_ = importlib.reload(drive)
And see whether the bug still reproduces.
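One caveat with the sed step: sed -i leaves the file unchanged and still exits successfully when the pattern is not found, for example if drive.py has changed since this comment was written. A minimal Python sketch of the same substitution that reports how many replacements were made is below. patch_text_file is a hypothetical helper, not part of google.colab, and it replaces every occurrence in the file, whereas the sed command above replaces only the first match on each line.

```python
def patch_text_file(path, old, new):
    """Replace every occurrence of old with new in the file at path.

    Returns the number of occurrences replaced (0 means the file was not modified).
    """
    with open(path, encoding='utf-8') as f:
        text = f.read()
    count = text.count(old)
    if count:
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text.replace(old, new))
    return count


# Usage on a factory-reset VM, before importing and reloading drive as above:
# n = patch_text_file(
#     '/usr/local/lib/python3.6/dist-packages/google/colab/drive.py',
#     'enforce_single_parent:true',
#     'enforce_single_parent:true,metadata_cache_reset_counter:4')
# print(n)  # 0 would mean the flag text was not found
```

Because the replacement text contains the original text, running it twice on the same file would append the extra flag twice, so it should only be run once per fresh VM.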
@jamesafranke please see last two comments above.
~This fixed things for me, thanks!~ spoke too soon, this reduced the amount of missing files, but there are some still missing
@s22chan if you have the same symptomology as @jamesafranke described in https://github.com/googlecolab/colabtools/issues/1494#issuecomment-691480456 can you also follow https://github.com/googlecolab/colabtools/issues/1494#issuecomment-693765023? Thanks.
Thank you all so much for the comments, especially @jamesafranke. Due to data policy I cannot share anything from my side.
I ran the above command after a factory reset of the runtime and before mounting the shared drives, and it mapped 90% of the shared drives. Since my shared drives are huge, I'd say 90% is far better than before. Thank you for the great support, @colaboratory-team.
@camkhanhdao when you say "90% of the shared drives" do you mean that only 90% of one shared drive's contents were present, or that only 90% of the total number of shared drives you have were present? (the former is this issue, the latter is a distinct issue; if that's the case, please file a new issue with any details you can share, esp. specific numbers; thanks)
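To get the specific numbers asked for here without sharing any file names, something like the following could be run. It is a sketch only: entries_per_drive is a hypothetical helper, and the mount path in the usage comment assumes the drive was mounted at /content/drive.

```python
import os


def entries_per_drive(shared_root):
    """Map each top-level folder under shared_root to the number of files and folders inside it."""
    counts = {}
    for drive_name in sorted(os.listdir(shared_root)):
        drive_path = os.path.join(shared_root, drive_name)
        if not os.path.isdir(drive_path):
            continue
        total = 0
        for _, dirnames, filenames in os.walk(drive_path):
            total += len(dirnames) + len(filenames)
        counts[drive_name] = total
    return counts


# Usage in a notebook:
# counts = entries_per_drive('/content/drive/Shared drives')
# print(len(counts), 'shared drives visible')
# print(sorted(counts.values()))  # per-drive totals, without the drive names
```

The number of keys shows how many shared drives are visible at all, while the per-drive totals can be compared against what https://drive.google.com shows for each drive.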
I can't share my files either, but I can give you a bisect via !wget -O /usr/.../drive.py https://raw.githubusercontent.com/.../drive.py
(good*) https://github.com/googlecolab/colabtools/commit/fe964e0e046c12394bae732eaaeda478bc5fa350#diff-dd84f19ca467960c385b44a00328c6c8
(fails to mount) https://github.com/googlecolab/colabtools/commit/a2ee1f23f48817bb895b8c769ad4388d095345b4#diff-dd84f19ca467960c385b44a00328c6c8
(bad) https://github.com/googlecolab/colabtools/commit/9805f77e03cef1664b139c1e857a6cbdcadf9624#diff-dd84f19ca467960c385b44a00328c6c8
Update: the Feb commit still failed, but later (similar to the metadata refresh change), and now it seems I cannot mount with any old revision (even fe964e0) anymore. Might've been a fluke.
@s22chan mounting with old versions of drive.py is unlikely to work, in general. Another idea to try: after factory-resetting the VM, execute this cell before your repro:
!sed -i -e "s/'enforce_single_parent:true',/#'enforce_single_parent:true',/" /usr/local/lib/python3.6/dist-packages/google/colab/drive.py
from google.colab import drive
import importlib
_ = importlib.reload(drive)
nope, that missed about the same amount of files as the baseline
@s22chan thanks for checking. Are you able to create a new Shared drive in which the problem manifests and which you could share with us, or does this only manifest in existing drives for you?
Unfortunately, I tried just creating a site-packages with just tensorflow to mount, and I can't reproduce the issue, even after adding additional viewers and content managers.
I'm having the same issue: sometimes viewers, sometimes managers lose access to files. We have a large group of 10+ people and it does not happen to everyone. It seems to be happening more today (it might be a coincidence that we're using GPU instances). I shared the drive with colaboratory-team@google.com
@wildintellect please also share with brian@demodemo.org. You mention viewers and managers intermittently losing access to files; does the owner ever lose access?
I'm the owner and have never experienced the issue. One user found that moving the drive mounting until after all pip and import commands seemed to help. Added Brian. Here's the code we're using; we'll probably push a new version in a few minutes with the pip commands moved lower, but the repo is public so we can tag the "bad" version.
@wildintellect can you describe more explicitly what you meant by "lose access to files"?
Your shared drive has (at least) one directory with 20k files in it, so a timeout reading that directory would not be surprising (sorry, this is a shortcoming of the existing Drive integration in colab). Does restructuring that directory to split it up into further subdirectories each of which doesn't have more than, say, 1k files in it, make the problem go away?
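If restructuring by hand is impractical, the split suggested above could be scripted along these lines. This is a minimal sketch under assumptions: shard_directory and the part_NNNN folder names are made up for illustration, and since it moves files (changing their paths for every collaborator), it is best tried on a copy of the folder first.

```python
import os
import shutil


def shard_directory(src, max_per_dir=1000):
    """Move the files directly inside src into subfolders part_0000, part_0001, ...

    Each subfolder receives at most max_per_dir files. Existing subfolders of
    src are left untouched. Returns the list of subfolder paths created.
    """
    files = sorted(
        name for name in os.listdir(src)
        if os.path.isfile(os.path.join(src, name))
    )
    created = []
    for start in range(0, len(files), max_per_dir):
        target = os.path.join(src, 'part_%04d' % (start // max_per_dir))
        os.makedirs(target, exist_ok=True)
        for name in files[start:start + max_per_dir]:
            shutil.move(os.path.join(src, name), os.path.join(target, name))
        created.append(target)
    return created
```

With the default of 1000 files per subfolder, a directory of 20k files would end up as 20 subfolders.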
Some files at higher levels just vanish, looking at the file browser one level down (data) there's around 11 folders, and maybe 20 files. For some users they see 2 folders and no files - sometimes they see everything (it's all visible in Google Drive directly). I completely get the issue with the 20k file directory and will look into reducing that complexity and report back.
I stand corrected: it can happen to an owner, and it just happened to me. Here's a before and after of the drive folder view; you can see that some files just vanish. Usually I'll notice in the code because I'll get a No such file error when trying to read a file. After the error, some files and folders vanish from view/access.
I tried factory runtime resets, opening copies of the notebook, and restarting the browser. Sometimes it seems to drop in the middle of reading files. I wonder if the I/O operations cause it.
For anyone still encountering this, please:
!sed -i -e 's/--features=/--module_log_level=sqlite*:LOG_FINE --features=/' /usr/local/lib/python3.6/dist-packages/google/colab/drive.py
from google.colab import drive
import importlib
importlib.reload(drive)
drive.flush_and_unmount()
!cd /root/.config/Google/DriveFS/ && tar cvzf /tmp/b1494-YOURGITHUBUSERNAME.tar.gz */metadata_sqlite_db* Logs/
(obviously, replacing YOURGITHUBUSERNAME with your github username so we can follow up if needed, if you're comfortable with that)
I've just experienced this bug, so I've followed the above instructions. I've shared my b1494-dmusican.tar.gz file with colaboratory-team@google.com. Followup would be great; I'm watching this issue, so I will hopefully see posts here.
We've deployed an update to a system component that might help with this bug. If you've been affected by this bug, please Runtime -> Factory reset runtime, attempt a repro, and click the reaction emoji corresponding to the result: 🎉 if problems are all gone, 👍 if things appear to be better (fewer missing items but still some missing), 👎 if things are the same.
I am still having this issue and shared the b1494-lytzV.tar.gz with colaboratory-team@google.com. Hopefully will hear back from the team soon :)
@lytzV this bug is for missing entries under "Shared drives", not "My Drive".
I haven't been able to reproduce the underlying issue, but I believe adding a shortcut is a reliable workaround.
When mounting Google Drive to Google Colab using this code:
the content inside "/content/drive/Shared drives" was not fully present: only a few folders and sub-folders were available, and which ones appeared changed at random each time the drive was remounted.