iterative / PyDrive2

Google Drive API Python wrapper library. Maintained fork of PyDrive.
https://docs.iterative.ai/PyDrive2

fix(fs): add dirs caching, make it a bit more robust #322

Closed shcheklein closed 8 months ago

shcheklein commented 8 months ago

The caching mechanics were broken in certain scenarios, which affected DVC tremendously. Previously we assumed pre-cached paths for 00, 01 ... ff in DVC. That assumption broke during the DVC 3.0 release, and the pre-caching was then removed completely because of it. Since then, DVC kept asking for files, files\md5, etc., resolving all those IDs again and again on dvc pull. Pretty much every object could be 3x slower to download.
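To illustrate the idea (not the actual code in this PR), here is a minimal sketch of a path-to-folder-ID cache. The `DirIdCache` class, the `fake_lookup` function, and the IDs are all hypothetical; `fake_lookup` stands in for a single remote request that resolves one child folder name under a parent ID.

```python
from typing import Callable, Dict, List, Optional, Tuple


class DirIdCache:
    """Hypothetical path -> folder-ID cache; not PyDrive2's real implementation."""

    def __init__(
        self,
        root_id: str,
        lookup_child: Callable[[str, str], Optional[str]],
    ) -> None:
        # lookup_child(parent_id, name) stands in for one remote API request.
        self._lookup = lookup_child
        # The empty path maps to the root folder ID.
        self._ids: Dict[str, str] = {"": root_id}

    def resolve(self, path: str) -> Optional[str]:
        """Return the folder ID for a path like 'files/md5/00', or None."""
        current = ""
        for name in (p for p in path.strip("/").split("/") if p):
            child = f"{current}/{name}" if current else name
            if child not in self._ids:
                # Cache miss: ask the remote once, then remember the answer.
                found = self._lookup(self._ids[current], name)
                if found is None:
                    return None
                self._ids[child] = found
            current = child
        return self._ids[current]


# A fake remote tree (made-up IDs) and a call log to count "requests".
FAKE_REMOTE: Dict[Tuple[str, str], str] = {
    ("root", "files"): "id-files",
    ("id-files", "md5"): "id-md5",
    ("id-md5", "00"): "id-00",
    ("id-md5", "01"): "id-01",
}
calls: List[Tuple[str, str]] = []


def fake_lookup(parent_id: str, name: str) -> Optional[str]:
    calls.append((parent_id, name))
    return FAKE_REMOTE.get((parent_id, name))


cache = DirIdCache("root", fake_lookup)
first = cache.resolve("files/md5/00")   # resolves files, md5 and 00 remotely
second = cache.resolve("files/md5/01")  # only 01 is new; the parents are cached
third = cache.resolve("files/md5/00")   # served entirely from the cache
```

Without the cache, each of the three `resolve` calls would walk the whole path remotely; with it, the shared `files` and `files/md5` prefixes are only looked up once.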

Among other things, I checked that dvc pull works now even with duplicated root dirs in DVC remote.

I still need to self-review this, and then see if we can add tests for all of it. It keeps biting us. Caching is non-trivial, and we can't keep losing the knowledge as we go, coming back to fix it every time users complain.

shcheklein commented 8 months ago

@efiop @skshetry gentle reminder, folks. I think it would be good to merge this to fix the gdrive remote support, when you have time.