google / skicka

Command-line utility for working with Google Drive. Join the mailing list at https://groups.google.com/forum/#!forum/skicka-users.
Apache License 2.0

Panic after rm of large number of files #90

Closed rabeyer closed 9 years ago

rabeyer commented 9 years ago

After running "skicka rm -r" on a large number of folders and files, every subsequent command results in a panic. Each time a command is run (the example below is fsck, but mkdir, ls, or any other command will do), a different file is shown in the panic. I have tried removing the metadata cache file and letting it rebuild, but it made no difference.


Backup:~ administrator$ skicka fsck --debug /Archive
panic: no parents for file &{Path:GEH-OPT_manuscript FileSize:0 Id:0B3e1XeLQJGXcNXY2bTZDc1FiYms Md5: MimeType:application/vnd.google-apps.folder ModTime:2012-06-19 15:57:42 +0000 UTC ParentIds:[] Properties:[{Key:Permissions Value:0755}]} with WIP path .DS_Store but parent path isn't "."?

goroutine 1 [running]:
github.com/google/skicka/gdrive.getFilePath(0xc214f1ba20, 0x9, 0xc214f0a560, 0x1c, 0xc20808c0f0, 0xc21adb0a20)
    /Users/localAdmin/go/src/github.com/google/skicka/gdrive/gdrive.go:700 +0x3c5
github.com/google/skicka/gdrive.(*GDrive).UpdateMetadataCache(0xc20803be90, 0xc20803ba10, 0x2b, 0x0, 0x0)
    /Users/localAdmin/go/src/github.com/google/skicka/gdrive/gdrive.go:573 +0x6f4
github.com/google/skicka/gdrive.New(0xc20802c9b0, 0x48, 0xc20801d340, 0x18, 0xc20803b920, 0x2c, 0x0, 0x0, 0x4df1e8, 0x6c6d58, ...)
    /Users/localAdmin/go/src/github.com/google/skicka/gdrive/gdrive.go:344 +0x719
main.main()
    /Users/localAdmin/go/src/github.com/google/skicka/skicka.go:830 +0x8e2

goroutine 6 [chan receive]:
main.func·003()
    /Users/localAdmin/go/src/github.com/google/skicka/skicka.go:821 +0x4c
created by main.main
    /Users/localAdmin/go/src/github.com/google/skicka/skicka.go:824 +0x81d

goroutine 7 [sleep]:
github.com/google/skicka/gdrive.func·003()
    /Users/localAdmin/go/src/github.com/google/skicka/gdrive/readers.go:86 +0x157
created by github.com/google/skicka/gdrive.launchBandwidthTask
    /Users/localAdmin/go/src/github.com/google/skicka/gdrive/readers.go:88 +0x190

goroutine 12 [runnable]:
net/http.(*persistConn).readLoop(0xc20808a000)
    /usr/local/go/src/net/http/transport.go:928 +0x9ce
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:660 +0xc9f

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2232 +0x1

goroutine 13 [select]:
net/http.(*persistConn).writeLoop(0xc20808a000)
    /usr/local/go/src/net/http/transport.go:945 +0x41d
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:661 +0xcbc

mmp commented 9 years ago

There is definitely a bug here! As a first step in tracking it down, can you tell me whether you have a folder "GEH-OPT_manuscript" anywhere in your Google Drive (including in the trash)?

Thanks!

rabeyer commented 9 years ago

Here's what I discovered. Yesterday, I deleted several highly nested folders. They were organized as: GDrive Root --> "Archive" --> various client IDs (like GEH, AST, HPT, etc.) --> product IDs under each client --> job folders with several additional levels of folder nesting (Art, Manuscript, Active, Inactive, etc.). I deleted 4 client-ID-level folders with separate "rm -r" commands.

No client level folders currently display inside the "Archive" folder and nothing shows in the trash. However, if I search for a file contained in the skicka error message path, I can find it. Then, if I traverse the path breadcrumbs at the top, I will either end at a folder that has no further parent that leads to the root (like the client level folder), or get a message pop-up that says: "This folder is in your trash. This folder is in the trash because it is in a folder that was trashed. To view this folder, you'll need to restore the folder containing it."

What appears to have happened is that some of the folders/files have been disassociated from the parent folders but still exist in GDrive as orphans. It seems as if it is a temporary anomaly resulting from the deletion of many tens of thousands of files. Maybe the deletion process is taking days to complete?

It appears that if I search for and find a file this way and then look for it some time later, it isn't found anymore.

It would be great if skicka could work around this issue.

mmp commented 9 years ago

Ok, thanks--that helps a lot! I can reproduce this locally.

I think there are two issues: first, it looks like skicka rm -r isn't doing the right thing to trash the children of the top-level directory. Second, skicka isn't robust to the state of affairs that leaves behind.

I'll dig into both of these issues.

mmp commented 9 years ago

After some investigation, the way that rm -r works is actually correct: it's fine to just trash the top folder in a hierarchy; all its members get carried along.

The issue seems to be that sometimes Drive leaves trashed files as not having any parents (https://polastre.com/2013/02/google-drive-orphaned-files/), even though in theory this isn't supposed to happen.

So the fix, just pushed, is to not worry about it and ignore the file when this happens.

(There had been a panic() in there before since it wasn't clear why this would ever happen, so I didn't want to silently accept it.)
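To illustrate the "ignore the file" approach described above, here is a minimal Go sketch. It is not skicka's actual code: the `file` type, `resolvePaths`, and `pathOf` are hypothetical names made up for this example, and it assumes each file records its parent IDs and that a path is built by walking the first parent up to the root. A file whose chain never reaches the root (no parents, or an ancestor that has none) is skipped and reported rather than causing a panic.

```go
package main

import (
	"fmt"
	"sort"
)

// file is a hypothetical, simplified stand-in for Drive file metadata.
type file struct {
	ID        string
	Name      string
	ParentIDs []string
}

// resolvePaths returns a map from file ID to its path relative to rootID,
// plus the sorted IDs of files that could not be connected to the root.
// Orphaned files are skipped instead of triggering a panic.
func resolvePaths(files []file, rootID string) (map[string]string, []string) {
	byID := make(map[string]file, len(files))
	for _, f := range files {
		byID[f.ID] = f
	}
	paths := make(map[string]string)
	var orphans []string
	for _, f := range files {
		p, ok := pathOf(f, byID, rootID)
		if !ok {
			orphans = append(orphans, f.ID)
			continue
		}
		paths[f.ID] = p
	}
	sort.Strings(orphans)
	return paths, orphans
}

// pathOf walks the first-parent chain from f up to rootID. It reports
// false if the chain ends before reaching the root or contains a cycle.
func pathOf(f file, byID map[string]file, rootID string) (string, bool) {
	path := f.Name
	cur := f
	// A valid chain can visit each known file at most once.
	for steps := 0; steps <= len(byID); steps++ {
		if len(cur.ParentIDs) == 0 {
			return "", false // orphan: no parents and not directly under the root
		}
		pid := cur.ParentIDs[0]
		if pid == rootID {
			return path, true
		}
		parent, ok := byID[pid]
		if !ok {
			return "", false // parent is unknown to us
		}
		path = parent.Name + "/" + path
		cur = parent
	}
	return "", false // cycle guard
}

func main() {
	files := []file{
		{ID: "a", Name: "Archive", ParentIDs: []string{"root"}},
		{ID: "m", Name: "GEH-OPT_manuscript"},                    // trashed folder left without parents
		{ID: "d", Name: ".DS_Store", ParentIDs: []string{"m"}}, // child of the orphaned folder
	}
	paths, orphans := resolvePaths(files, "root")
	fmt.Println("resolved:", paths)
	fmt.Println("skipped orphans:", orphans)
}
```

Returning the skipped IDs alongside the resolved paths is a design choice for this sketch: it keeps the tool usable while still allowing the anomaly to be logged in debug output instead of being silently dropped.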

rabeyer commented 9 years ago

Thank you for taking care of this so quickly!