perkeep / gphotos-cdp

This program uses the Chrome DevTools Protocol to drive a Chrome session that downloads your photos stored in Google Photos.
Apache License 2.0
651 stars 36 forks source link

more than one file (2) in download dir: be lenient #29

Open karan opened 3 years ago

karan commented 3 years ago

Moving from https://github.com/JakeWharton/docker-gphotos-sync/issues/38:

$ docker run --name gphotos-cdp-sync --rm -v /gphotos-sync:/tmp/gphotos-cdp -v /dl/cdp-gphotos-sync:/download jakewharton/gphotos-sync:0.3.1 /app/sync.sh

...snip
2020/11/15 21:26:24 Running /app/fix_time.sh on /download/AF1QipBizz_jZXshGM1QuBcdxkMmiP-_EDMi7eLjKxHY/DSC00874.JPG
/download/AF1QipBizz_jZXshGM1QuBcdxkMmiP-_EDMi7eLjKxHY/DSC00874.JPG
2020/11/15 21:26:25 Event: {Type:keyDown Modifiers:Shift Timestamp:<nil> Text: UnmodifiedText: KeyIdentifier: Code:KeyD Key:D WindowsVirtualKeyCode:68 NativeVirtualKeyCode:68 AutoRepeat:false IsKeypad:false IsSystemKey:false Location:0}
2020/11/15 21:26:25 Event: {Type:keyUp Modifiers:Shift Timestamp:<nil> Text: UnmodifiedText: KeyIdentifier: Code:KeyD Key:D WindowsVirtualKeyCode:68 NativeVirtualKeyCode:68 AutoRepeat:false IsKeypad:false IsSystemKey:false Location:0}
2020/11/15 21:26:25 ERROR: unhandled page event *page.EventDownloadWillBegin
2020/11/15 21:26:26 more than one file (2) in download dir "/download"
[cmd] /app/sync.sh exited 1
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.

This seems to be happening after a few photos are downloaded.

And the files:

-rwxr-xr-x 1 user grp       0 Nov 15 13:26 DSC00853.JPG*
-rwxr-xr-x 1 user grp 3009501 Nov 15 13:26 DSC00853.JPG.crdownload*
-rwxr-xr-x 1 user grp      76 Nov 15 13:26 .lastdone*

Reading the code, the cdp code will freak out if it sees more than 1 file in the directory (as the error says).

I put a watch on my download dir, and here's what I can tell is happening:

  1. Chrome starts the download. File name is 100_1291.JPG.crdownload.
  2. Chrome finishes the download. File is copied in to 100_1291.JPG.

For that brief period, both files exist. If you get unlucky, the if check can happen during that period, causing the container to die.

I think this can be resolved by one of:

  1. ignore .crdownload files.
  2. if >1 files, continue after waiting a beat (and let the disk state reconcile
mpl commented 3 years ago

hmm, I wonder why you seem to be the only one to have .crdownload files. Of course I have seen it happening when downloading things manually on Chrome, but I've never seen it happen with the API. Anyway, I suppose it's ok to make the check ignore .crdownload files for now. Are you interested in proposing a PR?

czheo commented 3 years ago

Why do we need this check at the first place?

excerebrose commented 3 years ago

@mpl I also seemed to notice the same issue along with 'DS_Store' Hence a quick patch:

    if strings.HasSuffix(v.Name(), ".crdownload") {
                continue
            }
            if v.Name() == ".DS_Store" {
                continue
            }
danielverkamp commented 3 years ago

I'm seeing the same issue, although I think the sequence of events is slightly different. It seems like ioutil.ReadDir() is just observing two different names for the same underlying file, although I don't think both actually exist simultaneously. At least in my case, downloading onto ext4 on Linux, both the .crdownload file and the completed download have the exact same inode number and a link count of 1 (checked by printing out fileEntries in the "more than one file" error message), so I suspect it is just being renamed, not copied. In any case, this seems to be a valid behavior of readdir, even if renames are atomic on the filesystem in question (for example, see https://yarchive.net/comp/linux/readdir_nonatomicity.html).

I hacked around it in my local copy by just changing the check to accept 2 files instead of 1 in the download directory, but this doesn't seem very robust. I also considered just filtering out .crdownload files in the loop before the "more than one file" check, but this prevents the code below that from extending the deadline when a download is in progress.

Maybe allowing two entries if one's filename is equal to the other's filename + ".crdownload" would be best? That would make the code a bit clunky, though.

gc-ss commented 2 years ago

Is this a "harmless" race condition?

Can I just restart this after it throws this specific error without concern of data corruption?

andyseubert commented 2 years ago

I made a watchdog script which I run every 15 minutes to re-run the sync app since it stops almost every time with this error. I'm not sure it's helping yet

if ! pgrep -f chrome &> /dev/null 2>&1; then
  # chrome is not running
  /app/sync.sh
fi