vikas5914 / google-photos-backup

Back up photos from Google Photos using Playwright.
MIT License
218 stars, 23 forks

Timeout #6

Open calz1 opened 1 year ago

calz1 commented 1 year ago

Super cool project! I'm looking forward to using this instead of processing a Google Takeout export.

I got about halfway through thousands of photos and then ran into an issue. I have read issue #4.

Now, as soon as I run it, I get:

[screenshot]

vikas5914 commented 1 year ago

@calz1 Can you run it with node index.js --headless=false? That opens a visible browser window, so you can see what is happening and why the error occurs.
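The thread does not show how index.js reads --headless=false. As a sketch of how a flag like this is commonly turned into Playwright's headless launch option, assuming the script reads process.argv itself (parseHeadlessFlag is a hypothetical name, not the project's code):

```javascript
// Hypothetical helper: read a --headless=<true|false> flag from the CLI arguments.
// Falls back to headless mode when the flag is absent or has another value.
function parseHeadlessFlag(argv) {
  for (const arg of argv) {
    const match = /^--headless=(true|false)$/.exec(arg);
    if (match) {
      return match[1] === "true";
    }
  }
  return true; // Playwright also launches headless by default
}

// e.g. `node index.js --headless=false` -> false
const headless = parseHeadlessFlag(process.argv.slice(2));
console.log(`Launching with headless=${headless}`);
```

The resulting boolean would then be passed as { headless } in the options of the Playwright launch call.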

calz1 commented 1 year ago

Good idea (but I guess you wrote it ;) )

When I do that, it starts up, brings up Google Photos, opens a photo I recognize, and then looks like this:

[screenshot]

I tried View Page Source, but it is grayed out. I tried clicking on the file in the lower left and going to "Show in Folder" to see if there was anything in it, but it says "Removed" underneath.

calz1 commented 1 year ago

I think I found something! On a whim, I checked whether there was a really large video or something similar that it was trying to download and that was taking a while. I knew roughly what date it was near because of the year/month folders, and I knew what the last successful photo was (a bike trailer) from what was in the folder. I scrolled there, and it looks as if I have a couple of corrupt photos in my Google Photos timeline. They have a filename but just display that exclamation-mark icon.

I deleted them and now it is proceeding again. I am going to see if I can retrieve them from an old backup. Perhaps there is a way to detect these corrupt ones so it doesn't halt?

[screenshot]

calz1 commented 1 year ago

I had a couple more of those apparently corrupt photos, marked by the exclamation mark. I deleted them and it got through several more months, but now I think I have hit a slightly different problem with a corrupt video. Even though it was uploaded over a decade ago, Google thinks it is still processing and won't let me download it. I am going to delete it, but it would be nice if items like these were skipped or produced a warning.

[screenshot]
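A minimal sketch of the skip-or-warn behavior requested above, assuming each item is handled by some async downloadOne(url) function (a hypothetical name). Errors whose name is 'TimeoutError', which is what Playwright's timeout errors report in the logs later in this thread, are logged and skipped; any other error still stops the run so genuine bugs are not hidden. For simplicity it models the photos as a list of URLs; the actual script moves between photos with the arrow keys, so a skip there also needs a way to advance past the broken item.

```javascript
// Process each photo URL in order. Timeouts are logged as warnings and the
// URL is recorded as skipped; any other error is rethrown.
// `downloadOne` is a hypothetical async function supplied by the caller.
async function backupAll(urls, downloadOne) {
  const skipped = [];
  for (const url of urls) {
    try {
      await downloadOne(url);
    } catch (err) {
      if (err && err.name === "TimeoutError") {
        console.warn(`Skipping ${url}: ${err.message}`);
        skipped.push(url);
        continue;
      }
      throw err;
    }
  }
  return skipped; // list of URLs to inspect or retry by hand later
}
```

Returning the skipped URLs gives a list that can be checked (or restored from an older backup) after the run finishes.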

vikas5914 commented 1 year ago

This will be tough, since I don't have any corrupt videos or images in my Google Photos library, so I can't test it thoroughly. However, I will add some logic to handle them.

calz1 commented 1 year ago

Thank you! I've run into a couple more that caused it to freeze, and I had to delete them. Some even display in the GUI but won't download, so I am not sure what is going on. I have been using Google Photos since it was Picasa Web, so I guess there has been plenty of opportunity for different upload methods...

dajhorn commented 1 year ago

I can confirm this glitch. I'm getting it with GIF files that were uploaded way back through Picasa and other non-Google-Photos tools.

$ node index.js --headless=false
Starting from: https://photos.google.com/archive/photo/AF1QipNnjf_jzMTNLtzg4AQ5zFLRCRJZV9JoOVW6Kndc
Latest Photo: https://photos.google.com/photo/AF1QipNzt30QkiAqEmXhzx9cFnXlKhi7QhdDSw6mcVAw
-------------------------------------
Metadata not found, trying to get date from html
Download Complete: 1916/4/1916-04-03 Attestation Paper of William Earl Motley Back (508195b).gif
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

page.waitForURL: Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for navigation until "load"
============================================================
    at file:///home/dajhorn/google-photos-backup/index.js:79:16 {
  name: 'TimeoutError'
}

Node.js v18.13.0

The Chromium instance crashes, and when it is restarted it shows the "Restore pages? Chromium didn't shut down correctly" dialog.
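The triggerUncaughtException(err, true /* fromPromise */) line in the log means the rejected promise was never caught, so Node exited without closing the browser. One plausible reason for the "Restore pages?" prompt on the next launch is that the browser profile was not shut down cleanly. A minimal sketch, assuming the script's work lives in an async main and that close is whatever closes the browser or persistent context (both are hypothetical names here):

```javascript
// Run the backup and always close the browser, even if a step throws.
// `main` and `close` are hypothetical async functions supplied by the caller.
async function runWithCleanup(main, close) {
  try {
    await main();
  } catch (err) {
    console.error(err);
    process.exitCode = 1; // report failure without exiting before cleanup
  } finally {
    await close();
  }
}
```

With Playwright this might be called as runWithCleanup(() => backup(page), () => context.close()), where backup is a placeholder for the script's main loop.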

vikas5914 commented 1 year ago

@dajhorn @calz1, can you make those images shareable and give me access, so I can work out how to skip them?

calz1 commented 1 year ago

@vikas5914 Sure, is the email on your GitHub profile good?

dajhorn commented 1 year ago

https://photos.app.goo.gl/qeBT5NkNyjicsvhM8 https://photos.app.goo.gl/ArDEBxnKfK4Cb1zv7

BTW, my system is Ubuntu 23.04 Lunar Lobster with the google-photos-backup HEAD installed according to the README.md page.

vikas5914 commented 1 year ago

@calz1, it shows the album is empty.

@dajhorn I will check the .gif issue.

calz1 commented 1 year ago

Huh, both ways it is empty? Here's what I see:

[screenshot]

vikas5914 commented 1 year ago

One question @calz1 @dajhorn: do you see the Next/Previous button icons on the GIF or on the broken image URL?

@calz1 Yes, it shows as blank for me. [screenshot]

calz1 commented 1 year ago

I see Next/Previous buttons when clicking on the broken image in the main list of photos. They work and advance to the next image (which does work).

[screenshot]

dajhorn commented 1 year ago

Do you see the Next/Previous button icons on the GIF or on the broken image URL?

On each page I always see the Next/Previous buttons, and I never see the broken image icon.

dakahler commented 1 year ago

I'm also seeing this same timeout on an old mp4. It plays fine in Photos, but when the script tries to access it, the browser in headful mode reports:

This video-downloads.googleusercontent.com page can’t be found

No webpage was found for the web address: https://video-downloads.googleusercontent.com/snip?authuser=0
HTTP ERROR 404

I can send the video privately if that would help.
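Instead of waiting out the 30-second timeout, a script could notice a failed download response as soon as it arrives. A sketch, assuming Playwright's page.on('response', ...) event, whose Response objects expose url() and status(); the host check mirrors the address in the error page above, and isFailedVideoDownload and watchForFailedDownloads are hypothetical names:

```javascript
// True when a response looks like a failed video download, such as the
// HTTP 404 from video-downloads.googleusercontent.com reported above.
function isFailedVideoDownload(url, status) {
  let host;
  try {
    host = new URL(url).hostname;
  } catch {
    return false; // not an absolute URL
  }
  return host === "video-downloads.googleusercontent.com" && status >= 400;
}

// Wiring sketch: report failures so the main loop can skip the item early.
function watchForFailedDownloads(page, onFailure) {
  page.on("response", (response) => {
    if (isFailedVideoDownload(response.url(), response.status())) {
      onFailure(response.url(), response.status());
    }
  });
}
```

Keeping the check in a separate pure function makes it easy to widen later (other hosts or status codes) without touching the event wiring.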

vikas5914 commented 1 year ago

@dakahler, what happens when you try to download it manually?

dakahler commented 1 year ago

If I select the video and go to download, it works fine.

I also tried clicking the right arrow from the previous photo and pressing Shift+D, since that's closer to what the code does, and it still works fine.

It's odd that the code clicks "left" to get to the video, whereas with how I have it sorted (newest to oldest), the video is one click to the right. Maybe that has something to do with it.

Also, the actual download URL for the video is completely different from the one that returns 404.

dakahler commented 1 year ago

Wild: there's a broken link to... something... that only shows up in the Chromium browser launched by the tool. It doesn't show up in regular Chrome, Firefox, or the latest Chromium.

vikas5914 commented 1 year ago

@dakahler Got it, I suspected the same. I'm currently investigating whether we can use the installed Chrome instead of Chromium.

dakahler commented 1 year ago

Switching to Firefox does get past this particular issue, though it has some other issues.

vikas5914 commented 1 year ago

@dakahler @dajhorn @calz1 Please check the latest code. It tries to download each item, and if an error occurs it skips that URL (as long as the page has a left arrow).

It also uses the installed Chrome instead of the Chromium browser.
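Playwright can drive an installed Google Chrome instead of its bundled Chromium through the channel: 'chrome' launch option. A minimal sketch of building those options, assuming a persistent browser profile is used; buildLaunchOptions is a hypothetical helper, since the thread does not show the project's actual launch code:

```javascript
// Build options for Playwright's chromium.launchPersistentContext(userDataDir, options).
// channel: "chrome" asks Playwright to use the locally installed Google Chrome;
// omitting it falls back to the bundled Chromium build.
function buildLaunchOptions({ headless = true, useInstalledChrome = true } = {}) {
  const options = { headless };
  if (useInstalledChrome) {
    options.channel = "chrome";
  }
  return options;
}
```

It would be used as chromium.launchPersistentContext("./session", buildLaunchOptions({ headless: false })), where "./session" is a placeholder profile directory. Keeping the choice behind a flag makes it easy to compare Chrome and Chromium when a broken item shows up in only one of them.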

dajhorn commented 1 year ago

The current HEAD gives me a different error:

$ node index.js
Starting from: https://photos.google.com/archive/photo/AF1QipOp0lDRrqBYCEnNtx76gdwvORetDC5NfUu7KwBs
Latest Photo: https://photos.google.com/photo/AF1QipOzVVpRoZ1rs_My0CZe-itlGcRlRE8tuI7CRzM7
-------------------------------------
Metadata not found, trying to get date from html
Download Complete: NaN/NaN/1916-04-03 Attestation Paper of William Earl Motley Front (508195a).gif
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

page.waitForURL: Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for navigation until "load"
============================================================
    at file:///home/dajhorn/src/google-photos-backup/index.js:80:16 {
  name: 'TimeoutError'
}

Node.js v18.13.0
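The NaN/NaN/ prefix suggests the year and month folders were computed from a date that failed to parse after "Metadata not found"; the earlier run wrote the matching back-side scan to 1916/4/. A minimal guard, assuming the folder is built as year/month from a JavaScript Date, as those paths suggest; folderForDate and the "unknown-date" fallback are hypothetical:

```javascript
// Build the "<year>/<month>" folder for a download, e.g. "1916/4".
// An invalid Date (from unparseable metadata) goes to a fallback folder
// instead of producing "NaN/NaN".
function folderForDate(date, fallback = "unknown-date") {
  if (!(date instanceof Date) || Number.isNaN(date.getTime())) {
    return fallback;
  }
  return `${date.getFullYear()}/${date.getMonth() + 1}`;
}
```

Logging a warning whenever the fallback is used would make these files easy to find and file correctly afterwards.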
vikas5914 commented 1 year ago

@dajhorn, can you run node index.js --headless=false so you can see what's happening when the error occurs? Also, why is there an archive in your URL?

dajhorn commented 1 year ago

Also, why is there an archive in your URL?

Most of my photographs are archived and available only by clicking the Archive -> Library in the left pane.

Can you run node index.js --headless=false so you can see what's happening when the error occurs?

I'm getting these differences in behavior between invocations:

1. node index.js --headless=false

The first image is successfully downloaded, but the left and right arrows do not appear and the second image is never loaded.

Screenshot - Playwright

2. Pasting the image URL directly into a new interactive browser instance.

The left and right arrows do not appear for old images, and I cannot change images by using the arrow keys.

Screenshot - Interactive

3. Opening Google Photos Archive interactively.

My archive is so large that it takes several minutes for the timeline to populate, much longer than most people would wait before assuming an error. While the timeline is loading, the archive library page contains only grey boxes and looks like this:

Screenshot - Archive Library Loading

If I wait until the oldest images appear in the archive timeline, and then click an old image, then the left and right buttons appear and the arrow keys work too.

Screenshot - Timeline Loaded - Arrows Appear

My guess is, therefore, that this failure mode is somehow related to the total number of images in a Google Photos account.

dajhorn commented 1 year ago

☝️ After downloading my oldest photograph from a cold start, Playwright/Chromium must wait seven minutes for Google Photos to return a link to my second-oldest photograph.

Google Photos returns a link to my third-oldest photograph in less than five seconds and runs much faster thereafter.
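Playwright's default timeout is 30 seconds (the Timeout 30000ms exceeded in the logs), while the wait described above is about seven minutes, i.e. 7 × 60 × 1000 = 420,000 ms, fourteen times the default. Raising the global default with page.setDefaultTimeout() would also slow down every genuine failure, so one option is to retry only the slow step with escalating timeouts. A sketch, assuming the step can be written as a function of its timeout, e.g. (timeout) => page.waitForURL(pattern, { timeout }); waitWithEscalation and its schedule are hypothetical:

```javascript
// Try a wait step with increasing timeouts, retrying only on TimeoutError.
// `waitFn(timeoutMs)` is a caller-supplied async function, e.g. a Playwright wait.
async function waitWithEscalation(waitFn, timeoutsMs = [30_000, 120_000, 480_000]) {
  let lastError;
  for (const timeoutMs of timeoutsMs) {
    try {
      return await waitFn(timeoutMs);
    } catch (err) {
      if (!err || err.name !== "TimeoutError") throw err;
      lastError = err;
      console.warn(`Timed out after ${timeoutMs} ms, retrying with a longer timeout`);
    }
  }
  throw lastError ?? new Error("no timeouts given");
}
```

The default schedule allows 30 s + 120 s + 480 s = 630 s in total, which covers the roughly seven-minute first wait while keeping fast failures fast on later photos.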

Interactively, I get proper behavior if I do this:

  1. Open Google Photos Archive.
  2. Mouse over the right edge of the frame so that the timeline appears.
  3. Click anywhere in the timeline so that it gets focus.
  4. Press the End key so that the Archive pane scrolls to the last image.
  5. Wait until the thumbnail of the last image is loaded.
  6. Click the thumbnail of the last image.
  7. Left-right / older-newer / previous-next now work as expected for all images and movies in the account.
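The interactive steps above could be scripted as a warm-up before the backup loop starts. A sketch using standard Playwright Page calls (goto, viewportSize, mouse.move, mouse.click, keyboard.press, and locator().last().waitFor()); the archive URL default and the thumbnail selector are assumptions, because Google Photos' markup is undocumented and may change. The grid is also rendered lazily, so "last matching element" only approximates the oldest item:

```javascript
// Hypothetical warm-up mirroring the manual steps: open the Archive, jump the
// timeline to its last item, wait for that thumbnail, and open it.
// archiveUrl and thumbnailSelector are assumptions, not documented values.
async function openOldestArchivedItem(page, {
  archiveUrl = "https://photos.google.com/archive",
  thumbnailSelector,
  waitMs = 10 * 60 * 1000, // a large archive can take minutes to populate
} = {}) {
  if (!thumbnailSelector) {
    throw new Error("thumbnailSelector is required");
  }
  await page.goto(archiveUrl); // step 1: open the Archive
  const { width, height } = page.viewportSize() ?? { width: 1280, height: 720 };
  const x = width - 10; // near the right edge, where the timeline appears
  const y = Math.floor(height / 2);
  await page.mouse.move(x, y); // step 2: reveal the timeline
  await page.mouse.click(x, y); // step 3: give the timeline focus
  await page.keyboard.press("End"); // step 4: scroll to the last item
  const last = page.locator(thumbnailSelector).last();
  await last.waitFor({ state: "visible", timeout: waitMs }); // step 5: wait for the thumbnail
  await last.click(); // step 6: open it
}
```

If the behavior dajhorn observed carries over to automation, the arrow-key navigation the tool relies on should then work as in step 7.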
vikas5914 commented 1 year ago

@dajhorn I have not tried it with the archive folder. This project was only tested with the photos shown when you open Google Photos directly.

I will test with the archive and see if there are other ways to fix this.