vikas5914 / google-photos-backup

Backup photos from Google Photos using Playwright.
MIT License
204 stars, 20 forks

Metadata taken from html fails #5

Closed: markcs closed this issue 9 months ago

markcs commented 11 months ago

Hi,

I had to update your regexp so that the metadata from the html was recognized.

  1. The "dash" character is not a normal hyphen (it is an en dash, –), so the regexp never matched it.
  2. Some videos and photos have the Landscape or Portrait text before the date (e.g. aria-label="Video – Portrait – 4 Dec 2013, 06:30:26").
  3. I also added a log message so that I am notified when dates are found in the HTML.
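To illustrate points 1 and 2, here is a small standalone check against the example label from point 2. The variable names are made up for this illustration, and the tolerant pattern is only a sketch (not the regex used in the repository) that accepts either an en dash (U+2013) or a plain hyphen as the separator, with an optional Landscape/Portrait segment.

```javascript
// The example label; the separators are U+2013 EN DASH, not U+002D HYPHEN-MINUS.
const label = 'Video \u2013 Portrait \u2013 4 Dec 2013, 06:30:26'

// A pattern written with an ASCII hyphen can never match the en dash.
const hyphenOnly = /^(Photo|Video) - (.+)$/

// Accepts an en dash or a hyphen, and an optional orientation segment before the date.
const dashTolerant = /^(Photo|Video)(?: [\u2013-] (?:Landscape|Portrait))? [\u2013-] (.+)$/

console.log(label.codePointAt(6).toString(16)) // 2013
console.log(hyphenOnly.test(label)) // false
console.log(dashTolerant.exec(label)[2]) // 4 Dec 2013, 06:30:26
```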

Sorry I didn't create a pull request; I thought it was just as easy to post the code here.

```javascript
if (year === 1970 && month === 1) {
  // If metadata is not available, try to get the date from the HTML instead
  console.log('Metadata not found, trying to get date from html')
  const data = await page.request.get(page.url())
  const html = await data.text()
  // "." matches any single character, including the en dash (–) used as the separator
  const regex = /aria-label="(Photo . Landscape|Photo . Portrait|Video . Landscape|Video . Portrait|Video|Photo) . ([^"]+)"/
  const match = regex.exec(html)
  if (match) {
    const dateString = match[2]
    const date = new Date(dateString)

    // Only overwrite year/month if the label actually parsed to a valid date
    if (!Number.isNaN(date.getTime())) {
      year = date.getFullYear()
      month = date.getMonth() + 1
      console.log(`Found date in html - Year = ${year}, Month = ${month}`)
    }
  }
}
```
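A possible variant of the fallback, sketched under the assumption that labels always use the day, English month abbreviation, year order seen in the example (accounts with other locales may format dates differently). extractDateFromHtml is a hypothetical helper, not part of the repository: it takes the fetched HTML and returns { year, month } or null. It parses the date parts explicitly instead of passing the string to new Date(), because the ECMAScript spec leaves parsing of non-ISO date strings implementation-defined.

```javascript
// Hypothetical helper: extracts year/month from the first matching aria-label in the page HTML.
// Assumes the "4 Dec 2013, 06:30:26" (day, English month name, year) format from the example.
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

function extractDateFromHtml(html) {
  // Item type, optional orientation, then the date; separators may be an en dash or a hyphen.
  const labelRegex = /aria-label="(?:Photo|Video)(?: [\u2013-] (?:Landscape|Portrait))? [\u2013-] ([^"]+)"/
  const label = labelRegex.exec(html)
  if (!label) return null

  // Day, month name, four-digit year at the start of the date text.
  const parts = /^(\d{1,2}) ([A-Za-z]+) (\d{4})/.exec(label[1])
  if (!parts) return null

  // Compare the first three letters so "Sept" or full month names also resolve.
  const monthIndex = MONTHS.indexOf(parts[2].slice(0, 3))
  if (monthIndex === -1) return null

  return { year: Number(parts[3]), month: monthIndex + 1 }
}

const html = '<div aria-label="Video \u2013 Portrait \u2013 4 Dec 2013, 06:30:26"></div>'
console.log(extractDateFromHtml(html)) // { year: 2013, month: 12 }
console.log(extractDateFromHtml('<div aria-label="Album"></div>')) // null
```

Inside the year === 1970 branch it would take the place of the regex and Date handling, e.g. const found = extractDateFromHtml(html); if (found) { year = found.year; month = found.month }.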
Madhurananda commented 10 months ago

Thanks @markcs, it worked.

vikas5914 commented 9 months ago

Thanks for the code. I have replaced the regex.