haltakov / simple-photo-gallery

Beautiful and simple photo galleries that help you tell your story. Free and open-source.
https://haltakov.net/simple-photo-gallery
MIT License

Google Photos Limit #134

Open r0bgus opened 6 months ago

r0bgus commented 6 months ago

When creating a gallery from Google Photos, the current XPath only matches up to 30 images. It seems that Google Photos only keeps up to 30 photos loaded at any time.

I created a POC for the GoogleGalleryLogic class. It scrolls the gallery and captures every available photo.

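# Selenium imports used by the scrolling code below, in addition to the module's existing imports
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class GoogleGalleryLogic(BaseGalleryLogic):

    # Photos collected while scrolling, keyed by URL so repeats are stored only once
    photos = {}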

    def create_thumbnails(self, force=False):
        """
        This function doesn't do anything, because the thumbnails are links to Google Photos
        :param force: Forces generation of thumbnails if set to true
        """
        pass

    def is_scroll_bottom(self, driver):
        # A 1px tolerance is used because the measured heights can be fractional
        return driver.execute_script(
            "const el = document.querySelector('c-wiz[id]');"
            "return el.scrollHeight - el.scrollTop - el.getBoundingClientRect().height <= 1;"
        )

    # Scroll the album container down by one viewport height using JavaScript
    def scroll_down(self, driver):
        driver.execute_script(
            "const el = document.querySelector('c-wiz[id]');"
            "el.scrollBy(0, el.getBoundingClientRect().height);"
        )

    def store_photos(self, new_photos):
        # Record each photo under its URL; a photo seen on an earlier pass is overwritten
        for new_photo in new_photos:
            photo_url = new_photo.get_attribute("data-latest-bg")
            photo_base_url, photo_name = parse_photo_link(photo_url)
            self.photos[photo_url] = {
                "photo_base_url": photo_base_url,
                "photo_name": photo_name
            }

    def generate_images_data(self, images_data):
        """
        Parse the remote link and extract link to the images and the thumbnails
        :param images_data: Images data dictionary containing the existing metadata of the images and which will be
        updated by this function
        :return updated images data dictionary
        """

        # Get the path to the Firefox webdriver
        webdriver_path = pkg_resources.resource_filename(
            "simplegallery", "bin/geckodriver"
        )

        # Configure the driver (headless mode is left disabled here for debugging)
        options = Options()
        #options.headless = True
        options.add_argument("--width=1920")
        options.add_argument("--height=1500")
        spg_common.log("Starting Firefox webdriver...")
        driver = webdriver.Firefox(options=options, executable_path=webdriver_path)

        # Load the album page
        spg_common.log(f'Loading album from {self.gallery_config["remote_link"]}...')
        driver.get(self.gallery_config["remote_link"])

        wait = WebDriverWait(driver, 10)
        wait.until(EC.presence_of_all_elements_located((By.XPATH, '//div[@data-latest-bg]')))

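        # Scroll until reaching the bottom of the album
        while not self.is_scroll_bottom(driver):

            # Capture the elements that are currently loaded
            new_elements = driver.find_elements(By.XPATH, '//div[@data-latest-bg]')

            # Store them; the dict keys deduplicate photos seen on earlier passes
            self.store_photos(new_elements)

            # Output information
            print(f"Scrolled, Loaded Elements: {len(new_elements)}, Total Elements: {len(self.photos)}")

            # Scroll down
            self.scroll_down(driver)

            # Wait for new elements to load
            # TODO: use a proper element check here instead of an arbitrary wait
            time.sleep(1)

        # The loop exits as soon as the bottom is reached, so capture the final batch here.
        # This also covers albums short enough that no scrolling is needed.
        self.store_photos(driver.find_elements(By.XPATH, '//div[@data-latest-bg]'))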

        spg_common.log(f"Photos found: {len(self.photos)}")
        current_photo = 1
        for photo_url in self.photos:
            photo_base_url = self.photos[photo_url]["photo_base_url"]
            photo_name = self.photos[photo_url]["photo_name"]

            spg_common.log(
                f"{current_photo}/{len(self.photos)}\t\tProcessing photo {photo_name}: {photo_url}"
            )
            current_photo += 1

            if "http" not in photo_url:
                continue

            # Compute photo and thumbnail sizes
            photo_link_max_size = f"{photo_base_url}=w9999-h9999-no"
            size = spg_media.get_remote_image_size(photo_link_max_size)
            thumbnail_size = spg_media.get_thumbnail_size(
                size, self.gallery_config["thumbnail_height"]
            )

            # Add the photo to the images_data dict
            images_data[photo_name] = dict(
                description="",
                mtime=time.time(),
                size=size,
                src=f"{photo_base_url}=w{size[0]}-h{size[1]}-no",
                thumbnail=f"{photo_base_url}=w{thumbnail_size[0]}-h{thumbnail_size[1]}-no",
                thumbnail_size=thumbnail_size,
                type="image",
            )

        spg_common.log("All photos processed!")

        driver.quit()

        return images_data
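
The scroll-and-capture idea can also be exercised without a browser. The sketch below is only an illustration and not part of the POC: `collect_while_scrolling` and `FakeAlbum` are hypothetical names, and the fake album simply mimics a page that keeps a window of 30 items loaded at a time. It reads the loaded items before checking for the bottom, so the last batch is not skipped.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple


def collect_while_scrolling(
    read_visible: Callable[[], Iterable[Tuple[Hashable, object]]],
    scroll_down: Callable[[], None],
    at_bottom: Callable[[], bool],
    max_steps: int = 1000,
) -> Dict[Hashable, object]:
    """Collect every (key, value) pair from a page that only keeps a window of items loaded."""
    seen: Dict[Hashable, object] = {}
    for _ in range(max_steps):
        # Store what is loaded right now; keys seen before are overwritten, not duplicated
        seen.update(read_visible())
        # Check for the bottom only after reading, so the final window is included
        if at_bottom():
            break
        scroll_down()
    return seen


class FakeAlbum:
    """Hypothetical stand-in for the browser: exposes at most `window` items at a time."""

    def __init__(self, total: int, window: int = 30, step: int = 20):
        self.items = [f"photo-{i}" for i in range(total)]
        self.window = window
        self.step = step
        self.top = 0

    def read_visible(self):
        visible = self.items[self.top:self.top + self.window]
        return [(name, {"photo_name": name}) for name in visible]

    def scroll_down(self):
        # Never scroll past the position where the last item is at the end of the window
        last_top = max(len(self.items) - self.window, 0)
        self.top = min(self.top + self.step, last_top)

    def at_bottom(self):
        return self.top + self.window >= len(self.items)


album = FakeAlbum(total=95)
photos = collect_while_scrolling(album.read_visible, album.scroll_down, album.at_bottom)
print(len(photos))  # 95
```

With Selenium, the three callables would wrap `find_elements`, `scroll_down` and `is_scroll_bottom` from the class above, and the wait for new elements would go inside the scroll step.
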

kroesche commented 6 months ago

I tried to test SPG using a Google Photos album and I am getting an error:

Something went wrong while generating the images_data.json file: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

I traced this to an update to Selenium in June 2023. Do you not see this error when you use SPG with a Google Photos album? Are you using an older version of Selenium?
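
For anyone hitting the same error, here is a minimal sketch of one way to start Firefox on both older and newer Selenium versions. It assumes the `executable_path` keyword stopped being accepted in Selenium 4.10, which matches the June 2023 timing but should be checked against the Selenium changelog. `version_tuple`, `accepts_executable_path` and `start_firefox` are hypothetical helper names, not part of SPG.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '4.0.0b4' or '4.10.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def accepts_executable_path(selenium_version: str) -> bool:
    """Assumption: webdriver.Firefox() dropped the executable_path keyword in Selenium 4.10."""
    return version_tuple(selenium_version) < (4, 10)


def start_firefox(webdriver_path: str, options=None):
    """Start Firefox with the given geckodriver path on either side of the API change."""
    # Imported lazily so the helpers above can be used without Selenium installed
    import selenium
    from selenium import webdriver

    if accepts_executable_path(selenium.__version__):
        # Older releases: pass the driver path directly, as the POC does
        return webdriver.Firefox(options=options, executable_path=webdriver_path)

    # Newer releases: the driver path is passed through a Service object
    from selenium.webdriver.firefox.service import Service

    return webdriver.Firefox(options=options, service=Service(executable_path=webdriver_path))
```

Whichever branch is used, the geckodriver binary at `webdriver_path` still has to be compatible with the installed Firefox.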

r0bgus commented 6 months ago

selenium 4.0.0b4, geckodriver v0.33, Firefox v120 (v121 had some issues with selenium/geckodriver)

kroesche commented 6 months ago

After I downgraded Selenium, I got an error that I traced to geckodriver. I see you are using a more recent version than the one bundled with this package. Did you just update geckodriver after you installed SPG, or ...? Sorry for so many questions. I want to fix this, but I am still learning how this package works and want to make sure I understand it completely so I know the proper approach to fixing it. Thanks.

r0bgus commented 6 months ago

I did update it after installing spg. It's important you get the right combination of browser version and driver specific to your environment.

kroesche commented 6 months ago

I opened a new issue, #135, to address the Selenium issue.

I put your proposed change on a branch, issue/134-google-limit, to make it easier to review and test. I haven't tried to test it yet. It is based on the branch for #135 because that fix is needed anyway to make SPG work with remote albums.