ultralytics / flickr_scraper

Simple Flickr Image Scraper
https://ultralytics.com
GNU Affero General Public License v3.0
207 stars, 62 forks

Order of photos downloaded #25

Open JaimyvS opened 2 months ago

JaimyvS commented 2 months ago

Hello,

Thanks for the great script. It really helps to quickly build datasets for training. But I have a question about the images that are being downloaded.

Are these sorted in a particular order before being downloaded? If I download 50 photos and then download 50 more, will these be the same photos, or are they randomly chosen? I'm asking because I may later want to add more photos to a category.

Kind regards,

Jaimy van Schelven

pderrenger commented 1 month ago

@JaimyvS hello Jaimy,

Thank you for your kind words about the script! We're glad to hear that it's been helpful for you in building your datasets. 😊

Regarding your question about the order of the images being downloaded: the images are fetched in the order the source returns them for the criteria set in the script, such as search keywords, sort option, and any applied filters. That order is not guaranteed to stay the same between runs, because it depends on those criteria and on the source's current state.

If you download 50 photos and then download 50 more, the second batch can overlap with the first unless the script tracks and skips duplicates. To get unique images each time, you can keep a record of the images you have already downloaded and skip them on later runs.
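As a rough illustration of the overlap problem: if the search results come back in a stable order, you can ask for the next batch by skipping the results you already have. The sketch below assumes such a stable order, which is not guaranteed in practice. `fake_search` and `next_batch` are hypothetical names made up for this example and are not part of the script.

```python
from itertools import islice

def next_batch(results, already_have, batch_size):
    """Skip the first `already_have` results and return the next `batch_size`."""
    return list(islice(results, already_have, already_have + batch_size))

def fake_search():
    """Hypothetical stand-in for a search that yields results in a stable order."""
    return (f"photo_{i}.jpg" for i in range(200))

first = next_batch(fake_search(), 0, 50)    # results 0..49
second = next_batch(fake_search(), 50, 50)  # results 50..99, no overlap with `first`
```

If the order shifts between runs (for example because new photos were uploaded), skipping by position is no longer enough, and checking against a record of downloaded images is the safer option.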

If you need to add more photos to a category later, you could modify the script to check against a list of already downloaded images to avoid duplicates. Here's a simple example of how you might approach this in Python:

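import os

def fetch_image(category):
    """Placeholder: return the filename of the next image for this category."""
    raise NotImplementedError("Replace with actual image fetching logic")

def download_images(category, num_images, downloaded_images, max_attempts=1000):
    """Collect up to num_images images that are not already in downloaded_images."""
    new_images = []
    for _ in range(max_attempts):  # Bound the loop in case the source runs out of new images
        if len(new_images) >= num_images:
            break
        image = fetch_image(category)
        if image not in downloaded_images:  # Skip anything already downloaded
            new_images.append(image)
            downloaded_images.add(image)
    return new_images

# Example usage
image_dir = 'path_to_downloaded_images'
downloaded_images = set(os.listdir(image_dir)) if os.path.isdir(image_dir) else set()
new_images = download_images('category_name', 50, downloaded_images)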

This way, you can maintain a set of already downloaded images and ensure that new downloads are unique.
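If you want the record to survive between sessions without relying on the folder contents, one option is to store the identifiers you have already used in a small file. The following is a minimal sketch under that assumption; `load_seen`, `save_seen`, `select_new`, the `seen.json` file name, and the example URLs are all hypothetical and not part of the script.

```python
import json
import tempfile
from pathlib import Path

def load_seen(record_path):
    """Load the set of already-downloaded identifiers, or an empty set if no record exists."""
    path = Path(record_path)
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()

def save_seen(record_path, seen):
    """Write the set of identifiers back to disk as a sorted JSON list."""
    Path(record_path).write_text(json.dumps(sorted(seen)))

def select_new(candidate_urls, seen, limit):
    """Return up to `limit` candidates not yet in `seen`, adding them to `seen`."""
    new = []
    for url in candidate_urls:
        if len(new) >= limit:
            break
        if url not in seen:
            new.append(url)
            seen.add(url)
    return new

# Demo with made-up URLs and a temporary record file.
with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "seen.json"
    urls = [f"https://example.com/{i}.jpg" for i in range(10)]

    seen = load_seen(record)               # empty on the first run
    first_run = select_new(urls, seen, 3)
    save_seen(record, seen)

    seen = load_seen(record)               # second run reloads the record
    second_run = select_new(urls, seen, 3)
    save_seen(record, seen)
```

Here the second run picks up where the first left off, even though both runs see the same candidate list.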

If you encounter any issues or have further questions, please provide more details or a minimum reproducible code example so we can assist you better. You can find more information on creating a minimum reproducible example here.