practical-data-science / ShopifyScraper

Shopify Scraper package to extract all products from a Shopify site and return them in a Pandas dataframe.

Suggestion: Implementing a _get_images function #1

Closed: lazarost closed this issue 2 years ago

lazarost commented 2 years ago

First of all, thank you for this great software. Much easier and more straightforward than using the official Shopify python API.

Let me preface this by saying that I am not a programmer or developer and my knowledge of Python is basically zero.

I am trying to create a script that produces an XML file for use on a shopping comparison site, and I am having some trouble getting the product image source values out.

From looking at the code, I thought duplicating the _get_variants function as _get_images in scraper.py would work, since both variants and images are arrays (I think), but it does not seem to work for me.

What I would like to have for use in my script would be something along these lines:

images = scraper.get_images(parents)

images.iloc[X].src_img0 = https://{url}/first-image.jpg
images.iloc[X].src_img1 = https://{url}/second-image.jpg

and so on, for as many images as a product has.

Sample code of my script:

f = open('output.xml', 'w')
for i in range(len(parents)):
    f.write('            '+'<id>'+ str(parents.iloc[i].id) + '</id>'+"\n")
    f.write('            '+'<title>'+ parents.iloc[i].title + '</title>'+"\n")

and so on for the other entries needed. What I am missing are the values to add in the following piece of code:

    # all products have at least 1 image
    f.write('            '+'<image>'+ images.iloc[i].src_img0 + '</image>'+"\n")

    # there should be a better way to declare this besides multiple ifs
    if str(images.iloc[i].src_img1).startswith('https://'):
        f.write('            '+'<additional_imageurl>'+ images.iloc[i].src_img1 + '</additional_imageurl>'+"\n")

Any help would be greatly appreciated.

Thank you in advance, Lazaros

flyandlure commented 2 years ago

Hi Lazaros, Thanks for your message. Glad you find it useful.

Images are already returned by the get_products() function and are stored in a column called images. The data Shopify returns is a list of JSON objects. JSON data can be quite fiddly to work with but the basic aim would be to iterate over the images column for each product, then loop through the list of values stored in the images list, then extract the JSON values and write the values into a new dataframe.
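The approach described above could be sketched roughly as follows. This is only an illustration, not the package's own code: the function name images_to_frame is hypothetical, and it assumes the images column holds Python lists of dicts with position and src keys, and that the dataframe has an id column. The sample data stands in for the output of get_products().

```python
import pandas as pd


def images_to_frame(products: pd.DataFrame) -> pd.DataFrame:
    """Flatten the per-product image lists into one row per image."""
    rows = []
    # Walk the id and images columns together, one product at a time.
    for product_id, images in zip(products["id"], products["images"]):
        # Skip products whose images cell is missing or not a list.
        if not isinstance(images, list):
            continue
        for image in images:
            rows.append({
                "product_id": product_id,
                "position": image.get("position"),
                "src": image.get("src"),
            })
    return pd.DataFrame(rows, columns=["product_id", "position", "src"])


# Hypothetical sample data standing in for the output of get_products().
sample = pd.DataFrame({
    "id": [1, 2],
    "images": [
        [{"position": 1, "src": "https://example.com/a.jpg"},
         {"position": 2, "src": "https://example.com/b.jpg"}],
        [{"position": 1, "src": "https://example.com/c.jpg"}],
    ],
})
flat = images_to_frame(sample)
```

Keeping one row per image (a "long" layout) avoids needing a separate column for every possible image number.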

I might add a new function to handle this if I get time.

Cheers, Matt

flyandlure commented 2 years ago

Hi Lazaros, I've added this function to version 0.002. If you update the package it will be ready to use. Here's how to use it:

from shopify_scraper import scraper

url = "https://yourshopifydomain.com"

parents = scraper.get_products(url)
parents.to_csv('parents.csv', index=False)
print('Parents: ', len(parents))

children = scraper.get_variants(parents)
children.to_csv('children.csv', index=False)
print('Children: ', len(children))

images = scraper.get_images(parents)
images.to_csv('images.csv', index=False)
print('Images: ', len(images))
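
For the XML feed in the original question, one possible way to avoid a chain of ifs is to loop over each product's image URLs and pick the tag name by position. The sketch below is an assumption-laden illustration: the helper image_tags is hypothetical, and the product_id and src column names are assumed rather than taken from the package, so check them against the real output of get_images() first.

```python
from xml.sax.saxutils import escape

import pandas as pd


def image_tags(srcs, indent="            "):
    """Return XML lines: <image> for the first URL, <additional_imageurl> for the rest."""
    lines = []
    for n, src in enumerate(srcs):
        tag = "image" if n == 0 else "additional_imageurl"
        # escape() protects characters such as & that often appear in URLs.
        lines.append(f"{indent}<{tag}>{escape(str(src))}</{tag}>\n")
    return lines


# Hypothetical stand-in for the dataframe returned by get_images();
# the column names product_id and src are assumptions.
images = pd.DataFrame({
    "product_id": [1, 1, 2],
    "src": [
        "https://example.com/a.jpg",
        "https://example.com/b.jpg",
        "https://example.com/c.jpg",
    ],
})

# Collect the image URLs of each product into a list, keeping row order.
srcs_by_product = (
    images.groupby("product_id", sort=False)["src"].apply(list).to_dict()
)

tags = image_tags(srcs_by_product[1])
```

Inside the loop over parents, the lines for one product could then be written with f.writelines(image_tags(srcs_by_product.get(parents.iloc[i].id, []))).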