
EcommerceTools


EcommerceTools is a Python data science toolkit for those working in ecommerce, marketing science, and technical SEO, created by Matt Clarke and released under the MIT License. It includes a wide range of features to aid analysis and model building, is designed to be used with Pandas, and works within a Jupyter notebook environment or in standalone Python projects.

Installation

You can install EcommerceTools and its dependencies from PyPI by entering pip3 install ecommercetools in your terminal, or !pip3 install ecommercetools within a Jupyter notebook cell.


Modules

Transactions

1. Load sample transaction items data

If you want to get started with the transactions, products, and customers features, you can use the load_sample_data() function to load a set of real-world data. This imports the transaction items from the widely-used Online Retail dataset and reformats it ready for use by EcommerceTools.

from ecommercetools import utilities

transaction_items = utilities.load_sample_data()
transaction_items.head()
order_id sku description quantity order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2. Create a transaction items dataframe

The utilities module includes a range of tools that allow you to format data, so it can be used within other EcommerceTools functions. The load_transaction_items() function is used to create a Pandas dataframe of formatted transactional item data. When loading your transaction items data, all you need to do is define the column mappings, and the function will reformat the dataframe accordingly.

import pandas as pd
from ecommercetools import utilities

transaction_items = utilities.load_transaction_items('transaction_items_non_standard_names.csv',
                                 date_column='InvoiceDate',
                                 order_id_column='InvoiceNo',
                                 customer_id_column='CustomerID',
                                 sku_column='StockCode',
                                 quantity_column='Quantity',
                                 unit_price_column='UnitPrice'
                                 )
transaction_items.to_csv('transaction_items.csv', index=False)
print(transaction_items.head())
order_id sku description quantity order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
3. Create a transactions dataframe

The get_transactions() function takes the formatted Pandas dataframe of transaction items and returns a Pandas dataframe of aggregated transaction data, including the number of SKUs and items in each order, the order revenue, a replacement flag, and the customer's order number.

import pandas as pd
from ecommercetools import transactions

transaction_items = pd.read_csv('transaction_items.csv')
transactions_df = transactions.get_transactions(transaction_items)
transactions_df.to_csv('transactions.csv', index=False)
print(transactions_df.head())
order_id order_date customer_id skus items revenue replacement order_number
0 536365 2010-12-01 08:26:00 17850.0 7 40 139.12 0 1
1 536366 2010-12-01 08:28:00 17850.0 2 12 22.20 0 2
2 536367 2010-12-01 08:34:00 13047.0 12 83 278.73 0 1
3 536368 2010-12-01 08:34:00 13047.0 4 15 70.05 0 2
4 536369 2010-12-01 08:35:00 13047.0 1 3 17.85 0 3

Products

1. Get product data from transaction items

from ecommercetools import products

products_df = products.get_products(transaction_items)
products_df.head()
sku first_order_date last_order_date customers orders items revenue avg_unit_price avg_quantity avg_revenue avg_orders product_tenure product_recency
0 10002 2010-12-01 08:45:00 2011-04-28 15:05:00 40 73 1037 759.89 1.056849 14.205479 10.409452 1.82 3749 3600
1 10080 2011-02-27 13:47:00 2011-11-21 17:04:00 19 24 495 119.09 0.376667 20.625000 4.962083 1.26 3660 3393
2 10120 2010-12-03 11:19:00 2011-12-04 13:15:00 25 29 193 40.53 0.210000 6.433333 1.351000 1.16 3746 3380
3 10123C 2010-12-03 11:19:00 2011-07-15 15:05:00 3 4 -13 3.25 0.487500 -3.250000 0.812500 1.33 3746 3522
4 10123G 2011-04-08 11:13:00 2011-04-08 11:13:00 0 1 -38 0.00 0.000000 -38.000000 0.000000 inf 3620 3620

2. Calculate product consumption and repurchase rate

from ecommercetools import products

repurchase_rates = products.get_repurchase_rates(transaction_items)
repurchase_rates.head(3).T
0 1 2
sku 10002 10080 10120
revenue 759.89 119.09 40.53
items 1037 495 193
orders 73 24 29
customers 40 19 25
avg_unit_price 1.05685 0.376667 0.21
avg_line_price 10.4095 4.96208 1.351
avg_items_per_order 14.2055 20.625 6.65517
avg_items_per_customer 25.925 26.0526 7.72
purchased_individually 0 0 9
purchased_once 34 17 22
bulk_purchases 73 24 20
bulk_purchase_rate 1 1 0.689655
repurchases 39 7 7
repurchase_rate 0.534247 0.291667 0.241379
repurchase_rate_label Moderate repurchase Low repurchase Low repurchase
bulk_purchase_rate_label Very high bulk Very high bulk High bulk
bulk_and_repurchase_label Moderate repurchase_Very high bulk Low repurchase_Very high bulk Low repurchase_High bulk

Customers

1. Create a customers dataset

from ecommercetools import customers

customers_df = customers.get_customers(transaction_items)
customers_df.head()
customer_id revenue orders skus items first_order_date last_order_date avg_items avg_order_value tenure recency cohort
0 12346.0 0.00 2 1 0 2011-01-18 10:01:00 2011-01-18 10:17:00 0.00 0.00 3701 3700 20111
1 12347.0 4310.00 7 7 2458 2010-12-07 14:57:00 2011-12-07 15:52:00 351.14 615.71 3742 3377 20104
2 12348.0 1797.24 4 4 2341 2010-12-16 19:09:00 2011-09-25 13:13:00 585.25 449.31 3733 3450 20104
3 12349.0 1757.55 1 1 631 2011-11-21 09:51:00 2011-11-21 09:51:00 631.00 1757.55 3394 3394 20114
4 12350.0 334.40 1 1 197 2011-02-02 16:01:00 2011-02-02 16:01:00 197.00 334.40 3685 3685 20111

2. Create a customer cohort analysis dataset

from ecommercetools import customers

cohorts_df = customers.get_cohorts(transaction_items, period='M')
cohorts_df.head()
customer_id order_id order_date acquisition_cohort order_cohort
0 17850.0 536365 2010-12-01 08:26:00 2010-12 2010-12
7 17850.0 536366 2010-12-01 08:28:00 2010-12 2010-12
9 13047.0 536367 2010-12-01 08:34:00 2010-12 2010-12
21 13047.0 536368 2010-12-01 08:34:00 2010-12 2010-12
25 13047.0 536369 2010-12-01 08:35:00 2010-12 2010-12

3. Create a customer cohort analysis matrix

from ecommercetools import customers

cohort_matrix_df = customers.get_cohort_matrix(transaction_items, period='M', percentage=True)
cohort_matrix_df.head()
periods 0 1 2 3 4 5 6 7 8 9 10 11 12
acquisition_cohort
2010-12 1.0 0.381857 0.334388 0.387131 0.359705 0.396624 0.379747 0.354430 0.354430 0.394515 0.373418 0.500000 0.274262
2011-01 1.0 0.239905 0.282660 0.242280 0.327791 0.299287 0.261283 0.256532 0.311164 0.346793 0.368171 0.149644 NaN
2011-02 1.0 0.247368 0.192105 0.278947 0.268421 0.247368 0.255263 0.281579 0.257895 0.313158 0.092105 NaN NaN
2011-03 1.0 0.190909 0.254545 0.218182 0.231818 0.177273 0.263636 0.238636 0.288636 0.088636 NaN NaN NaN
2011-04 1.0 0.227425 0.220736 0.210702 0.207358 0.237458 0.230769 0.260870 0.083612 NaN NaN NaN NaN
from ecommercetools import customers

cohort_matrix_df = customers.get_cohort_matrix(transaction_items, period='M', percentage=False)
cohort_matrix_df.head()
periods 0 1 2 3 4 5 6 7 8 9 10 11 12
acquisition_cohort
2010-12 948.0 362.0 317.0 367.0 341.0 376.0 360.0 336.0 336.0 374.0 354.0 474.0 260.0
2011-01 421.0 101.0 119.0 102.0 138.0 126.0 110.0 108.0 131.0 146.0 155.0 63.0 NaN
2011-02 380.0 94.0 73.0 106.0 102.0 94.0 97.0 107.0 98.0 119.0 35.0 NaN NaN
2011-03 440.0 84.0 112.0 96.0 102.0 78.0 116.0 105.0 127.0 39.0 NaN NaN NaN
2011-04 299.0 68.0 66.0 63.0 62.0 71.0 69.0 78.0 25.0 NaN NaN NaN NaN

4. Create a customer "retention" dataset

from ecommercetools import customers

# transactions_df is the aggregated transactions dataframe created by get_transactions() above
retention_df = customers.get_retention(transactions_df)
retention_df.head()
acquisition_cohort order_cohort customers periods
0 2010-12 2010-12 948 0
1 2010-12 2011-01 362 1
2 2010-12 2011-02 317 2
3 2010-12 2011-03 367 3
4 2010-12 2011-04 341 4

5. Create an RFM (H) dataset

This is an extension of the regular Recency, Frequency, Monetary value (RFM) model that includes an additional parameter, "H", for heterogeneity, which shows the number of unique SKUs purchased by each customer. While not typically used for targeting, this value can be very useful in identifying which customers should probably be buying a broader mix of products than they currently are, as well as spotting those who may have stopped buying certain items.

from ecommercetools import customers

rfm_df = customers.get_rfm_segments(customers_df)
rfm_df.head()
customer_id acquisition_date recency_date recency frequency monetary heterogeneity tenure r f m h rfm rfm_score rfm_segment_name
0 12346.0 2011-01-18 10:01:00 2011-01-18 10:17:00 3700 2 0.00 1 3701 1 1 1 1 111 3 Risky
1 12350.0 2011-02-02 16:01:00 2011-02-02 16:01:00 3685 1 334.40 1 3685 1 1 1 1 111 3 Risky
2 12365.0 2011-02-21 13:51:00 2011-02-21 14:04:00 3666 3 320.69 2 3666 1 1 1 1 111 3 Risky
3 12373.0 2011-02-01 13:10:00 2011-02-01 13:10:00 3686 1 364.60 1 3686 1 1 1 1 111 3 Risky
4 12377.0 2010-12-20 09:37:00 2011-01-28 15:45:00 3690 2 1628.12 2 3730 1 1 1 1 111 3 Risky
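
As an illustration of how RFM-style scores can be computed, the sketch below assigns quintile scores (1 to 5) to the recency, orders, revenue, and skus columns of the customers_df created earlier. The score_rfmh() helper is hypothetical and purely illustrative; it is not necessarily how EcommerceTools derives its own segments.

import pandas as pd

def score_rfmh(customers_df):
    # Illustrative quintile scoring; ranking first guarantees five distinct bin edges for qcut.
    df = customers_df.copy()
    df['r'] = pd.qcut(df['recency'].rank(method='first'), 5, labels=[5, 4, 3, 2, 1]).astype(int)  # more recent = higher score
    df['f'] = pd.qcut(df['orders'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5]).astype(int)
    df['m'] = pd.qcut(df['revenue'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5]).astype(int)
    df['h'] = pd.qcut(df['skus'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5]).astype(int)
    df['rfm'] = df['r'].astype(str) + df['f'].astype(str) + df['m'].astype(str)
    df['rfm_score'] = df['r'] + df['f'] + df['m']
    return df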

6. Create a purchase latency dataset

from ecommercetools import customers 

# transactions_df is the aggregated transactions dataframe created by get_transactions() above
latency_df = customers.get_latency(transactions_df)
latency_df.head()
customer_id frequency recency_date recency avg_latency min_latency max_latency std_latency cv days_to_next_order label
0 12680.0 4 2011-12-09 12:50:00 3388 28 16 73 30.859898 1.102139 -3329.0 Order overdue
1 13113.0 24 2011-12-09 12:49:00 3388 15 0 52 12.060126 0.804008 -3361.0 Order overdue
2 15804.0 13 2011-12-09 12:31:00 3388 15 1 39 11.008261 0.733884 -3362.0 Order overdue
3 13777.0 33 2011-12-09 12:25:00 3388 11 0 48 12.055274 1.095934 -3365.0 Order overdue
4 17581.0 25 2011-12-09 12:21:00 3388 14 0 67 21.974293 1.569592 -3352.0 Order overdue

7. Customer ABC segmentation

from ecommercetools import customers

abc_df = customers.get_abc_segments(customers_df, months=12, abc_class_name='abc_class_12m', abc_rank_name='abc_rank_12m')
abc_df.head()
customer_id abc_class_12m abc_rank_12m
0 12346.0 D 1.0
1 12347.0 D 1.0
2 12348.0 D 1.0
3 12349.0 D 1.0
4 12350.0 D 1.0

8. Predict customer AOV, CLV, and orders

EcommerceTools allows you to predict the average order value (AOV), customer lifetime value (CLV), and expected number of orders via the Gamma-Gamma and BG/NBD models from the excellent Lifetimes package. By passing the dataframe of transactions from get_transactions() to the get_customer_predictions() function, EcommerceTools will fit the BG/NBD and Gamma-Gamma models and predict the AOV, order quantity, and CLV for each customer over the defined number of days after the end of the observation period.

from ecommercetools import customers

customer_predictions = customers.get_customer_predictions(transactions_df,
                                                          observation_period_end='2011-12-09',
                                                          days=90)
customer_predictions.head(10)
customer_id predicted_purchases aov clv
0 12346.0 0.188830 NaN NaN
1 12347.0 1.408736 569.978836 836.846896
2 12348.0 0.805907 333.784235 308.247354
3 12349.0 0.855607 NaN NaN
4 12350.0 0.196304 NaN NaN
5 12352.0 1.682277 376.175359 647.826169
6 12353.0 0.272541 NaN NaN
7 12354.0 0.247183 NaN NaN
8 12355.0 0.262909 NaN NaN
9 12356.0 0.645368 324.039419 256.855226
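
If you want to see roughly what happens under the hood, the sketch below fits the BG/NBD and Gamma-Gamma models with the Lifetimes package directly, assuming a transactions_df containing customer_id, order_date, and revenue columns. The penaliser and discount settings here are illustrative guesses rather than the values EcommerceTools actually uses.

from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.utils import summary_data_from_transaction_data

# Summarise transactions into frequency, recency, T, and monetary_value per customer.
summary = summary_data_from_transaction_data(
    transactions_df, 'customer_id', 'order_date',
    monetary_value_col='revenue', observation_period_end='2011-12-09')

# BG/NBD: expected number of purchases in the next 90 days.
bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(summary['frequency'], summary['recency'], summary['T'])
summary['predicted_purchases'] = bgf.conditional_expected_number_of_purchases_up_to_time(
    90, summary['frequency'], summary['recency'], summary['T'])

# Gamma-Gamma: AOV and CLV for customers with repeat purchases and positive spend.
returning = summary[(summary['frequency'] > 0) & (summary['monetary_value'] > 0)].copy()
ggf = GammaGammaFitter(penalizer_coef=0.001)
ggf.fit(returning['frequency'], returning['monetary_value'])
returning['aov'] = ggf.conditional_expected_average_profit(
    returning['frequency'], returning['monetary_value'])
returning['clv'] = ggf.customer_lifetime_value(
    bgf, returning['frequency'], returning['recency'], returning['T'],
    returning['monetary_value'], time=3, freq='D', discount_rate=0.01)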

Advertising

1. Create paid search keywords

from ecommercetools import advertising

product_names = ['fly rods', 'fly reels']
keywords_prepend = ['buy', 'best', 'cheap', 'reduced']
keywords_append = ['for sale', 'price', 'promotion', 'promo', 'coupon', 'voucher', 'shop', 'suppliers']
campaign_name = 'fly_fishing'

keywords = advertising.generate_ad_keywords(product_names, keywords_prepend, keywords_append, campaign_name)
keywords.head()
product keywords match_type campaign_name
0 fly rods [fly rods] Exact fly_fishing
1 fly rods [buy fly rods] Exact fly_fishing
2 fly rods [best fly rods] Exact fly_fishing
3 fly rods [cheap fly rods] Exact fly_fishing
4 fly rods [reduced fly rods] Exact fly_fishing

2. Create paid search ad copy using Spintax

from ecommercetools import advertising

text = "Fly Reels from {Orvis|Loop|Sage|Airflo|Nautilus} for {trout|salmon|grayling|pike}"
spin = advertising.generate_spintax(text, single=False)

spin
['Fly Reels from Orvis for trout',
 'Fly Reels from Orvis for salmon',
 'Fly Reels from Orvis for grayling',
 'Fly Reels from Orvis for pike',
 'Fly Reels from Loop for trout',
 'Fly Reels from Loop for salmon',
 'Fly Reels from Loop for grayling',
 'Fly Reels from Loop for pike',
 'Fly Reels from Sage for trout',
 'Fly Reels from Sage for salmon',
 'Fly Reels from Sage for grayling',
 'Fly Reels from Sage for pike',
 'Fly Reels from Airflo for trout',
 'Fly Reels from Airflo for salmon',
 'Fly Reels from Airflo for grayling',
 'Fly Reels from Airflo for pike',
 'Fly Reels from Nautilus for trout',
 'Fly Reels from Nautilus for salmon',
 'Fly Reels from Nautilus for grayling',
 'Fly Reels from Nautilus for pike']
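
Spintax expansion is simply the enumeration of every combination of the options in each braced group. The expand_spintax() helper below is a hypothetical sketch of the idea rather than the function EcommerceTools uses.

import re
from itertools import product

def expand_spintax(text):
    # Split each {a|b|c} group into its options, then emit every combination.
    pattern = re.compile(r'{(.*?)}')
    options = [group.split('|') for group in pattern.findall(text)]
    template = pattern.sub('{}', text)
    return [template.format(*combo) for combo in product(*options)]

expand_spintax("Fly Reels from {Orvis|Loop} for {trout|salmon}")
# ['Fly Reels from Orvis for trout', 'Fly Reels from Orvis for salmon',
#  'Fly Reels from Loop for trout', 'Fly Reels from Loop for salmon']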

Operations

1. Create an ABC inventory classification

from ecommercetools import operations

inventory_classification = operations.get_inventory_classification(transaction_items)
inventory_classification.head()
sku abc_class abc_rank
0 10002 A 1
1 10080 A 2
2 10120 A 3
3 10123C A 4
4 10123G A 4

Marketing

1. Get ecommerce trading calendar

from ecommercetools import marketing

trading_calendar_df = marketing.get_trading_calendar('2021-01-01', days=365)
trading_calendar_df.head()
date event
0 2021-01-01 January sale
1 2021-01-02
2 2021-01-03
3 2021-01-04
4 2021-01-05

2. Get ecommerce trading events

from ecommercetools import marketing

trading_events_df = marketing.get_trading_events('2021-01-01', days=365)
trading_events_df.head()
date event
0 2021-01-01 January sale
1 2021-01-29 January Pay Day
2 2021-02-11 Valentine's Day [last order date]
3 2021-02-14 Valentine's Day
4 2021-02-26 February Pay Day

NLP

1. Generate text summaries

The get_summaries() function of the nlp module takes a Pandas dataframe containing text and returns a machine-generated summary of the content using a Huggingface Transformers pipeline via PyTorch. To use this feature, first load your Pandas dataframe and import the nlp module from ecommercetools.

import pandas as pd
from ecommercetools import nlp 

pd.set_option('max_colwidth', 1000)
df = pd.read_csv('text.csv')
df.head()

Specify the name of the Pandas dataframe, the column containing the text you wish to summarise (e.g. product_description), and a column name in which to store the machine-generated summary. The min_length and max_length arguments control the length of the generated summary, while the do_sample argument controls whether the summary is generated by sampling (do_sample=True), which gives more varied output, or by deterministic decoding (do_sample=False), which stays closer to the source text.

df = nlp.get_summaries(df, 'product_description', 'sampled_summary', min_length=50, max_length=100, do_sample=True)
df = nlp.get_summaries(df, 'product_description', 'unsampled_summary', min_length=50, max_length=100, do_sample=False)
df = nlp.get_summaries(df, 'product_description', 'unsampled_summary_20_to_30', min_length=20, max_length=30, do_sample=False)

Since the model used for text summarisation is very large (1.2 GB plus), this function will take some time to complete. Once loaded, summaries are generated within a second or two per piece of text, so it is advisable to try smaller volumes of data initially.

SEO

1. Discover XML sitemap locations

The get_sitemaps() function takes the location of a robots.txt file (always stored at the root of a domain), and returns the URLs of any XML sitemaps listed within.

from ecommercetools import seo

sitemaps = seo.get_sitemaps("http://www.flyandlure.org/robots.txt")
print(sitemaps)

2. Get an XML sitemap

The get_sitemap() function allows you to download the URLs in an XML sitemap to a Pandas dataframe. If the sitemap contains child sitemaps, each of these will be retrieved. You can save the Pandas dataframe to CSV in the usual way.

from ecommercetools import seo

df = seo.get_sitemap("http://flyandlure.org/sitemap.xml")
print(df.head())
loc changefreq priority domain sitemap_name
0 http://flyandlure.org/ hourly 1.0 flyandlure.org http://www.flyandlure.org/sitemap.xml
1 http://flyandlure.org/about monthly 1.0 flyandlure.org http://www.flyandlure.org/sitemap.xml
2 http://flyandlure.org/terms monthly 1.0 flyandlure.org http://www.flyandlure.org/sitemap.xml
3 http://flyandlure.org/privacy monthly 1.0 flyandlure.org http://www.flyandlure.org/sitemap.xml
4 http://flyandlure.org/copyright monthly 1.0 flyandlure.org http://www.flyandlure.org/sitemap.xml

3. Get Core Web Vitals from PageSpeed Insights

The get_core_web_vitals() function retrieves the Core Web Vitals metrics for a list of sites from the Google PageSpeed Insights API and returns the results in a Pandas dataframe. The function requires a Google PageSpeed Insights API key.

from ecommercetools import seo

pagespeed_insights_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
urls = ['https://www.bbc.co.uk', 'https://www.bbc.co.uk/iplayer']
df = seo.get_core_web_vitals(pagespeed_insights_key, urls)
print(df.head())

4. Get Google Knowledge Graph data

The get_knowledge_graph() function returns the Google Knowledge Graph data for a given search term. This requires the use of a Google Knowledge Graph API key. By default, the function returns output in a Pandas dataframe, but you can pass the output="json" argument if you wish to receive the JSON data back.

from ecommercetools import seo

knowledge_graph_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
knowledge_graph = seo.get_knowledge_graph(knowledge_graph_key, "tesla", output="dataframe")
print(knowledge_graph)

5. Get Google Search Console API data

The query_google_search_console() function runs a search query on the Google Search Console API and returns data in a Pandas dataframe. This function requires a JSON client secrets key with access to the Google Search Console API.

from ecommercetools import seo

key = "google-search-console.json"
site_url = "http://flyandlure.org"
payload = {
    'startDate': "2019-01-01",
    'endDate': "2019-12-31",
    'dimensions': ["page", "device", "query"],
    'rowLimit': 100,
    'startRow': 0
}

df = seo.query_google_search_console(key, site_url, payload)
print(df.head())
page device query clicks impressions ctr position
0 http://flyandlure.org/articles/fly_fishing_gea... MOBILE simms freestone waders review 56 217 25.81 3.12
1 http://flyandlure.org/ MOBILE fly and lure 37 159 23.27 3.81
2 http://flyandlure.org/articles/fly_fishing_gea... DESKTOP orvis encounter waders review 35 134 26.12 4.04
3 http://flyandlure.org/articles/fly_fishing_gea... DESKTOP simms freestone waders review 35 200 17.50 3.50
4 http://flyandlure.org/ DESKTOP fly and lure 32 170 18.82 3.09

Fetching all results from Google Search Console

To fetch all results, set fetch_all to True. This will automatically paginate through your Google Search Console data and return all results. Be aware that if you do this you may hit Google's quota limit if you run a query over an extended period, or have a busy site with lots of page or query dimensions.

from ecommercetools import seo

key = "google-search-console.json"
site_url = "http://flyandlure.org"
payload = {
    'startDate': "2019-01-01",
    'endDate': "2019-12-31",
    'dimensions': ["page", "device", "query"],
    'rowLimit': 25000,
    'startRow': 0
}

df = seo.query_google_search_console(key, site_url, payload, fetch_all=True)
print(df.head())

Comparing two time periods in Google Search Console

payload_before = {
    'startDate': "2021-08-11",
    'endDate': "2021-08-31",
    'dimensions': ["page","query"],    
}

payload_after = {
    'startDate': "2021-07-21",
    'endDate': "2021-08-10",
    'dimensions': ["page","query"],    
}

df = seo.query_google_search_console_compare(key, site_url, payload_before, payload_after, fetch_all=False)
df.sort_values(by='clicks_change', ascending=False).head()

6. Get the number of "indexed" pages

The get_indexed_pages() function uses the "site:" prefix to search Google for the number of pages "indexed". This is very approximate and may not be a perfect representation, but it's usually a good guide of site "size" in the absence of other data.

from ecommercetools import seo

urls = ['https://www.bbc.co.uk', 'https://www.bbc.co.uk/iplayer', 'http://flyandlure.org']
df = seo.get_indexed_pages(urls)
print(df.head())
url indexed_pages
2 http://flyandlure.org 2090
1 https://www.bbc.co.uk/iplayer 215000
0 https://www.bbc.co.uk 12700000

7. Get keyword suggestions from Google Autocomplete

The google_autocomplete() function returns a set of keyword suggestions from Google Autocomplete. The include_expanded=True argument allows you to expand the number of suggestions shown by appending prefixes and suffixes to the search terms.

from ecommercetools import seo

suggestions = seo.google_autocomplete("data science", include_expanded=False)
print(suggestions)

suggestions = seo.google_autocomplete("data science", include_expanded=True)
print(suggestions)
term relevance
0 data science jobs 650
1 data science jobs chester 601
2 data science course 600
3 data science masters 554
4 data science salary 553
5 data science internship 552
6 data science jobs london 551
7 data science graduate scheme 550

8. Retrieve robots.txt content

The get_robots() function returns the contents of a robots.txt file in a Pandas dataframe so it can be parsed and analysed.

from ecommercetools import seo

robots = seo.get_robots("http://www.flyandlure.org/robots.txt")
print(robots)
directive parameter
0 User-agent *
1 Disallow /signin
2 Disallow /signup
3 Disallow /users
4 Disallow /contact
5 Disallow /activate
6 Disallow /*/page
7 Disallow /articles/search
8 Disallow /search.php
9 Disallow *q=*
10 Disallow *category_slug=*
11 Disallow *country_slug=*
12 Disallow *county_slug=*
13 Disallow *features=*

9. Get Google SERPs

The get_serps() function returns a Pandas dataframe containing the Google search engine results for a given search term. Note that this function is not suitable for large-scale scraping and currently includes no features to prevent it from being blocked.

from ecommercetools import seo

serps = seo.get_serps("data science blog")
print(serps)
title link text
0 10 of the best data science blogs to follow - ... https://www.tableau.com/learn/articles/data-sc... 10 of the best data science blogs to follow. T...
1 Best Data Science Blogs to Follow in 2020 | by... https://towardsdatascience.com/best-data-scien... 14 Jul 2020 — 1. Towards Data Science · Joined...
2 Top 20 Data Science Blogs And Websites For Dat... https://medium.com/@exastax/top-20-data-scienc... Top 20 Data Science Blogs And Websites For Dat...
3 Data Science Blog – Dataquest https://www.dataquest.io/blog/ Browse our data science blog to get helpful ti...
4 51 Awesome Data Science Blogs You Need To Chec... https://365datascience.com/trending/51-data-sc... Blog name: DataKind · datakind data science bl...
5 Blogs on AI, Analytics, Data Science, Machine ... https://www.kdnuggets.com/websites/blogs.html Individual/small group blogs · Ai4 blog, featu...
6 Data Science Blog – Applied Data Science https://data-science-blog.com/ ... an Bedeutung – DevOps for Data Science. De...
7 Top 10 Data Science and AI Blogs in 2020 - Liv... https://livecodestream.dev/post/top-data-scien... Some of the best data science and AI blogs for...
8 Data Science Blogs: 17 Must-Read Blogs for Dat... https://www.thinkful.com/blog/data-science-blogs/ Data scientists could be considered the magici...
9 rushter/data-science-blogs: A curated list of ... https://github.com/rushter/data-science-blogs A curated list of data science blogs. Contribu...

10. Create an ABCD classification of Google Search Console data

The classify_pages() function returns an ABCD classification of Google Search Console data. It calculates the cumulative sum of clicks across pages and then categorises them using the ABC algorithm: pages generating the first 80% of clicks are classed A, those in the next 10% are classed B, and those in the final 10% are classed C, with zero-click pages classed D.

from ecommercetools import seo

key = "client_secrets.json"
site_url = "example-domain.co.uk"
start_date = '2022-10-01'
end_date = '2022-10-31'

df_classes = seo.classify_pages(key, site_url, start_date, end_date, output='classes')
print(df_classes.head())

df_summary = seo.classify_pages(key, site_url, start_date, end_date, output='summary')
print(df_summary)
                                                page  clicks  impressions    ctr  position  clicks_cumsum  clicks_running_pc  pc_share class  class_rank
0  https://practicaldatascience.co.uk/machine-lea...    3890        36577  10.64     12.64           3890           8.382898  8.382898     A           1
1  https://practicaldatascience.co.uk/data-scienc...    2414        16618  14.53     14.30           6304          13.585036  5.202138     A           2
2  https://practicaldatascience.co.uk/data-scienc...    2378        71496   3.33     16.39           8682          18.709594  5.124558     A           3
3  https://practicaldatascience.co.uk/data-scienc...    1942        14274  13.61     15.02          10624          22.894578  4.184984     A           4
4  https://practicaldatascience.co.uk/data-scienc...    1738        23979   7.25     11.80          12362          26.639945  3.745367     A           5

class  pages  impressions  clicks   avg_ctr  avg_position  share_of_clicks  share_of_impressions
0     A     63       747643   36980  5.126349     22.706825             79.7                  43.7
1     B     46       639329    4726  3.228043     31.897826             10.2                  37.4
2     C    190       323385    4698  2.393632     38.259368             10.1                  18.9
3     D     36         1327       0  0.000000     25.804722              0.0                   0.1
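
The ABC thresholds described above can be sketched in plain Pandas. The abcd_classify() helper below is hypothetical and works on a dataframe of page-level clicks such as the one returned by the Search Console query; it is not the package's own implementation.

import numpy as np

def abcd_classify(df, metric='clicks'):
    # Illustrative ABCD bucketing by cumulative share of clicks.
    df = df.sort_values(metric, ascending=False).copy()
    df['clicks_cumsum'] = df[metric].cumsum()
    df['clicks_running_pc'] = df['clicks_cumsum'] / df[metric].sum() * 100
    df['class'] = np.where(df['clicks_running_pc'] <= 80, 'A',
                           np.where(df['clicks_running_pc'] <= 90, 'B', 'C'))
    df.loc[df[metric] == 0, 'class'] = 'D'  # zero-click pages form class D
    df['class_rank'] = range(1, len(df) + 1)
    return df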

Reports

The Reports module creates weekly, monthly, quarterly, or yearly reports for customers and orders and calculates a range of common ecommerce metrics to show business performance.

1. Customers report

The customers_report() function takes a formatted dataframe of transaction items (see above) and a desired frequency (D for daily, W for weekly, M for monthly, Q for quarterly) and calculates aggregate metrics for each period.

The function returns the number of orders, the number of customers, the number of new customers, the number of returning customers, and the acquisition rate (or proportion of new customers). For monthly reporting, I would recommend a 13-month period so you can compare the last month with the same month the previous year.

from ecommercetools import reports

df_customers_report = reports.customers_report(transaction_items, frequency='M')
print(df_customers_report.head(13))

2. Transactions report

The transactions_report() function takes a formatted dataframe of transaction items (see above) and a desired frequency (D for daily, W for weekly, M for monthly, Q for quarterly) and calculates aggregate metrics for each period.

The metrics returned are: customers, orders, revenue, SKUs, units, average order value, average SKUs per order, average units per order, and average revenue per customer.

from ecommercetools import reports

df_orders_report = reports.transactions_report(transaction_items, frequency='M')
print(df_orders_report.head(13))