serpapi / google-search-results-python

Google Search Results via SERP API pip Python Package
MIT License

How to get "related articles" links from google scholar via serpapi? #34

Closed monk1337 closed 2 years ago

monk1337 commented 2 years ago

I am using SerpApi to fetch Google Scholar papers. There is always a link called "Related articles" under each article, but SerpApi doesn't appear to return any URL for fetching the data behind those links.

Screenshot 2022-07-14 at 3 05 07 AM

Serp API result :

Screenshot 2022-07-14 at 3 15 16 AM

Can I directly call this URL https://scholar.google.com/scholar?q=related:gemrYG-1WnEJ:scholar.google.com/&scioq=Multi-label+text+classification+with+latent+word-wise+label+information&hl=en&as_sdt=0,21 using serp API?

dimitryzub commented 2 years ago

Hi, @monk1337 👋

Can I directly call this URL https://scholar.google.com/scholar?q=related:gemrYG-1WnEJ:scholar.google.com/&scioq=Multi-label+text+classification+with+latent+word-wise+label+information&hl=en&as_sdt=0,21 using serp API?

  1. Yes, you can. You just need to pass the value of the URL's q= parameter to the SerpApi q search parameter. For the URL you provided, the SerpApi q parameter would be related:gemrYG-1WnEJ:scholar.google.com

  2. You can retrieve data directly from URL using only CURL and JQ. We have a #AskSerpApi episode that covers specifically this question: #AskSerpApi: "How to extract a specific element from the JSON URL?" | CURL + JQ.

To extract related articles, you need to access ["inline_links"]["related_pages_link"] on each item of the ["organic_results"] list.

Example code to extract related articles from the first page:

from serpapi import GoogleSearch

params = {
  "api_key": "...",
  "engine": "google_scholar",
  "q": "Coffee",
  "hl": "en"
}

search = GoogleSearch(params)
results = search.get_dict()

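for result in results.get("organic_results", []):
    # Not every result carries a "Related articles" link, so avoid a KeyError.
    related_articles = result.get("inline_links", {}).get("related_pages_link")
    if related_articles:
        print(related_articles)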

Outputs (where q= is a related articles search query that can be passed to SerpApi q search parameter):

https://scholar.google.com/scholar?q=related:sWzmct-yYzgJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,11
https://scholar.google.com/scholar?q=related:9WouRiFbIK4J:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,11
https://scholar.google.com/scholar?q=related:fGeQlvu-2_IJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,11
https://scholar.google.com/scholar?q=related:-0fOFoq7wJ8J:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,11
https://scholar.google.com/scholar?q=related:CZSAb_VNDkkJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,11
https://scholar.google.com/scholar?q=related:Jt15QwxlEw0J:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,11
https://scholar.google.com/scholar?q=related:31GOrHWBl_AJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,11
https://scholar.google.com/scholar?q=related:KVT-hW9IrDoJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,11
https://scholar.google.com/scholar?q=related:Ang0MOfBmAUJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,11
https://scholar.google.com/scholar?q=related:QwF9cuvhnCoJ:scholar.google.com/&scioq=Coffee&hl=en&as_sdt=0,11
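
As a minimal sketch of how the q= value could be pulled out of one of these URLs, Python's standard urllib.parse module can split the query string. The helper name related_query_from_url is hypothetical and not part of the serpapi package. Stripping the trailing slash is an assumption made so the value matches the related:...:scholar.google.com form shown earlier in this thread.

```python
from urllib.parse import parse_qs, urlparse


def related_query_from_url(url: str) -> str:
    """Return the q= value of a Google Scholar "related articles" URL."""
    # parse_qs maps each parameter name to a list of its values.
    values = parse_qs(urlparse(url).query).get("q", [])
    if not values:
        raise ValueError(f"no q parameter in URL: {url}")
    # Drop the trailing slash so the value reads related:<id>:scholar.google.com
    return values[0].rstrip("/")


url = (
    "https://scholar.google.com/scholar?q=related:sWzmct-yYzgJ:scholar.google.com/"
    "&scioq=Coffee&hl=en&as_sdt=0,11"
)
print(related_query_from_url(url))
```

The returned string is what would go into the "q" entry of the params dict.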

To access related articles with SerpApi, you can do it like so (keep in mind that this is an example):

from serpapi import GoogleSearch
import re

def get_related_articles_query():
    params = {
      "api_key": "...",
      "engine": "google_scholar",
      "q": "Multi-label text classification",
      "hl": "en"
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    related_articles = []

    for result in results.get("organic_results", []):
        link = result.get("inline_links", {}).get("related_pages_link", "")
        # https://regex101.com/r/XuEhoh/1
        match = re.search(r"q=(.*)\/&scioq", link)
        if match:  # skip results without a "Related articles" link
            related_articles.append(match.group(1))

    return related_articles

def get_related_articles_results():
    for related_article in get_related_articles_query():
        params = {
          "api_key": "...",
          "engine": "google_scholar",
          "q": related_article, # related:sWzmct-yYzgJ:scholar.google.com ...
          "hl": "en"
        }

        search = GoogleSearch(params)
        results = search.get_dict()

        for result in results["organic_results"]:
            print(result.get("title"), result.get("link"), result.get("publication_info", {}).get("summary"), sep="\n")

get_related_articles_results()

Outputs:

Deep learning for extreme multi-label text classification
https://dl.acm.org/doi/abs/10.1145/3077136.3080834
J Liu, WC Chang, Y Wu, Y Yang - … of the 40th international ACM SIGIR …, 2017 - dl.acm.org
...

Code example in the online IDE: https://replit.com/@DimitryZub1/Google-Scholar-SerpApi-API-Extract-Related-Articles

Let me know if it makes sense and if you need additional clarifications 🌼

dimitryzub commented 2 years ago

@monk1337 We've added a new serpapi_related_pages_link dict key to the JSON response.

So now there's no need to use a regex to extract the search query:

re.search(r"q=(.*)\/&scioq", result["inline_links"]["related_pages_link"]).group(1)
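
A rough sketch of reading the new key instead, with a fallback to the older related_pages_link. It assumes that serpapi_related_pages_link holds a SerpApi search URL whose query string carries the q value, which this thread does not spell out. The helper name related_query and the sample dict are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def related_query(result: dict) -> Optional[str]:
    """Return the related-articles q value for one organic result, if any."""
    links = result.get("inline_links", {})
    # Prefer the newer key; fall back to the original Google Scholar link.
    url = links.get("serpapi_related_pages_link") or links.get("related_pages_link")
    if not url:
        return None
    # Assumption: either URL carries the search query in its q parameter.
    values = parse_qs(urlparse(url).query).get("q", [])
    return values[0].rstrip("/") if values else None


# Hypothetical result shaped like one item of results["organic_results"].
sample = {
    "inline_links": {
        "serpapi_related_pages_link": (
            "https://serpapi.com/search.json?engine=google_scholar"
            "&hl=en&q=related:sWzmct-yYzgJ:scholar.google.com/"
        )
    }
}
print(related_query(sample))
```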

Let me know if you need any additional help 🙂

dimitryzub commented 2 years ago

Closing this as we implemented it. For more: https://serpapi.com/google-scholar-api