serpapi / google-search-results-php

Google Search Results PHP API via Serp Api
https://serpapi.com
MIT License

Access to Google Scholar BibTeX data #6

Open Napoleon-BlownApart opened 3 years ago

Napoleon-BlownApart commented 3 years ago

Thank you for this API! It is good to see that someone is at last working on this officially. I tried scraping Google Scholar with some code I wrote in Swift, and whilst I was able to parse each entry, my code was temperamental because of Google's anti-robot mechanisms. Because I was using Swift's built-in HTTP client, I had issues with some cookie information that caused Google Scholar to invoke reCAPTCHA during my runs.

Whilst this issue is about extending the API to give access to the BibTeX reference information (and other citation formats, for that matter), I would also like to point out that the reference data on Google Scholar is often not as accurate as it could be. For example, use the playground to find "M-Coffee: combining multiple sequence alignment methods with T-Coffee" and compare the returned data with Google Scholar's BibTeX information and with Oxford Academic's record; the discrepancies are clear:

Google Scholar's BibTeX data for this paper (no DOI information):

@article{wallace2006m,
  title={M-Coffee: combining multiple sequence alignment methods with T-Coffee},
  author={Wallace, Iain M and O'sullivan, Orla and Higgins, Desmond G and Notredame, Cedric},
  journal={Nucleic acids research},
  volume={34},
  number={6},
  pages={1692--1699},
  year={2006},
  publisher={Oxford University Press}
}
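
As a rough illustration of working with an entry like the one above, the following Python sketch pulls the flat `key={value}` fields out of a BibTeX string and splits the author field on ` and `. The helper names `bibtex_fields` and `bibtex_authors` are made up for this example, and the regular expression is an assumption that only holds for simple entries without nested braces; a proper BibTeX parser would be needed for anything more general.

```python
import re

# The entry exactly as Google Scholar returns it (copied from this issue).
BIBTEX = """@article{wallace2006m,
  title={M-Coffee: combining multiple sequence alignment methods with T-Coffee},
  author={Wallace, Iain M and O'sullivan, Orla and Higgins, Desmond G and Notredame, Cedric},
  journal={Nucleic acids research},
  volume={34},
  number={6},
  pages={1692--1699},
  year={2006},
  publisher={Oxford University Press}
}"""


def bibtex_fields(entry: str) -> dict:
    """Return the flat key={value} fields of one entry.

    Hypothetical helper: it assumes values contain no nested braces.
    """
    pairs = re.findall(r"(\w+)\s*=\s*\{([^{}]*)\}", entry)
    return {key.lower(): value.strip() for key, value in pairs}


def bibtex_authors(entry: str) -> list:
    """Split the author field on the BibTeX separator ' and '."""
    author_field = bibtex_fields(entry).get("author", "")
    return [name.strip() for name in re.split(r"\s+and\s+", author_field) if name.strip()]


fields = bibtex_fields(BIBTEX)
authors = bibtex_authors(BIBTEX)

print(authors)
print("has DOI:", "doi" in fields)
```

Run against this entry, the sketch finds four authors and no `doi` field, which matches the two observations made here.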

Oxford Academic's data on the paper: https://academic.oup.com/nar/article/34/6/1692/2401531

Data returned by this API (only three of the four authors are listed):

{
    "search_metadata": {
        "id": "5f4bc5a9002fca2990aa41c2",
        "status": "Success",
        "json_endpoint": "https://serpapi.com/searches/48162f4a4112bb86/5f4bc5a9002fca2990aa41c2.json",
        "created_at": "2020-08-30 15:28:41 UTC",
        "processed_at": "2020-08-30 15:28:41 UTC",
        "google_scholar_url": "https://scholar.google.com/scholar?q=M-Coffee%3A+combining+multiple+sequence+alignment+methods+with+T-Coffee&hl=en",
        "raw_html_file": "https://serpapi.com/searches/48162f4a4112bb86/5f4bc5a9002fca2990aa41c2.html",
        "total_time_taken": 0.82
    },
    "search_parameters": {
        "engine": "google_scholar",
        "q": "M-Coffee: combining multiple sequence alignment methods with T-Coffee",
        "hl": "en"
    },
    "search_information": {
        "organic_results_state": "Results for exact spelling",
        "query_displayed": "M-Coffee: combining multiple sequence alignment methods with T-Coffee"
    },
    "organic_results": [
        {
            "position": 0,
            "title": "M-Coffee: combining multiple sequence alignment methods with T-Coffee",
            "result_id": "_3o-xhuGyg0J",
            "link": "https://academic.oup.com/nar/article-abstract/34/6/1692/2401531",
            "snippet": "We introduce M-Coffee, a meta-method for assembling multiple sequence alignments (MSA) by combining the output of several individual methods into one single MSA. M-Coffee is an extension of T-Coffee and uses consistency to estimate a consensus alignment. We show that the procedure is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs. We also show that performances can be improved by carefully selecting the constituent methods. M-Coffee outperforms all the individual methods …",
            "publication_info": {
                "summary": "IM Wallace, O O'sullivan, DG Higgins… - Nucleic acids …, 2006 - academic.oup.com",
                "authors": [
                    {
                        "name": "IM Wallace",
                        "link": "https://scholar.google.com/citations?user=oYUWc7YAAAAJ&hl=en&oi=sra"
                    },
                    {
                        "name": "O O'sullivan",
                        "link": "https://scholar.google.com/citations?user=rYniXB8AAAAJ&hl=en&oi=sra"
                    },
                    {
                        "name": "DG Higgins",
                        "link": "https://scholar.google.com/citations?user=Ap0K7rUAAAAJ&hl=en&oi=sra"
                    }
                ]
            },
            "resources": [
                {
                    "title": "oup.com",
                    "file_format": "HTML",
                    "link": "https://academic.oup.com/nar/article/34/6/1692/2401531"
                }
            ],
            "inline_links": {
                "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=_3o-xhuGyg0J",
                "html_version": "https://academic.oup.com/nar/article/34/6/1692/2401531",
                "cited_by": {
                    "total": 508,
                    "link": "https://scholar.google.com/scholar?cites=993754121636838143&as_sdt=5,44&sciodt=0,44&hl=en",
                    "serpapi_scholar_link": "https://serpapi.com/search.json?cites=993754121636838143&engine=google_scholar&hl=en&q=M-Coffee%3A+combining+multiple+sequence+alignment+methods+with+T-Coffee"
                },
                "related_pages_link": "https://scholar.google.com/scholar?q=related:_3o-xhuGyg0J:scholar.google.com/&scioq=M-Coffee:+combining+multiple+sequence+alignment+methods+with+T-Coffee&hl=en&as_sdt=0,44",
                "versions": {
                    "total": 18,
                    "link": "https://scholar.google.com/scholar?cluster=993754121636838143&hl=en&as_sdt=0,44",
                    "serpapi_scholar_link": "https://serpapi.com/search.json?cluster=993754121636838143&engine=google_scholar&hl=en&q=M-Coffee%3A+combining+multiple+sequence+alignment+methods+with+T-Coffee"
                }
            }
        }
    ]
}
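
One way to spot the missing author programmatically, sketched in Python: the author part of `publication_info.summary` ends in an ellipsis when the list has been cut short. The function name `authors_truncated` and the trimmed-down `result` dictionary are illustrative only; the field names are the ones in the response above, and the assumption that the author part comes before the first " - " is taken from this single example.

```python
# A trimmed copy of the first organic result from the response above.
result = {
    "result_id": "_3o-xhuGyg0J",
    "publication_info": {
        "summary": "IM Wallace, O O'sullivan, DG Higgins… - Nucleic acids …, 2006 - academic.oup.com",
        "authors": [
            {"name": "IM Wallace"},
            {"name": "O O'sullivan"},
            {"name": "DG Higgins"},
        ],
    },
    "inline_links": {
        "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=_3o-xhuGyg0J",
    },
}


def authors_truncated(item: dict) -> bool:
    """Guess whether the author list was cut short.

    Assumption: the summary is laid out as 'authors - venue, year - site',
    so only the text before the first ' - ' is inspected. The venue part
    may carry its own ellipsis and must not trigger the check.
    """
    summary = item.get("publication_info", {}).get("summary", "")
    author_part = summary.split(" - ", 1)[0].rstrip()
    return author_part.endswith(("…", "..."))


listed = [a["name"] for a in result["publication_info"].get("authors", [])]
truncated = authors_truncated(result)
cite_link = result.get("inline_links", {}).get("serpapi_cite_link")

print("listed authors:", len(listed))
print("author list truncated:", truncated)
print("citation lookup:", cite_link)
```

For this result the check reports a truncated list with three named authors. The `serpapi_cite_link` is the obvious place to look for fuller citation data, but what that endpoint returns is not shown in this thread.
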
jvmvik commented 2 years ago

@hartator, do you know whether the limit of 3 books returned by Google Scholar has been fixed?