serpapi / public-roadmap

Public Roadmap for SerpApi, LLC (https://serpapi.com)

[Google Trends API] Data for searches from 2022 is missing #561

Closed marm123 closed 8 months ago

marm123 commented 1 year ago

Some searches appear to be missing all data for 2022. Google indicates on its Trends search page that it changed its data collection. I'm not sure whether this is relevant, because some searches do return data from 2022 while others do not.
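
One way to confirm this kind of gap is to check which calendar years have no data points at all. The following is a minimal sketch, assuming the timeline has already been reduced to `(unix_timestamp, value)` pairs. The function name `years_without_data` and the sample points are hypothetical and not taken from any real response.

```python
from datetime import datetime, timezone


def years_without_data(points, first_year, last_year):
    """Return the years in [first_year, last_year] that have no data point.

    `points` is an iterable of (unix_timestamp, value) pairs. This shape is
    an assumption made for illustration, not the raw API response format.
    """
    seen = {datetime.fromtimestamp(ts, tz=timezone.utc).year for ts, _ in points}
    return [year for year in range(first_year, last_year + 1) if year not in seen]


# Hypothetical sample: two points in 2021, one in 2023, none in 2022.
sample_points = [
    (1609459200, 40),  # 2021-01-01 UTC
    (1625097600, 55),  # 2021-07-01 UTC
    (1672531200, 61),  # 2023-01-01 UTC
]
print(years_without_data(sample_points, 2021, 2023))  # [2022]
```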

[Screenshot: search with missing 2022 data]

[Screenshot: information from Google Trends search]

marm123 commented 1 year ago

Might be related to https://github.com/serpapi/public-roadmap/issues/300

marm123 commented 1 year ago

It appears that Google changed the requests for its Trends page, making some Python libraries, such as pytrends, unreliable. I'm unsure whether this is related to this issue, but it's worth mentioning. Links to the Intercom thread and the issue reported on the pytrends GitHub repository are below:

Intercom pytrends GitHub issue

aliayar commented 1 year ago

This post from the pytrends issue suggests that we may have been affected as well:

I opened an issue already for this a few weeks ago. After doing some digging, it seems Google has changed their API and is now creating "holes" in the data for scraped info.

It is also happening on large keyword tools such as Keywords Everywhere

There are now two user types in the headers, one called 'USER_TYPE_LEGIT_USER' and the other 'USER_TYPE_SCRAPER'. The scraper user has the "holes" while the legit user doesn't.
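
To see which of the two labels a response carries, one option is to parse the response body and collect every `userType` value in it. This is a rough sketch under two assumptions: that the body is JSON preceded by a `)]}'` anti-hijacking prefix, and that the label appears under a `userType` key somewhere in the JSON (the thread mentions `userConfig` and `"userType":"USER_TYPE_LEGIT_USER"`). The helper names and the sample payload are hypothetical.

```python
import json

# Assumed prefix that precedes the JSON payload in the response body.
XSSI_PREFIX = ")]}'"


def parse_trends_body(body):
    """Strip the assumed prefix, if present, and parse the rest as JSON."""
    text = body.lstrip()
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    return json.loads(text)


def find_user_types(node):
    """Recursively collect every string stored under a "userType" key."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "userType" and isinstance(value, str):
                found.add(value)
            else:
                found |= find_user_types(value)
    elif isinstance(node, list):
        for item in node:
            found |= find_user_types(item)
    return found


# Hypothetical payload shaped only for this illustration.
sample_body = ")]}'\n" + json.dumps(
    {"widgets": [{"request": {"userConfig": {"userType": "USER_TYPE_SCRAPER"}}}]}
)
print(find_user_types(parse_trends_body(sample_body)))
```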

nicktba commented 1 year ago

I'm the poster in the pytrends issue ^

Please let me know if you're able to find a resolution to this.

aciddjus commented 1 year ago

We are marked with USER_TYPE_SCRAPER:

[screenshot]

nicktba commented 1 year ago

HTTP Error 401 Unauthorized indicates that the request lacks valid authentication credentials for the target resource.

You have to get the 'USER_TYPE_LEGIT_USER' token. It's not just a matter of replacing the userConfig.

I'm not sure how to do that without borrowing it from the browser.

aliayar commented 1 year ago

I am curious if Google employs this technique with their other services.

ilyazub commented 1 year ago

Related question on StackOverflow: https://stackoverflow.com/q/73988220/1291371

Related issues in g-trends repository: https://github.com/x-fran/g-trends/issues/54

ilyazub commented 1 year ago

Google Trends in the browser submits a request with a reCAPTCHA token. This may be the reason we get USER_TYPE_SCRAPER instead of USER_TYPE_LEGIT_USER.


curl 'https://trends.google.com/trends/api/explore?hl=en-US&tz=-120&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22filter%22,%22geo%22:%22%22,%22time%22:%22now+4-H%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-120' \
  -H 'authority: trends.google.com' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-US,en;q=0.6' \
  -H 'content-type: application/json;charset=UTF-8' \
  -H 'cookie: NID=511=XFlLBiX63uT_Z3MtZZcDi_qaxDIpYgCnUfralfFn4HMFnavFOeuwOejdfB_eQb8-awg_jwYANnr7dGFtK4830aAEsP6Z-cl0YxRY6L1-_Z6V2nw90m4i1VN-FpCQEWtusomgE0WfPOilk7k95hSxJbwnsdMCxVqguJLBIiIMuok' \
  -H 'origin: https://trends.google.com' \
  -H 'referer: https://trends.google.com/trends/explore?date=now%204-H&q=filter' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-gpc: 1' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36' \
  --data-raw '' \
  --compressed
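
The `req` query parameter in the command above is URL-encoded JSON. As a small sketch, it can be decoded or rebuilt with the Python standard library. The helper names `decode_req` and `build_explore_url` are hypothetical, and the builder only mirrors the fields visible in this command.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_URL = "https://trends.google.com/trends/api/explore"


def decode_req(url):
    """Return the JSON object carried in the URL's `req` query parameter."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return json.loads(query["req"][0])


def build_explore_url(keyword, time_range, geo="", hl="en-US", tz="-120"):
    """Build an explore URL with the same fields as the command in the thread."""
    req = {
        "comparisonItem": [{"keyword": keyword, "geo": geo, "time": time_range}],
        "category": 0,
        "property": "",
    }
    params = {"hl": hl, "tz": tz, "req": json.dumps(req, separators=(",", ":"))}
    return EXPLORE_URL + "?" + urlencode(params)


# URL copied from the curl command in the thread.
thread_url = (
    "https://trends.google.com/trends/api/explore?hl=en-US&tz=-120"
    "&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22filter%22,"
    "%22geo%22:%22%22,%22time%22:%22now+4-H%22%7D%5D,"
    "%22category%22:0,%22property%22:%22%22%7D&tz=-120"
)
decoded = decode_req(thread_url)
print(decoded["comparisonItem"][0])  # keyword "filter", time "now 4-H"
```

Decoding shows that the request asks for the keyword `filter` over the `now 4-H` range with an empty `geo`, and that `tz=-120` is sent twice in the copied URL.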

What if we just hardcode the cookie for the USER_TYPE_LEGIT_USER?

aciddjus commented 1 year ago

Quoting @ilyazub: "What if we just hardcode cookie for the USER_TYPE_LEGIT_USER?"

I tried this, but it still returns USER_TYPE_SCRAPER after a few requests.

ilyazub commented 1 year ago

The request for the cookie is now expected to be a POST request.


With regular cURL, it still returns USER_TYPE_SCRAPER. But with curl-impersonate, Google Trends responds with USER_TYPE_LEGIT_USER.


Command:

curl_ff98 'https://trends.google.com/trends/api/explore?hl=en-US&tz=-120&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22snowboard%22,%22geo%22:%22%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-120' \
  -H 'authority: trends.google.com' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'content-type: application/json;charset=UTF-8' \
  -H 'cookie: NID=511=FJ4YkcBxxIqov2FykB9Bk59PRArkpNvtsUNt9YnMMQMjZ8_IVOILVqRP0CTaQbHav5UZ0XTeCbDpK8PA9niYtdiPlP8eNcB5pej0fp9gJq99jfFvzlB_dV74utZN-V2X_riUioiBhfwPdz16HbtA2Soxiu10lHPGdNlE__BYgoI' \
  -H 'origin: https://trends.google.com' \
  -H 'referer: https://trends.google.com/trends/explore?q=snowboard' \
  --data-raw '' \
  --compressed
jbnitorum commented 1 year ago

@ilyazub is curl-impersonate going to be integrated into the product to make this work? I just tried in the API playground and I see the same issue is still persisting.


ritu1337 commented 1 year ago

Quoting @ilyazub: "With the regular cURL, it's still returns USER_TYPE_SCRAPER. But with curl-impersonate, Google Trends responds with USER_TYPE_LEGIT_USER."

@ilyazub I get "userType":"USER_TYPE_LEGIT_USER" with regular curl

curl --version
curl 7.84.0 (x86_64-apple-darwin22.0) libcurl/7.84.0 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.11 nghttp2/1.47.0
Release-Date: 2022-06-27

I think we can reuse the data from the POST request to get USER_TYPE_LEGIT_USER, but a subsequent request with the same data returns USER_TYPE_SCRAPER. If you wait a bit between requests with the same data, you get USER_TYPE_LEGIT_USER again.
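
The wait-and-retry behaviour described here could be sketched as a small loop. This is only an illustration: `fetch` is a hypothetical callable that performs the request and returns a `(user_type, payload)` pair, and the 30-second default wait is an arbitrary placeholder, since the thread does not say how long the pause needs to be.

```python
import time


def fetch_until_legit(fetch, wait_seconds=30.0, max_attempts=3, sleep=time.sleep):
    """Call `fetch` until it reports USER_TYPE_LEGIT_USER, pausing between tries.

    `fetch` is a hypothetical callable returning (user_type, payload).
    `wait_seconds` is a placeholder value, not a documented requirement.
    """
    user_type = None
    for attempt in range(max_attempts):
        if attempt:  # pause only between attempts, not before the first one
            sleep(wait_seconds)
        user_type, payload = fetch()
        if user_type == "USER_TYPE_LEGIT_USER":
            return payload
    raise RuntimeError(f"still {user_type} after {max_attempts} attempts")


# Demo with a fake fetch that is flagged as a scraper once, then accepted.
fake_responses = iter([
    ("USER_TYPE_SCRAPER", None),
    ("USER_TYPE_LEGIT_USER", {"ok": True}),
])
demo_result = fetch_until_legit(lambda: next(fake_responses), sleep=lambda seconds: None)
print(demo_result)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.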

The data value FEWyJzZ... looks like a reCAPTCHA token.

ilyazub commented 1 year ago

Quoting @ritu1337: "data FEWyJzZ... looks like a Recaptcha token."

Yes, it's the Invisible reCAPTCHA token.

Quoting @jbnitorum: "@ilyazub is curl-impersonate going to be integrated into the product to make this work? I just tried in the API playground and I see the same issue is still persisting."

@jbnitorum We will fix it but don't have a timeline for the fix. Thank you for your patience and understanding.

emptymalei commented 1 year ago

Hi @ilyazub, thanks a lot for looking into this. Is there any update on this topic? Thanks.


Update: Please ignore. I realized that there is a more recent update here: #887

ilyazub commented 1 year ago

The data from Google Trends and SerpApi seem to match.

Google Trends: https://trends.google.com/trends/explore?date=2017-01-01%202023-09-13&geo=DK&q=vink%C3%B8leskab&hl=en&tz=420

SerpApi: https://serpapi.com/playground?engine=google_trends&q=vink%C3%B8leskab&geo=DK&tz=420&date=2017-01-01+2023-09-13

Helldez commented 11 months ago

Any suggestions on how to fix the problem in the code?

ilyazub commented 11 months ago

@Helldez I'm sorry to hear that you ran into an issue with our Google Trends API. Could you please share a search ID or the parameters that caused the problem?

Helldez commented 11 months ago

I'm not a SerpApi user; I'm helping rewrite pytrends. I wanted to ask how you solved the reCAPTCHA problem, if you're able to share. Otherwise, thanks anyway.

tanys123 commented 8 months ago

This should be fixed after https://github.com/serpapi/public-roadmap/issues/1143 as we are getting USER_TYPE_LEGIT_USER for our requests.

Screenshot 2024-02-28 at 3 44 25 PM