marm123 closed this issue 8 months ago
Might be related to https://github.com/serpapi/public-roadmap/issues/300
It appears that Google changed the requests for its Trends page, making some Python libraries, like pytrends, unreliable. I'm unsure if this is relevant or related to this issue, but it's worth mentioning. Links to the Intercom thread and the reported issue on the pytrends GitHub repository are below:
This post from pytrends issue suggests that we may have been affected as well:
I opened an issue already for this a few weeks ago. After doing some digging, it seems Google has changed their API and is now creating "holes" in the data for scraped info.
It is also happening on large keyword tools such as Keywords Everywhere
There is now a new user type in the headers, one called 'USER_TYPE_LEGIT_USER' and the other 'USER_TYPE_SCRAPER'. The scraper user has the "holes" while the legit user doesn't.
I'm the poster in the PyTrends issue ^
Please let me know if you're able to find a resolution to this.
We are marked with USER_TYPE_SCRAPER:
HTTP Error 401 Unauthorized indicates that the request lacks valid authentication credentials for the target resource.
You have to get the 'USER_TYPE_LEGIT_USER' token. It's not just a matter of replacing the userConfig.
I'm not sure how to do that without borrowing it from the browser.
I am curious if Google employs this technique with their other services.
Related question on StackOverflow: https://stackoverflow.com/q/73988220/1291371
Related issue in the g-trends repository: https://github.com/x-fran/g-trends/issues/54
Google Trends in the browser submits a request with a reCAPTCHA token. This may be the reason we get USER_TYPE_SCRAPER instead of USER_TYPE_LEGIT_USER.
curl 'https://trends.google.com/trends/api/explore?hl=en-US&tz=-120&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22filter%22,%22geo%22:%22%22,%22time%22:%22now+4-H%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-120' \
-H 'authority: trends.google.com' \
-H 'accept: application/json, text/plain, */*' \
-H 'accept-language: en-US,en;q=0.6' \
-H 'content-type: application/json;charset=UTF-8' \
-H 'cookie: NID=511=XFlLBiX63uT_Z3MtZZcDi_qaxDIpYgCnUfralfFn4HMFnavFOeuwOejdfB_eQb8-awg_jwYANnr7dGFtK4830aAEsP6Z-cl0YxRY6L1-_Z6V2nw90m4i1VN-FpCQEWtusomgE0WfPOilk7k95hSxJbwnsdMCxVqguJLBIiIMuok' \
-H 'origin: https://trends.google.com' \
-H 'referer: https://trends.google.com/trends/explore?date=now%204-H&q=filter' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'sec-gpc: 1' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36' \
--data-raw '' \
--compressed
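The `req` query parameter in the command above is URL-encoded JSON. As a minimal sketch for anyone reproducing the request from Python, the snippet below builds the same kind of explore URL and decodes it back. The helper names `build_explore_url` and `decode_req` are hypothetical, and the default values are simply copied from the curl command; nothing here addresses the reCAPTCHA token or the user type classification.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_URL = "https://trends.google.com/trends/api/explore"


def build_explore_url(keyword, time_range="now 4-H", geo="", hl="en-US", tz=-120):
    """Build an explore URL whose `req` parameter is URL-encoded JSON.

    Hypothetical helper: the field names mirror the decoded `req`
    payload of the curl command above.
    """
    req = {
        "comparisonItem": [{"keyword": keyword, "geo": geo, "time": time_range}],
        "category": 0,
        "property": "",
    }
    # Compact separators keep the JSON free of extra whitespace.
    query = urlencode({"hl": hl, "tz": tz, "req": json.dumps(req, separators=(",", ":"))})
    return f"{EXPLORE_URL}?{query}"


def decode_req(url):
    """Return the `req` payload of an explore URL as a Python dict."""
    return json.loads(parse_qs(urlsplit(url).query)["req"][0])
```

`urlencode` escapes `:` and `,` while the browser-generated URL leaves them literal, so the two URLs differ as strings but decode to the same payload.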
What if we just hardcode the cookie for USER_TYPE_LEGIT_USER?
I tried doing this, but it still returns USER_TYPE_SCRAPER after a few requests.
The request for cookie is expected to be a POST request now.
With regular cURL, it still returns USER_TYPE_SCRAPER. But with curl-impersonate, Google Trends responds with USER_TYPE_LEGIT_USER.
Command:
curl_ff98 'https://trends.google.com/trends/api/explore?hl=en-US&tz=-120&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22snowboard%22,%22geo%22:%22%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-120' \
-H 'authority: trends.google.com' \
-H 'accept: application/json, text/plain, */*' \
-H 'content-type: application/json;charset=UTF-8' \
-H 'cookie: NID=511=FJ4YkcBxxIqov2FykB9Bk59PRArkpNvtsUNt9YnMMQMjZ8_IVOILVqRP0CTaQbHav5UZ0XTeCbDpK8PA9niYtdiPlP8eNcB5pej0fp9gJq99jfFvzlB_dV74utZN-V2X_riUioiBhfwPdz16HbtA2Soxiu10lHPGdNlE__BYgoI' \
-H 'origin: https://trends.google.com' \
-H 'referer: https://trends.google.com/trends/explore?q=snowboard' \
--data-raw '' \
--compressed
@ilyazub is curl-impersonate going to be integrated into the product to make this work? I just tried in the API playground and I see the same issue is still persisting.
@ilyazub I get "userType":"USER_TYPE_LEGIT_USER" with regular curl:
curl --version
curl 7.84.0 (x86_64-apple-darwin22.0) libcurl/7.84.0 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.11 nghttp2/1.47.0
Release-Date: 2022-06-27
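To compare results between clients, it helps to read the `userType` field out of the response programmatically. The following is a rough sketch, not a description of how SerpApi or pytrends does it. It assumes the response body may start with a short non-JSON prefix such as `)]}'` before the JSON object, and it searches the whole document for `userType` rather than assuming a fixed path. Both function names are hypothetical.

```python
import json


def parse_trends_json(body):
    """Parse a response body, skipping anything before the first '{'.

    Assumption: the body may carry a short non-JSON prefix ahead of
    the actual JSON object.
    """
    start = body.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response body")
    return json.loads(body[start:])


def find_user_type(node):
    """Return the first string-valued "userType" found in nested JSON, or None."""
    if isinstance(node, dict):
        if isinstance(node.get("userType"), str):
            return node["userType"]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_user_type(child)
        if found is not None:
            return found
    return None
```

Calling `find_user_type(parse_trends_json(body))` on a saved response then gives either `USER_TYPE_LEGIT_USER`, `USER_TYPE_SCRAPER`, or `None` if the field is absent.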
I think we can reuse data in the POST request to get USER_TYPE_LEGIT_USER, but on a subsequent request with the same data you get USER_TYPE_SCRAPER. If you wait a bit between requests with the same data, then you get USER_TYPE_LEGIT_USER again.
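The "wait a bit between requests" behaviour described above could be handled with a simple retry loop. This is only a sketch under assumptions: `fetch_user_type` is a hypothetical callable that performs one request and returns the `userType` string, and the 10 second default delay is a placeholder, since the thread does not say how long the wait needs to be.

```python
import time

LEGIT = "USER_TYPE_LEGIT_USER"


def fetch_until_legit(fetch_user_type, max_attempts=5, delay_s=10.0, sleep=time.sleep):
    """Call fetch_user_type() until it reports LEGIT or attempts run out.

    Returns a (user_type, attempts_used) tuple. `sleep` is injectable
    so the loop can be exercised without real waiting.
    """
    user_type = None
    for attempt in range(1, max_attempts + 1):
        user_type = fetch_user_type()
        if user_type == LEGIT:
            return user_type, attempt
        # Pause before retrying, but not after the final attempt.
        if attempt < max_attempts:
            sleep(delay_s)
    return user_type, max_attempts
```

Passing the request function in as a parameter keeps the waiting logic separate from whichever HTTP client (regular curl, curl-impersonate, or a Python library) actually sends the request.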
data FEWyJzZ... looks like a reCAPTCHA token.
Yes, it's the Invisible reCAPTCHA token.
@ilyazub is curl-impersonate going to be integrated into the product to make this work? I just tried in the API playground and I see the same issue is still persisting.
@jbnitorum We will fix it but don't have a timeline for the fix. Thank you for your patience and understanding.
Hi @ilyazub , thanks a lot for looking into this. I am wondering if there is any update on this topic. Thanks.
Update: Please ignore. I realized that there is a more recent update here
The data from Google Trends and SerpApi seem to match.
Any suggestions on how to fix the problem in code?
@Helldez I'm sorry to hear that you faced an issue with our Google Trends API. Could you please share a search ID or the parameters you used that caused the problem?
I'm not a SerpApi user; I'm helping rewrite pytrends, so I wanted to ask how you solved the reCAPTCHA problem, if possible. Otherwise, thanks anyway.
This should be fixed after https://github.com/serpapi/public-roadmap/issues/1143, as we are now getting USER_TYPE_LEGIT_USER for our requests.
It appears that some searches are missing all of the 2022 data. I noticed that Google indicates, on its Trends search page, that it changed its data collection. I'm not sure if this is relevant, as some searches do return data from 2022, while others do not.
Search with missing 2022 data
Information from Google Trends search
Playground | Inspect