pat310 / google-trends-api

An API layer on top of google trends
https://www.npmjs.com/package/google-trends-api
MIT License
889 stars 176 forks

Data variation/mismatch? #114

Open junkdeck opened 6 years ago

junkdeck commented 6 years ago

The data returned by google-trends-api does not fully align with the same queries performed on Google Trends. Is this an inherent issue with scraping data from GTrends, or is the data modified in some way?

pat310 commented 5 years ago

The data isn't modified in any way; perhaps the URL this library is hitting is outdated now?

merkshroom commented 5 years ago

How are you using the API? If you are using a custom timespan, like the past 30 days, you have to set `startTime` to 31 days earlier, since Google Trends measures "the past 30 days" as the 30 days before today, not including today.
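A minimal sketch of that adjustment, assuming the library's `interestOverTime` options (the network call itself is left commented out):

```javascript
// Sketch: build options for a "past 30 days" query that lines up with the
// Trends UI by starting 31 days back. By default the library uses the
// current time as the end of the window.
const DAY_MS = 24 * 60 * 60 * 1000;

function last30DaysOptions(keyword, now = new Date()) {
  return {
    keyword,
    // 31 days back, because Trends counts the 30 days *before* today
    startTime: new Date(now.getTime() - 31 * DAY_MS),
  };
}

// const googleTrends = require('google-trends-api');
// googleTrends.interestOverTime(last30DaysOptions('marketing'))
//   .then(JSON.parse)
//   .then((res) => console.log(res.default.timelineData));
```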

junkdeck commented 5 years ago

I don't think that's it - the same dates have different data.

merkshroom commented 5 years ago

Can you post the code you're using?

sutefan1 commented 5 years ago

I think the data differs because the API doesn't let you specify which interpretation of a keyword to use: plain search term, topic, or a category like "System software".

As an example, if you type in Unity on Google Trends you can choose between several interpretations of the keyword. Is there any chance the API could do the same?

thabblegit commented 5 years ago

I am having the same issue.

Tried in the API: `googleTrendsApi.interestOverTime({ keyword: 'marketing', geo: 'US' })`

Returned (last few rows shown):

```json
{"time":"1525132800","formattedTime":"May 2018","formattedAxisTime":"May 1, 2018","value":[45],"hasData":[true],"formattedValue":["45"]},
{"time":"1527811200","formattedTime":"Jun 2018","formattedAxisTime":"Jun 1, 2018","value":[42],"hasData":[true],"formattedValue":["42"]},
{"time":"1530403200","formattedTime":"Jul 2018","formattedAxisTime":"Jul 1, 2018","value":[40],"hasData":[true],"formattedValue":["40"]},
{"time":"1533081600","formattedTime":"Aug 2018","formattedAxisTime":"Aug 1, 2018","value":[42],"hasData":[true],"formattedValue":["42"]},
{"time":"1535760000","formattedTime":"Sep 2018","formattedAxisTime":"Sep 1, 2018","value":[44],"hasData":[true],"formattedValue":["44"]},
{"time":"1538352000","formattedTime":"Oct 2018","formattedAxisTime":"Oct 1, 2018","value":[45],"hasData":[true],"formattedValue":["45"]},
{"time":"1541030400","formattedTime":"Nov 2018","formattedAxisTime":"Nov 1, 2018","value":[43],"hasData":[true],"formattedValue":["43"]},
{"time":"1543622400","formattedTime":"Dec 2018","formattedAxisTime":"Dec 1, 2018","value":[36],"hasData":[true],"formattedValue":["36"]},
{"time":"1546300800","formattedTime":"Jan 2019","formattedAxisTime":"Jan 1, 2019","value":[42],"hasData":[true],"formattedValue":["42"],"isPartial":true}],"averages":[]}}
```

All values are under 50.

However, entering the same query manually shows much higher values (one as high as 70); see the screen capture.

[screenshot: Google Trends UI, 2019-01-19]

Partial embed of the manual query above: `"exploreQuery":"geo=US&q=Marketing&date=today 12-m"`

thabblegit commented 5 years ago

Figured out the issue. Google seems to normalize the data relative to the period supplied, and the API's default start date is 2004. If you supply a start time of today minus 12 months, the data matches.
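For anyone else hitting this, a hedged sketch of matching the UI's `date=today 12-m` window (the network call is commented out; only the `startTime`/`endTime` construction is shown):

```javascript
// Sketch: pass an explicit startTime 12 months back instead of relying on
// the library's 2004 default, so normalization matches the Trends UI.
function past12MonthsOptions(keyword, geo, now = new Date()) {
  const startTime = new Date(now);
  startTime.setMonth(startTime.getMonth() - 12); // same day, one year earlier
  return { keyword, geo, startTime, endTime: now };
}

// const googleTrends = require('google-trends-api');
// googleTrends.interestOverTime(past12MonthsOptions('marketing', 'US'))
//   .then(JSON.parse)
//   .then((res) => console.log(res.default.timelineData));
```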

pat310 commented 5 years ago

Thanks @thabblegit! Maybe there should be a note about this in the README.

guilherme-salome commented 5 years ago

I am facing the same issue. I collected data for two time intervals: 2007-01-01 to 2007-08-01 and 2007-07-01 to 2008-02-01. Both calls return daily search volume, with one month (July 2007) of overlapping data. Comparing the data side by side, I get:

[screenshot: the two daily series side by side for the overlapping month]

The left column comes from the interval 2007-01-01 to 2007-08-01, while the right column comes from the interval 2007-07-01 to 2008-02-01. They are queries for the same keyword (PROFITABLE BUSINESS), yet yield very different values on the same days. I expected non-zero values to fall on the same days, but there are days where one value is large and the other is zero, and vice versa. This makes scaling the data non-trivial.
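One common workaround is to estimate a single scale factor from the overlapping month and rescale the second window onto the first window's scale. A sketch, assuming the overlap is trustworthy; the zero/non-zero mismatches described above are exactly what makes this fragile:

```javascript
// Stitch two daily series from overlapping Trends windows onto one scale.
// seriesA and seriesB are arrays of daily values; overlapDays is the length
// of the shared interval (here, July 2007 = 31 days).
function stitch(seriesA, seriesB, overlapDays) {
  const sum = (xs) => xs.reduce((a, b) => a + b, 0);
  const tailA = seriesA.slice(-overlapDays);   // the overlap as seen by window 1
  const headB = seriesB.slice(0, overlapDays); // the overlap as seen by window 2
  const scale = sum(tailA) / sum(headB);       // breaks if the overlap sums to 0
  // Keep window 1 as-is; rescale the non-overlapping part of window 2.
  return seriesA.concat(seriesB.slice(overlapDays).map((v) => v * scale));
}

// Toy example: [10, 20, 30] and [15, 45] with a 1-day overlap.
// scale = 30 / 15 = 2, so the stitched series is [10, 20, 30, 90].
```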

Any ideas on how to deal with this?