randomgambit opened 7 years ago
For instance, could you aggregate the counts at the hour level instead of the daily level? That would help match the data more precisely with data coming from other timezones.
Except all I'm doing is calling anytime::anytime(date_start)
(etc.), and that call returns only day resolution. Let me look at the raw API values, though.
20161212T000000Z 20161212T235959Z
are examples of the start/end times for the timeline
structure, so you're out of luck there. But 20161221T050000Z
is what comes back for show_date
in the top_matches
structure, and anytime
is not converting that properly, so let me see what I can do for at least that one.
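In the meantime, a caller-side workaround is possible: those values are basic-format ISO 8601 timestamps, so base R's as.POSIXct can parse them with an explicit format string. A minimal sketch (parse_gdelt_ts is a hypothetical helper name, not part of the package):

```r
# Parse GDELT-style basic ISO 8601 timestamps (YYYYMMDDTHHMMSSZ) by hand.
# Workaround sketch only; the package may end up handling this differently.
parse_gdelt_ts <- function(x) {
  as.POSIXct(x, format = "%Y%m%dT%H%M%SZ", tz = "UTC")
}

parse_gdelt_ts("20161221T050000Z")
#> [1] "2016-12-21 05:00:00 UTC"
```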
Thanks! If you don't find any workaround, then emailing the folks at GDELT could be a solution, I guess.
show_date
in top_matches
should have hms resolution now in 0.3.1, which I just pushed. The other structures don't have that resolution.
dplyr::glimpse(df$top_matches)
## Observations: 1,000
## Variables: 8
## $ preview_url <chr> "https://archive.org/details/FBC_20161223_140000_Varney__Company#start/...
## $ ia_show_id <chr> "FBC_20161223_140000_Varney__Company", "CNNW_20161128_180000_Wolf", "FO...
## $ date <date> 2016-12-23, 2016-11-28, 2016-12-27, 2016-12-23, 2016-12-20, 2016-11-29...
## $ station <chr> "FOX Business", "CNN", "FOX News", "FOX Business", "FOX Business", "FOX...
## $ show <chr> "Varney Company", "Wolf", "FOX Friends", "Varney Company", "Making M...
## $ show_date <dttm> 2016-12-23 14:00:00, 2016-11-28 18:00:00, 2016-12-27 11:00:00, 2016-12...
## $ preview_thumb <chr> "https://archive.org/download/FBC_20161223_140000_Varney__Company/FBC_2...
## $ snippet <chr> "only at td ameritrade. the berlin terror suspect is debt. what else ha...
Amazing! I am looking at your documentation and I am not sure what top_matches
returns for a given request. For instance, if I search for hrbrmstr
over 2015, what is the output of top_matches? The days with the most counts?
That's a good GDELT/Internet Archive TV search question. I'm assuming (from various testing) that it's the caption text from the top "n" matches (for large date ranges it maxes out at 1,000) out of all the possible ones it could return. You won't get more than that from the API, though.
That's great, thanks again for your help. I'll play with this for a while. But the raw data has to be somewhere, right?
It depends on what GDELT & IA put in their DB. You can clone the code and return the JSON before it gets processed, and you'll see that the other structures don't have the resolution you want. Or go to the GUI web interface on their site, generate CSVs and JSONs, and validate there, too.
@hrbrmstr coincidence? http://blog.gdeltproject.org/television-explorer-hourly-timeline-boolean-or-and-increased-json-cap/
:D
But as you can see, the data can only be downloaded over a 7-day period. It would be amazing if your package could take a date range as input, break it down into 7-day slices, download the data for each slice, and then combine everything into a tibble.
That way everyone could recover the full intraday history. What do you think? Is that doable on your side?
Thanks again!
+100 for the heads-up on their API changes. #ty!!!
Step 1 was making it work with the new API changes ;-) Longer results were causing errors in httr,
so I had to drop it and use curl
instead. Also, large result sets can come back with malformed JSON (embedded NUL bytes), so I had to handle that as well.
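The general technique for handling embedded NULs looks roughly like this (an illustrative sketch, not necessarily the package's exact code): fetch the response as raw bytes so the NULs never hit character conversion, strip them, then parse. It assumes the curl and jsonlite packages are available when the fetch is actually called.

```r
# Drop embedded NUL (0x00) bytes from a raw vector so rawToChar() and
# the JSON parser don't choke on them.
strip_nuls <- function(raw_bytes) {
  raw_bytes[raw_bytes != as.raw(0L)]
}

# Sketch: fetch the body as raw bytes, clean it, then parse the JSON text.
# Requires the curl and jsonlite packages at call time.
fetch_json_clean <- function(url) {
  res <- curl::curl_fetch_memory(url)
  jsonlite::fromJSON(rawToChar(strip_nuls(res$content)))
}
```

Fetching raw bytes first matters because R character strings cannot contain embedded NULs, so converting the body straight to text would error before any cleanup could happen.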
Rather than have the main function intuit caller intentions, I'll probably add a helper function to do the date breaks as suggested, IF they don't change their API again soon (I'll give them some time to let the dust settle on these changes).
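For anyone who wants to roll their own in the meantime, the date-break logic such a helper would need is simple to sketch. A minimal version in base R (query_tv in the commented usage is a hypothetical stand-in for the package's query function, not its real API):

```r
# Split a [start, end] date range into consecutive windows of at most
# `days` days each, covering the whole range with no gaps or overlaps.
chunk_dates <- function(start, end, days = 7) {
  start  <- as.Date(start)
  end    <- as.Date(end)
  starts <- seq(start, end, by = days)
  ends   <- pmin(starts + (days - 1), end)
  data.frame(start = starts, end = ends)
}

chunks <- chunk_dates("2016-01-01", "2016-01-20")
# Three windows: Jan 1-7, Jan 8-14, Jan 15-20.

# Hypothetical usage: fetch each slice, then row-bind into one tibble.
# `query_tv` is a placeholder name for the package's query function.
# full_history <- purrr::map_dfr(seq_len(nrow(chunks)), function(i) {
#   query_tv(query, start_date = chunks$start[i], end_date = chunks$end[i])
# })
```

The last window is clamped to the range's end date, so the final slice can be shorter than 7 days.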
Hello @hrbrmstr, this is great!
I just wonder, is there any way to query the data at the intraday level, or to get any sort of intraday timestamps?
Thanks!