hrbrmstr / webhose

:hammer: Tools to Work with the 'webhose.io' 'API' in R

fetch_posts query #5

Open Yardstick-M opened 7 years ago

Yardstick-M commented 7 years ago

Thanks @hrbrmstr and @ottlngr for all your great work in setting up tools to work with the 'webhose.io' 'API' in R.

I'm a complete newbie to programming so excuse my ignorance. Looking at the instructions I've successfully got as far as:

library(webhose)

# current version

packageVersion("webhose")

[1] '0.1.0'

But when I get to 'make a call', I'm a little confused about what to do. Your example for 'fetch_posts' is:

res <- fetch_posts("(China AND United) language:english site_type:news site:bloomberg.com", ts = 1213456)

Do I just use this, substituting in my own query? Or do I use 'usage' as per the Help section:

fetch_posts(query, sort = "relevancy", ts = (Sys.time() - (3 * 24 * 60 * 60)), order = "desc", accuracy_confidence = NULL, highlight = FALSE, pre_alloc_max = 30, quiet = !interactive(), token = Sys.getenv("WEBHOSE_TOKEN"), ...)

When I use 'usage', substituting in my own query and webhose token, I get an error. Here is my fetch_posts call:

fetch_posts("(BHP Billiton) language:english", sort = "relevancy", ts = (Sys.time() - (3 * 24 * 60 * 60)), order = "desc", size = 100, accuracy_confidence = NULL, highlight = FALSE, from = 0, quiet = !interactive(), token = Sys.getenv("XXXXX"), ...)

Any assistance on where I'm going wrong would be appreciated.

Regards

Ben

hrbrmstr commented 7 years ago

Thx for taking the pkg for a spin and esp the time to post an issue!

I tried this:

library(webhose)

bhp <- fetch_posts("(BHP Billiton) language:english")
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## Fetching next 100 records...
## You have 969 API calls remaining on your plan

Which resulted in:

dplyr::glimpse(bhp)
## Observations: 2,662
## Variables: 42
## $ uuid                              <chr> "f89c4930e4adb7b34a072a4d6d172d231d41...
## $ url                               <chr> "http://omgili.com/ri/.wHSUbtEfZSJlbP...
## $ ord_in_thread                     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ author                            <chr> "Michael Slezak", "Isabel Dayman", ""...
## $ published                         <chr> "2017-11-07T19:00:00.000+02:00", "201...
## $ title                             <chr> "BHP opposes Minerals Council of Aust...
## $ text                              <chr> "Environmental activism BHP opposes M...
## $ highlighttext                     <chr> "Environmental activism <em>BHP</em> ...
## $ highlighttitle                    <chr> "", "", "", "", "", "", "", "", "", "...
## $ language                          <chr> "english", "english", "english", "eng...
## $ external_links                    <list> [<>, <>, <"https://wdo-m.tlnk.io/ser...
## $ rating                            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ crawled                           <chr> "2017-11-07T19:16:44.000+02:00", "201...
## $ thread_uuid                       <chr> "f89c4930e4adb7b34a072a4d6d172d231d41...
## $ thread_url                        <chr> "http://omgili.com/ri/.wHSUbtEfZSJlbP...
## $ thread_site_full                  <chr> "www.theguardian.com", "www.abc.net.a...
## $ thread_site                       <chr> "theguardian.com", "abc.net.au", "huf...
## $ thread_site_section               <chr> "", "", "https://www.huffingtonpost.c...
## $ thread_site_categories            <list> ["media", <"music", "entertainment">...
## $ thread_section_title              <chr> "", "", "U.S. Political News, Opinion...
## $ thread_title                      <chr> "BHP opposes Minerals Council of Aust...
## $ thread_title_full                 <chr> "BHP opposes Minerals Council of Aust...
## $ thread_published                  <chr> "2017-11-07T19:00:00.000+02:00", "201...
## $ thread_replies_count              <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ thread_participants_count         <int> 0, 0, 2, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1...
## $ thread_site_type                  <chr> "news", "news", "news", "news", "news...
## $ thread_country                    <chr> "US", "AU", "US", "US", "US", "AU", "...
## $ thread_spam_score                 <dbl> 0.048, 0.000, 0.000, 0.000, 0.000, 0....
## $ thread_main_image                 <chr> "https://i.guim.co.uk/img/media/ea4ba...
## $ thread_performance_score          <int> 10, 8, 10, 6, 10, 7, 5, 5, 2, 2, 1, 1...
## $ thread_domain_rank                <int> 170, 1455, 139, 356, 170, 1455, 4726,...
## $ thread_social_facebook_likes      <int> 1448, 887, 5014, 668, 1673, 730, 589,...
## $ thread_social_facebook_comments   <int> 0, 0, 37, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ thread_social_facebook_shares     <int> 1448, 887, 5014, 668, 1673, 730, 589,...
## $ thread_social_gplus_shares        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ thread_social_pinterest_shares    <int> 0, 0, 9, 1, 0, 0, 0, 0, 0, 1, 5, 0, 0...
## $ thread_social_linkedin_shares     <int> 11, 1267, 49, 196, 127, 0, 0, 8, 101,...
## $ thread_social_stumbledupon_shares <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ thread_social_vk_shares           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ entities_persons                  <list> [<c("andrew mackenzie", "lenore tayl...
## $ entities_organizations            <list> [<c("bhp", "minerals council of aust...
## $ entities_locations                <list> [<c("london", "england", "australia"...

You can generally skip most of the function params (depending on what you're trying to find) and your query string seems spot-on.
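As a rough illustration of why the defaults are usually enough: the `ts` default in the usage quoted earlier in the thread is simply "three days before now". The sketch below only computes that value in base R. The variable names `three_days` and `since` are made up for this example, and the commented-out call assumes the webhose package is installed and a valid token is configured.

```r
# Default window from the documented `ts` argument: three days, in seconds.
three_days <- 3 * 24 * 60 * 60

# Subtracting seconds from Sys.time() yields another date-time (POSIXct).
since <- Sys.time() - three_days
print(since)

# Not run here: needs the webhose package, a valid token, and network access.
# library(webhose)
# bhp <- fetch_posts("(BHP Billiton) language:english", ts = since)
```

Leaving `ts` out of the call should give the same three-day window, since that is the documented default.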

Yardstick-M commented 7 years ago

Thanks @hrbrmstr for your reply.

I've tried replicating your steps but I get the following:

library(webhose)

# current version

packageVersion("webhose")

[1] ‘0.1.0’

bhp <- fetch_posts("(BHP Billiton) language:english")

Error in filter_posts(query = query, sort = sort, ts = ts, order = order, : Unauthorized (HTTP 401).

Again, I've probably overlooked something simple but any assistance would be appreciated.

Thanks

Ben

hrbrmstr commented 7 years ago

An API key is required. You need to go to webhose and get one then put it in your .Renviron on a line like:

WEBHOSE_TOKEN=the-api-key-they-gave-you
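To check that a line like this is picked up, something along these lines may help. It is only a sketch: it writes the line to a temporary file (a stand-in for your real ~/.Renviron, so nothing on disk is touched) and loads it with base R's `readRenviron()`. The token value is the placeholder from above, not a real key.

```r
# Stand-in for ~/.Renviron so the example does not modify your real file.
renviron <- tempfile(fileext = ".Renviron")
writeLines("WEBHOSE_TOKEN=the-api-key-they-gave-you", renviron)

# R reads ~/.Renviron at startup; readRenviron() loads a file on demand.
readRenviron(renviron)

# The variable should now be visible to the current R session.
token <- Sys.getenv("WEBHOSE_TOKEN")
nzchar(token)  # FALSE here would mean the variable was not set
```

With the real file, either restart R after editing it or call `readRenviron("~/.Renviron")` in the running session.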


Yardstick-M commented 7 years ago

Hi

Thanks for that. I've kept all params for illustrative purposes but when I enter the following with my API token instead of "XX":

bhp <- fetch_posts("(BHP Billiton) language:english", sort = "relevancy", ts = (Sys.time() - (3 * 24 * 60 * 60)), order = "desc", accuracy_confidence = NULL, highlight = FALSE, pre_alloc_max = 30, quiet = !interactive(), token = Sys.getenv("XX"))

I get the following error message:

Error in filter_posts(query = query, sort = sort, ts = ts, order = order, : Unauthorized (HTTP 401)

Again, any assistance appreciated.

Cheers

ottlngr commented 6 years ago

Be careful not to mix things up when setting token = Sys.getenv("YOUR_API_TOKEN"): Sys.getenv() expects the name of an environment variable, not the token itself.

The documentation of fetch_posts() says:

Your private access token. You get a unique access token when you sign up. Store it in an environment variable WEBHOSE_TOKEN (usually in ~/.Renviron) or provide it directly.

So the first method described uses an environment variable, which is a variable available to your R session. There are two (or more) ways to create an environment variable. One way is to define it in your .Renviron file, as described by @hrbrmstr above. Another is to call Sys.setenv() like this:

Sys.setenv("WEBHOSE_TOKEN" = "YOUR_API_TOKEN")

After doing this, you should be able to get this environment variable back by calling Sys.getenv("WEBHOSE_TOKEN"). That is what fetch_posts() does by default: it looks for this environment variable and reads the API token from it.

If you do not want to store your token in an environment variable, you can pass it directly to fetch_posts() by setting token = "YOUR_API_TOKEN".
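A minimal sketch of the difference, using base R only. `my_token` is a hypothetical variable holding a placeholder string, not a real key, and the sketch assumes no environment variable happens to be named YOUR_API_TOKEN. The final `fetch_posts()` call is commented out because it needs the webhose package, a valid token, and network access.

```r
# Placeholder for illustration only; substitute your real token.
my_token <- "YOUR_API_TOKEN"

# Mistake: passing the token itself as the *name* of an environment variable.
# No variable with that name exists, so Sys.getenv() returns an empty string,
# and the API call is then made without a usable token.
wrong <- Sys.getenv(my_token)
nzchar(wrong)

# Option 1: store the token under the name fetch_posts() looks up by default.
Sys.setenv(WEBHOSE_TOKEN = my_token)
right <- Sys.getenv("WEBHOSE_TOKEN")
identical(right, my_token)

# Option 2 (not run): pass the token string directly.
# bhp <- fetch_posts("(BHP Billiton) language:english", token = my_token)
```

An empty token like `wrong` would be consistent with the Unauthorized (HTTP 401) error reported earlier in the thread.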

I hope that helps to understand the Sys.getenv() call and to use this package.

Yardstick-M commented 6 years ago

Thanks @ottlngr - that works. Appreciate your comments.

BTW, any suggestions on how to automatically fetch posts for the same query on an ongoing basis, i.e. so that the call keeps updating as new posts come in?

Cheers