jennybc / googlesheets

Google Spreadsheets R API
Other
783 stars 191 forks source link

Allegedly public sheet that requires pvt visibility to read #167

Closed jennybc closed 7 years ago

jennybc commented 9 years ago

Sheet holds the data for this caffeine and calories plot:

http://www.informationisbeautiful.net/visualizations/caffeine-and-calories/

Sheet is here:

https://docs.google.com/spreadsheets/d/1KYMUjrCulPtpUHwep9bVvsBvmVsDEbucdyRZ5uHCDxw/

Why do I need to be authenticated to access this seemingly public sheet? Or, more narrowly, do I need to use a worksheets feed that is private vs public? We claim no auth is being used and yet the request only works if visibility = "private". This seems contradictory?

> library(googlesheets)
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> gs_user()
No authorization yet in this session!
NOTE: a .httr-oauth file exists in current working directory.
 Run gs_auth() to use the credentials cached in .httr-oauth for this session.
No user currently authorized.
> drinks_key <- "1KYMUjrCulPtpUHwep9bVvsBvmVsDEbucdyRZ5uHCDxw"
> drinks_ss <- gs_key(drinks_key, lookup = FALSE, visibility = "private")
Authentication will not be used.
Worksheets feed constructed with private visibility
> drinks_ss
                  Spreadsheet title: Caffeine and Calories
                 Spreadsheet author: david.mccandless
  Date of googlesheets registration: 2015-09-10 06:52:12 GMT
    Date of last spreadsheet update: 2010-07-19 10:01:24 GMT
                         visibility: private
                        permissions: rw
                            version: new

Contains 1 worksheets:
(Title): (Nominal worksheet extent as rows x columns)
Coffees compared: 71 x 14

Key: 1KYMUjrCulPtpUHwep9bVvsBvmVsDEbucdyRZ5uHCDxw
Browser URL: https://docs.google.com/spreadsheets/d/1KYMUjrCulPtpUHwep9bVvsBvmVsDEbucdyRZ5uHCDxw/
> drinks <- gs_read(drinks_ss)
Accessing worksheet titled "Coffees compared"
> glimpse(drinks)
Observations: 55
Variables: 8
$ Coffee                                                                                            (chr) ...
$ Coffee.chain                                                                                      (chr) ...
$ Calories                                                                                          (dbl) ...
$ Caffeine                                                                                          (dbl) ...
$ sources..World.Cancer.Research.Fund..Starbucks.Beverage.Nutrition.Guide..Calorie.Counter.Database (lgl) ...
$ http...www.starbucks.com.menu.catalog.nutrition.drink.all.view_control.nutrition                  (lgl) ...
$ http...www.wcrf.uk.org                                                                            (lgl) ...
$ http...caloriecount.about.com.                                                                    (lgl) ...
jennybc commented 9 years ago

@hardin47 I'm going to figure out what's going on with this Sheet here!

jennybc commented 9 years ago

Interesting, long-standing bug report on the Sheets API:

Reading a public spreadsheet (sometimes) requires Google account login

jennybc commented 8 years ago

I'm beginning to suspect something like this:

File > Publish to the web affects (confers) access to the world via the list and cell feeds.

whereas

(upper right corner) Share > Public on the web affect (confers) access to the world via the exportcsv link.

Or something along these lines ...

UPDATE: more clear evidence https://github.com/jennybc/googlesheets/issues/212 that this is true. So gs_read() should detect this, message (in case user wants to change Sheet permissions for faster reads) but proceed to use one of the other feeds.

joelgombin commented 7 years ago

It seems this issue has not been closed yet, @jennybc do you have guidance as to what the good practices are as of today? Making sure that the sheet is published on the web?

joelgombin commented 7 years ago

(BTW even if the sheet is published gs_url doesn't work without authentification, it seems)

jennybc commented 7 years ago

I am not up to speed on my own package right now 😱 but will be in early 2017. The logic for how I build the URL is convoluted (probably more than it should be!).

But I don't want to to ignore you until January! In the meantime, consider taking charge explicitly of the lookup and/or visibility argument(s). I will try to give a better answer soon but maybe this helps ...

joelgombin commented 7 years ago

Thanks for your answer! Yeah that's what I did. But it's not always a great solution (I'm using your package for training and I'd rather not assume that all students have a Google account, even though they probably do). Also this solution doesn't work with knitr, nor in many situations, I think, in Rstudio server. But again, thanks for your help, it (and the package!) is much appreciated!

jennybc commented 7 years ago

OK so I really need to get to the bottom of it then.

You have a sheet that is not readable via googlesheets, yes? And in what sense is it "public"? Because Public on the Web and Published on the web are two very different things, sadly. And that is not my fault. It's a super-confusing aspect of the API.

However, it is still possible that I am somehow building a URL that requires auth when it should not. That's why I really want to understand what you're seeing.

joelgombin commented 7 years ago

I'm using this sheet for example: https://docs.google.com/spreadsheets/d/1rS5h30nqhk0GS3WLUxBD0jY0u3-Qzm7YC8IpflcAptU/edit?usp=sharing At the moment it's "public on the web" (I think, my google sheet interface is in French...), but I've tried playing around with its publicity and it's doesn't change things with the package.

jennybc commented 7 years ago

It is very important to be "published on the web" vs "public on the web".

https://github.com/jennybc/googlesheets/issues/163

If we are lucky, that is your problem. But the existence of this issue suggests that there may be yet another problem in some cases. And it's not entirely clear if it's my problem or the Sheets API.

jennybc commented 7 years ago

Once you have confirmed the Sheet is "Published on the web", I'd be interested to know what you get with this:

library(googlesheets)
ss <- gs_key("1rS5h30nqhk0GS3WLUxBD0jY0u3-Qzm7YC8IpflcAptU", lookup = FALSE)
jennybc commented 7 years ago

At the moment, it appears to still be "Public on the web", which does not confer API access.

joelgombin commented 7 years ago

Ah, OK, I finally got it. Once the sheet is "published on the web", this works without auth. But that's tricky because I thought it could be set similarly as "public on the web", and could'nt find it, but nope!

jennybc commented 7 years ago

Yeah, this is an unfortunate feature of v3 of the API (see the huge red boxes here):

https://developers.google.com/google-apps/spreadsheets/worksheets

Hopefully, when I switch to v4, this is one of the things that will go away.