Closed cansavvy closed 10 months ago
ariExtra:::get_page_ids() extracts "page IDs of slides in a Google Slides presentation". One note about this function: it somehow extracts a non-existent page ID, g1013cbb9c28_0_45, which is later filtered out with check_png_urls().
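The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not ariExtra's actual implementation: png_url_exists() and filter_existing_page_ids() are hypothetical names, and the PNG export URL pattern is an assumption about how slide images are addressed.

```r
# Hypothetical check: does the PNG export URL for this page ID return HTTP 200?
# The URL pattern is an assumption, not taken from ariExtra's source.
png_url_exists <- function(presentation_id, page_id) {
  url <- paste0(
    "https://docs.google.com/presentation/d/", presentation_id,
    "/export/png?id=", presentation_id, "&pageid=", page_id
  )
  response <- httr::HEAD(url)
  httr::status_code(response) == 200
}

# Keep only the page IDs for which `exists_fn` returns TRUE.
filter_existing_page_ids <- function(page_ids, exists_fn) {
  keep <- vapply(page_ids, exists_fn, logical(1), USE.NAMES = FALSE)
  page_ids[keep]
}

# Offline demonstration with a stand-in checker that rejects the phantom ID.
page_ids <- c("p", "g1013cbb9c28_0_45", "gd422c5de97_0_5")
fake_check <- function(page_id) page_id != "g1013cbb9c28_0_45"
valid_ids <- filter_existing_page_ids(page_ids, fake_check)
```

Against a real presentation, the checker could be passed as function(p) png_url_exists(presentation_id, p), assuming the export URL pattern holds.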
# Download Google Slides as PPTX
pptx_file <- ariExtra::download_gs_file(id = "https://docs.google.com/presentation/d/1Vjvq7PYuWsTkGi2EkXpnk0KtQYhbPSidBhMFQcqyb8I/edit?usp=sharing", out_type = "pptx")
# Extract speaker notes
speaker_notes <- ariExtra::pptx_notes(pptx_file)
# Get rid of filenames in name
names(speaker_notes) <- NULL
Background Research
ariExtra:::get_page_ids() extracts "page IDs of slides in a Google Slides presentation".
Oh, it's awesome that that exists! Great!
ariExtra:::get_page_ids() doesn't work perfectly. Sometimes it misses some IDs (the last slide's ID).
We could use the rgoogleslides package (need authorization) to talk to the API and get the objectIds (ids of all the slides):
library(rgoogleslides)
client_id <- "YOUR_CLIENT_ID"
client_secret <- "YOUR_CLIENT_SECRET"
# Authorize R package to access Google Slides API
authorize(client_id = client_id, client_secret = client_secret)
url <- "https://slides.googleapis.com/v1/presentations/YOUR_PRESENTATION_ID_HERE?fields=slides.objectId"
# Get auth token
token <- get_token()
config <- httr::config(token=token)
# Get object Id
result <- httr::GET(url, config = config, httr::accept_json())
result_content <- httr::content(result, "text")
result_list <- jsonlite::fromJSON(result_content)
# Character vector of objectIds
result_list$slides$objectId
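The request URL above needs the presentation ID rather than the full sharing link. A small sketch of how that ID could be pulled out of a sharing URL is shown below; presentation_id_from_url() and slides_object_id_url() are hypothetical helper names, not functions from rgoogleslides or ariExtra, and the regular expression assumes IDs contain only letters, digits, underscores, and hyphens.

```r
# Hypothetical helper: pull the presentation ID out of a Google Slides URL.
# Assumes the ID consists of letters, digits, underscores, and hyphens.
presentation_id_from_url <- function(url) {
  hit <- regmatches(url, regexec("/presentation/d/([A-Za-z0-9_-]+)", url))[[1]]
  if (length(hit) < 2) {
    stop("No presentation ID found in URL: ", url)
  }
  hit[2]
}

# Hypothetical helper: build the Slides API URL that requests only slide objectIds.
slides_object_id_url <- function(url) {
  paste0(
    "https://slides.googleapis.com/v1/presentations/",
    presentation_id_from_url(url),
    "?fields=slides.objectId"
  )
}

share_url <- "https://docs.google.com/presentation/d/1H5aF_ROKVxE-HFHhoOy9vU2Y-y2M_PiV0q-JBL17Gss/edit?usp=sharing"
api_url <- slides_object_id_url(share_url)
```

The resulting api_url can then take the place of the hand-written url in the request above.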
Great article on using OAuth 2.0 in R: https://blog.r-hub.io/2021/01/25/oauth-2.0/
Note to myself:
So far, I've figured out how to:
gs4_auth()
@cansavvy Made some changes today:
- authorize(): Using Google Cloud's Client ID and Client Secret, generates an OAuth 2.0 access token. Stores this token in an environment for later use.
- extract_object_id(): Performs an HTTP GET request for the IDs of every slide in a Google Slides presentation. This uses the token generated by authorize().
- get_object_id_notes(): Retrieves speaker notes and their corresponding object (slide) IDs from a Google Slides presentation. Wrapper around extract_object_id() and get_gs_pptx() + pptx_notes().
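To illustrate the pairing step that a wrapper like get_object_id_notes() has to perform, here is a minimal sketch. It is not the code from the PR: pair_ids_with_notes() is a hypothetical name, the column names id and notes are assumptions, and it assumes both inputs are already in slide order with one element per slide.

```r
# Hypothetical helper: combine slide object IDs and speaker notes into one data frame.
# Assumes both vectors are in slide order, one element per slide.
pair_ids_with_notes <- function(object_ids, notes) {
  if (length(object_ids) != length(notes)) {
    stop(
      "Got ", length(object_ids), " object IDs but ", length(notes),
      " notes; refusing to pair them by position."
    )
  }
  data.frame(
    id = object_ids,
    notes = unname(notes),
    stringsAsFactors = FALSE
  )
}

# Made-up example values, for illustration only.
example_df <- pair_ids_with_notes(
  c("p", "gd422c5de97_0_5"),
  c("Title slide notes", "Second slide notes")
)
```

The length check matters because pairing by position silently shifts notes onto the wrong slides if either side is missing an entry.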
Checks are failing because ariExtra is not on CRAN. I used the Remotes field in DESCRIPTION to depend on jhudsl/ariExtra: https://github.com/jhudsl/ottrpal/pull/116/commits/a618e6ab0e574a67b6162735240ad6695731a309#diff-9cc358405149db607ff830a16f0b4b21f7366e3c99ec00d52800acebe21b231cR47.
According to this SO post, Remotes is not an official DESCRIPTION field, and dependencies should be publicly available for submission to CRAN. Also, see Jenny Bryan's comment: https://github.com/r-lib/devtools/issues/1717#issuecomment-368125654.
So, it seems like we need to put ariExtra on CRAN or just copy-paste ariExtra::get_slide_id into ottrpal. Obviously, the latter is much easier to do.
Yeah, that seems like a good solution. Alternatively, you could have ari as a dependency, since you transferred things into that package from ariExtra, right?
But yeah, making ottrpal's own version of get_slide_id makes sense to me (since it's a small function).
Would you be able to write out code that illustrates how you test this in the context of the end use case? Basically can you give me a reprex for me to test this?
To test this:
- Get a Google Cloud Client ID and Client Secret.
- Store the Client ID as client_id and the Client Secret as client_secret.
- Run authorize(client_id = client_id, client_secret = client_secret). This will take you to a browser window. Grant all the Google Drive and Google Slides permissions and you should see this message: Authentication complete. Please close this page and return to R.
- Run extract_object_id("https://docs.google.com/presentation/d/1H5aF_ROKVxE-HFHhoOy9vU2Y-y2M_PiV0q-JBL17Gss/edit?usp=sharing"). This should output a character vector of the IDs of all 19 slides.
- Run get_object_id_notes("https://docs.google.com/presentation/d/1H5aF_ROKVxE-HFHhoOy9vU2Y-y2M_PiV0q-JBL17Gss/edit?usp=sharing"). This should output a data frame.
Good point about depending on ari. But the ari branch that contains this function, https://github.com/jhudsl/ari/tree/ariExtra-immigration, isn't on CRAN yet, so we encounter the same problem.
- Get a Google Cloud Client ID and Client Secret following steps outlined here:
This is a great place to start from! But we should think about how we want this to be implemented on the user side.
Setting up a Google Client ID is a lot for each user to do just to get the notes.
We should probably have a Google Client ID that is encrypted here and a default account that we can use. Perhaps we could make a dummy Google account so that there is one more level of safety. I can potentially work on this if you like, and then we can pair program on it together.
I think I can do this fairly easily.
- Create a dummy Google email account ("ottrpal@gmail.com")
- Generate Google Client ID and Client Secret from GCP
- Set these as default arguments to authorize()
Is this what you were thinking? Is this a safe method?
Yes, that's part 1. But we'll still want to keep those credentials safe via some encryption steps and find a way (if possible) to provide an OAuth token from that account by default. That last part is easy through GitHub secrets, but we'd have to think about the setup if people want to use the function locally. In the latter case, we'd probably want them to provide their own credentials.
What you are saying is we want to use the Google Client ID and Client Secret to generate an OAuth 2.0 Token, encrypt this token somehow, and store it in the GitHub secrets of the ottrpal repo?
Yeah, programmatic access through secrets. Here's an example of that: https://github.com/datatrail-jhu/rgoogleclassroom/blob/fbf7f2a5479d25546ea51533c769ebeaae8cbbb6/R/auth.R#L116
And then the secrets can be GitHub secrets
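One way the "default credentials in CI, own credentials locally" idea could look is sketched below. This is only an illustration under stated assumptions: resolve_credentials() is a hypothetical helper, and the environment variable names OTTRPAL_CLIENT_ID and OTTRPAL_CLIENT_SECRET are made up here; a GitHub Actions workflow would have to map its secrets onto them.

```r
# Hypothetical helper: decide which OAuth client credentials to use.
# Explicit arguments win; otherwise fall back to environment variables
# (assumed names), which a CI workflow could populate from its secrets.
resolve_credentials <- function(client_id = NULL, client_secret = NULL) {
  if (is.null(client_id)) {
    client_id <- Sys.getenv("OTTRPAL_CLIENT_ID", unset = NA)
  }
  if (is.null(client_secret)) {
    client_secret <- Sys.getenv("OTTRPAL_CLIENT_SECRET", unset = NA)
  }
  missing_value <- function(x) is.na(x) || !nzchar(x)
  if (missing_value(client_id) || missing_value(client_secret)) {
    stop("No credentials found: pass client_id and client_secret, or set the environment variables.")
  }
  list(client_id = client_id, client_secret = client_secret)
}
```

A local user would call resolve_credentials(client_id = "...", client_secret = "...") with their own values, while a CI run would call it with no arguments and rely on the environment.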
Purpose/implementation Section
What changes are being implemented in this Pull Request?
This is related to https://github.com/jhudsl/OTTR_Template/issues/487
What was your approach?
I was trying to have it download slide notes from Google Slides and then put them in the fig.alt option for the relevant ottrpal::get_image_from_slide() bit.
This would be good because right now people have to copy and paste to update fig.alts, and that's an easy way to introduce mistakes because the slide notes can quickly become out of sync with what is in the course.
The problem I've encountered is that there is no easy URL to retrieve slide notes from. You appear to need Google authorization to download the PowerPoint and get slide notes that way (which is what ari does).
I have been looking for a simpler workaround so we don't have to supply Google auth, but there may not be one. ¯_(ツ)_/¯
What GitHub issue does your pull request address?
https://github.com/jhudsl/OTTR_Template/issues/487
Tell potential reviewers what kind of feedback you are soliciting.
I'm posting this so @howardbaek can see what I was starting to work on and he might find a better solution.
The dream would be that we could download one slide's notes at a time using its slide ID, e.g.:
If this is the slide link:
https://docs.google.com/presentation/d/1ME0NbcIBmnHJRhX3JJyCwJuuomkl_BjJp6lD5oD5WnU/edit#slide=id.gd422c5de97_0_5
then somehow we could alter this URL to get the slide notes. But so far no such luck.
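Even without such a URL, the slide ID in a link like the one above can be matched against notes retrieved some other way. The sketch below shows that lookup under stated assumptions: slide_id_from_url() and notes_for_slide() are hypothetical helpers, and the notes are assumed to be available as a data frame with id and notes columns (the example values are made up).

```r
# Hypothetical helper: extract the slide ID from a "#slide=id.<ID>" link.
slide_id_from_url <- function(url) {
  hit <- regmatches(url, regexec("slide=id\\.([A-Za-z0-9_]+)", url))[[1]]
  if (length(hit) < 2) {
    return(NA_character_)
  }
  hit[2]
}

# Hypothetical helper: look up the speaker notes for the slide a link points to.
# `notes_df` is assumed to have columns `id` and `notes`.
notes_for_slide <- function(url, notes_df) {
  idx <- match(slide_id_from_url(url), notes_df$id)
  if (is.na(idx)) {
    return(NA_character_)
  }
  notes_df$notes[idx]
}

# Made-up notes table, for illustration only.
notes_df <- data.frame(
  id = c("p", "gd422c5de97_0_5"),
  notes = c("Title slide", "Alt text for the second slide"),
  stringsAsFactors = FALSE
)
slide_link <- "https://docs.google.com/presentation/d/1ME0NbcIBmnHJRhX3JJyCwJuuomkl_BjJp6lD5oD5WnU/edit#slide=id.gd422c5de97_0_5"
alt_text <- notes_for_slide(slide_link, notes_df)
```

The returned string could then be supplied as the fig.alt value for the chunk that embeds that slide.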