jhudsl / ottrpal

Tools for converting OTTR courses into Leanpub or Coursera courses :otter:
https://jhudatascience.org/ottrpal/
GNU General Public License v3.0

Cansavvy's attempts at getting fig.alt's to sync between Google slides and Rmd #116

Closed cansavvy closed 10 months ago

cansavvy commented 12 months ago

Purpose/implementation Section

What changes are being implemented in this Pull Request?

This is related to https://github.com/jhudsl/OTTR_Template/issues/487

What was your approach?

I was trying to have it download slide notes from Google Slides and then put them in the fig.alt option for the relevant ottrpal::get_image_from_slide() call.

This would be good because right now people have to do copy and pasting to update fig.alts and that's an easy way to get mistakes because the slide notes can quickly become out of sync with what is in the course.
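As a rough sketch of the last step in that idea, slide notes would need some cleanup before they could be dropped into a fig.alt option. The helper below is hypothetical (notes_to_fig_alt() is not an ottrpal function); it only assumes that notes arrive as a character string and that the chunk option is written with double quotes.

```r
# Hypothetical helper (not part of ottrpal): turn raw speaker-note text
# into a single-line string that can sit inside fig.alt = "..." in a chunk header.
notes_to_fig_alt <- function(note) {
  # Collapse line breaks and repeated whitespace into single spaces
  alt <- gsub("[[:space:]]+", " ", note)
  # Trim leading and trailing whitespace
  alt <- trimws(alt)
  # Swap double quotes for single quotes so the chunk option stays well-formed
  alt <- gsub('"', "'", alt, fixed = TRUE)
  alt
}

notes_to_fig_alt("A bar chart\nshowing \"counts\"  by group ")
```

Replacing the quotes rather than escaping them is just one option; it keeps the generated chunk header easy to read.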

The problem I've encountered is that there is no easy URL to retrieve slide notes from. You appear to need Google authorization to download the PowerPoint file and get slide notes that way (which is what ari does).

I have been looking for a simpler workaround so we don't have to supply Google auth, but there may not be one. ¯_(ツ)_/¯

What GitHub issue does your pull request address?

https://github.com/jhudsl/OTTR_Template/issues/487

Tell potential reviewers what kind of feedback you are soliciting.

I'm posting this so @howardbaek can see what I was starting to work on and he might find a better solution.

The dream would be that we could download one slide's notes at a time using its slide ID, e.g.:

If this is the slide link: https://docs.google.com/presentation/d/1ME0NbcIBmnHJRhX3JJyCwJuuomkl_BjJp6lD5oD5WnU/edit#slide=id.gd422c5de97_0_5 then somehow we could alter this URL to get the slide notes. But so far no such luck.
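Even without a notes URL, the slide ID itself can be pulled out of a link like the one above. This is a minimal sketch, assuming the ID always appears in a `slide=id.` fragment; extract_page_id() is a made-up name for illustration.

```r
# Hypothetical helper: pull the slide (page) ID out of a Google Slides link.
extract_page_id <- function(slide_url) {
  # Look for the "slide=id.<page id>" fragment in the link
  pattern <- "slide=id\\.([A-Za-z0-9_-]+)"
  match <- regmatches(slide_url, regexec(pattern, slide_url))[[1]]
  # regexec() yields character(0) when the pattern is not found
  if (length(match) < 2) {
    return(NA_character_)
  }
  # Element 1 is the full match, element 2 is the captured ID
  match[2]
}

extract_page_id(
  "https://docs.google.com/presentation/d/1ME0NbcIBmnHJRhX3JJyCwJuuomkl_BjJp6lD5oD5WnU/edit#slide=id.gd422c5de97_0_5"
)
```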

howardbaek commented 11 months ago

Page ID

ariExtra:::get_page_ids() extracts "page IDs of slides in a Google Slides presentation". One caveat: this function somehow extracts a nonexistent page ID, g1013cbb9c28_0_45, which is later filtered out by check_png_urls().

Speaker Notes

# Download Google Slides as PPTX
pptx_file <- ariExtra::download_gs_file(id = "https://docs.google.com/presentation/d/1Vjvq7PYuWsTkGi2EkXpnk0KtQYhbPSidBhMFQcqyb8I/edit?usp=sharing", out_type = "pptx")

# Extract speaker notes
speaker_notes <- ariExtra::pptx_notes(pptx_file)
# Get rid of filenames in name
names(speaker_notes) <- NULL

cansavvy commented 11 months ago

Background Research

> ariExtra:::get_page_ids() extracts "page IDs of slides in a Google Slides presentation".

Oh, that's awesome that that exists! Great!

howardbaek commented 11 months ago

ariExtra:::get_page_ids() doesn't work perfectly. Sometimes it misses an ID (the last slide's ID).

We could use the rgoogleslides package (need authorization) to talk to the API and get the objectIds (ids of all the slides):

library(rgoogleslides)

client_id <- "YOUR_CLIENT_ID"
client_secret <- "YOUR_CLIENT_SECRET"

# Authorize R package to access Google Slides API
authorize(client_id = client_id, client_secret = client_secret)

url <- "https://slides.googleapis.com/v1/presentations/YOUR_PRESENTATION_ID_HERE?fields=slides.objectId"
# Get auth token
token <- get_token()
config <- httr::config(token=token)

# Get object Id
result <- httr::GET(url, config = config, httr::accept_json())
result_content <- httr::content(result, "text", encoding = "UTF-8")
result_list <- jsonlite::fromJSON(result_content)

# Character vector of objectIds
result_list$slides$objectId
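
The request URL in the snippet above has the presentation ID hard-coded as a placeholder. A small sketch of building it from an ID follows; build_object_id_url() is a hypothetical name, and the endpoint and fields parameter are copied from the code above.

```r
# Hypothetical helper: build the Google Slides API URL that asks only for
# the objectId of every slide in a presentation.
build_object_id_url <- function(presentation_id) {
  # Guard against empty or vector input before building the URL
  stopifnot(
    is.character(presentation_id),
    length(presentation_id) == 1,
    nzchar(presentation_id)
  )
  sprintf(
    "https://slides.googleapis.com/v1/presentations/%s?fields=slides.objectId",
    presentation_id
  )
}

build_object_id_url("YOUR_PRESENTATION_ID_HERE")
```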

howardbaek commented 11 months ago

Great article on using OAuth 2.0 in R: https://blog.r-hub.io/2021/01/25/oauth-2.0/

Note to myself:

So far, I've figured out how to:

howardbaek commented 11 months ago

@cansavvy Made some changes today:

howardbaek commented 11 months ago

Checks are failing because ariExtra is not on CRAN. I used the Remotes field in DESCRIPTION to depend on jhudsl/ariExtra: https://github.com/jhudsl/ottrpal/pull/116/commits/a618e6ab0e574a67b6162735240ad6695731a309#diff-9cc358405149db607ff830a16f0b4b21f7366e3c99ec00d52800acebe21b231cR47.

According to an SO post, Remotes is not an official DESCRIPTION field, and dependencies should be publicly available for submission to CRAN. Also, see Jenny Bryan's comment: https://github.com/r-lib/devtools/issues/1717#issuecomment-368125654.

So, it seems like we need to put ariExtra on CRAN or just copy-paste ariExtra::get_slide_id into ottrpal. Obviously, the latter is much easier to do.
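For a sense of how small such a local copy could be, here is a sketch of a function that pulls the presentation ID out of a Google Slides URL. It is not the ariExtra implementation; get_presentation_id() is a hypothetical name, and it assumes the standard /presentation/d/<id>/ URL layout (published "/d/e/" links are not handled).

```r
# Hypothetical stand-in for a local get_slide_id-style helper.
# Assumes links of the form .../presentation/d/<id>/...
get_presentation_id <- function(x) {
  # Capture the path segment that follows "/presentation/d/"
  pattern <- "/presentation/d/([A-Za-z0-9_-]+)"
  match <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(match) < 2) {
    # No URL structure found: assume the input is already a bare ID
    return(x)
  }
  match[2]
}

get_presentation_id(
  "https://docs.google.com/presentation/d/1H5aF_ROKVxE-HFHhoOy9vU2Y-y2M_PiV0q-JBL17Gss/edit?usp=sharing"
)
```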

cansavvy commented 11 months ago

Yeah, that seems like a good solution. Alternatively, you could have ari as a dependency, since you transferred things into that package from ariExtra, right?

But yeah, making ottrpal's own version of get_slide_id makes sense to me (since it's a small function).

> So, it seems like we need to put ariExtra on CRAN or just copy-paste ariExtra::get_slide_id into ottrpal. Obviously, the latter is much easier to do.

cansavvy commented 11 months ago

> @cansavvy Made some changes today:
>
>   • authorize(): Using Google Cloud's Client ID and Client Secret, generate an OAuth 2.0 Access Token. Store this token in environment for later use.
>   • extract_object_id(): Performs a HTTP GET method to request the IDs of every slide in a Google Slides presentation. This uses the token generated by authorize().
>   • get_object_id_notes(): Retrieve Speaker Notes and their corresponding Object (Slide) IDs from a Google Slides presentation. Wrapper around extract_object_id() and get_gs_pptx() + pptx_notes()

Would you be able to write out code that illustrates how you test this in the context of the end use case? Basically can you give me a reprex for me to test this?

howardbaek commented 11 months ago

> > @cansavvy Made some changes today:
> >
> >   • authorize(): Using Google Cloud's Client ID and Client Secret, generate an OAuth 2.0 Access Token. Store this token in environment for later use.
> >   • extract_object_id(): Performs a HTTP GET method to request the IDs of every slide in a Google Slides presentation. This uses the token generated by authorize().
> >   • get_object_id_notes(): Retrieve Speaker Notes and their corresponding Object (Slide) IDs from a Google Slides presentation. Wrapper around extract_object_id() and get_gs_pptx() + pptx_notes()
>
> Would you be able to write out code that illustrates how you test this in the context of the end use case? Basically can you give me a reprex for me to test this?

To test this:

  1. Get a Google Cloud Client ID and Client Secret following steps outlined here: https://www.hairizuan.com/rgoogleslides-using-your-own-account-client-id-and-secret/. Save the Client ID in R as client_id and Client Secret as client_secret.
  2. Run authorize(client_id = client_id, client_secret = client_secret). This will open a browser page that looks like:

Screenshot 2023-08-01 at 11 12 50 AM

Grant all the Google Drive and Google Slides permissions, and you should see this message: Authentication complete. Please close this page and return to R.

  3. Close the page and return to R, where the console should show Authentication complete.
  4. You have now generated an OAuth 2.0 Access Token and stored it in an environment for later use.
  5. Use the stored token to talk to the Google Slides API: extract_object_id("https://docs.google.com/presentation/d/1H5aF_ROKVxE-HFHhoOy9vU2Y-y2M_PiV0q-JBL17Gss/edit?usp=sharing"). This should output a character vector of the IDs of each of the 19 slides.
  6. To get the speaker notes and their corresponding IDs, run get_object_id_notes("https://docs.google.com/presentation/d/1H5aF_ROKVxE-HFHhoOy9vU2Y-y2M_PiV0q-JBL17Gss/edit?usp=sharing"). This should output a data frame:

Screenshot 2023-08-01 at 11 18 36 AM
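
To give a feel for the shape of that final data frame without needing Google authorization, the pairing step can be sketched on its own. This is an illustration only: pair_ids_with_notes() and the column names id and notes are assumptions, not the code or output format from this PR.

```r
# Hypothetical sketch of the pairing step: combine slide IDs and speaker
# notes into one data frame, failing loudly if they do not line up.
pair_ids_with_notes <- function(object_ids, notes) {
  if (length(object_ids) != length(notes)) {
    stop(
      "Found ", length(object_ids), " slide IDs but ",
      length(notes), " sets of notes.",
      call. = FALSE
    )
  }
  data.frame(
    id = object_ids,
    # Drop any file names attached to the notes vector
    notes = unname(notes),
    stringsAsFactors = FALSE
  )
}

# Made-up IDs and notes, for illustration only
pair_ids_with_notes(
  object_ids = c("p", "g_example_0_1"),
  notes = c("Title slide.", "A diagram of the workflow.")
)
```

The length check matters because a mismatch between IDs and notes would silently attach alt text to the wrong slide.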

howardbaek commented 11 months ago

> Yeah that seems like a good solution. Alternatively you could have ari as a dependency since you transferred things into that package from ariExtra right?
>
> But yeah making ottrpal's own version of get_slide_id makes sense to me (since its a small function).
>
> > So, it seems like we need to put ariExtra on CRAN or just copy-paste ariExtra::get_slide_id into ottrpal. Obviously, the latter is much easier to do.

Good point. But, the ari branch that contains this function, https://github.com/jhudsl/ari/tree/ariExtra-immigration, isn't on CRAN yet, so we encounter the same problem.

cansavvy commented 11 months ago

>   1. Get a Google Cloud Client ID and Client Secret following steps outlined here:

This is a great place to start from! But we should think about how we want this to be implemented on the user side.

Setting up a Google Client ID is a lot for each user to do just to get the notes.

We should probably have a Google Client ID that is encrypted here and a default account that we can use. Perhaps we could make a dummy Google account as one more layer of safety. I can potentially work on this if you like, and then we can pair program on it together.

howardbaek commented 11 months ago

I think I can do this fairly easily.

  1. Create a dummy Google email account ("ottrpal@gmail.com")
  2. Generate Google Client ID and Client Secret from GCP
  3. Set these as default arguments to authorize()

Is this what you were thinking? Is this a safe method?

cansavvy commented 11 months ago

> I think I can do this fairly easily.
>
>   1. Create a dummy Google email account ("ottrpal@gmail.com")
>   2. Generate Google Client ID and Client Secret from GCP
>   3. Set these as default arguments to authorize()
>
> Is this what you were thinking? Is this a safe method?

Yes, that's part 1. But we'll still want to keep those credentials safe via some encryption steps and find a way (if possible) to just provide an OAuth token from that account by default. That last part is easy through GitHub secrets, but we'd have to think about the setup if people want to use the function locally. In the latter case, we'd probably want them to provide their own credentials.

howardbaek commented 11 months ago

What you are saying is we want to use the Google Client ID and Client Secret to generate an OAuth 2.0 Token, encrypt this token somehow, and store it in the GitHub secrets of the ottrpal repo?

cansavvy commented 11 months ago

> What you are saying is we want to use the Google Client ID and Client Secret to generate an OAuth 2.0 Token, encrypt this token somehow, and store it in the GitHub secrets of the ottrpal repo?

Yeah, programmatic access through a secret. Here's an example of that: https://github.com/datatrail-jhu/rgoogleclassroom/blob/fbf7f2a5479d25546ea51533c769ebeaae8cbbb6/R/auth.R#L116

And then the secrets can be GitHub secrets
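
A rough sketch of how that split between CI and local use might look: check for a token supplied through an environment variable (which is how a GitHub secret is typically exposed to a workflow step), and otherwise signal that the user needs to supply their own credentials. The function name resolve_auth_source() and the variable name OTTRPAL_GOOGLE_TOKEN are placeholders, not anything defined in ottrpal or rgoogleclassroom.

```r
# Hypothetical sketch: decide where credentials should come from.
# OTTRPAL_GOOGLE_TOKEN is a made-up environment variable name.
resolve_auth_source <- function(token_env = "OTTRPAL_GOOGLE_TOKEN") {
  token <- Sys.getenv(token_env, unset = "")
  if (nzchar(token)) {
    # A token was provided, e.g. by a CI workflow that maps a secret
    # to this environment variable
    return(list(source = "environment", token = token))
  }
  # No token found: a local user would need to authorize with their
  # own Client ID and Client Secret
  list(source = "user", token = NULL)
}

auth <- resolve_auth_source()
auth$source
```

Returning the source alongside the token lets the calling code decide whether to prompt for interactive authorization.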