jkeirstead / scholar

Analyse citation data from Google Scholar

add function get_pubs_all_authors - submission #122

Open higgi13425 opened 9 months ago

higgi13425 commented 9 months ago

Hi there - To build a network diagram of collaborations, I needed the complete author list for every publication. We had a lot of senior authors with many publications, and many of those publications had > 5 authors, so the author list returned by get_publications was truncated. Fetching the complete list for every publication was problematic with Google Scholar, so I developed a slightly different function that might be helpful, as it makes fewer requests and Google Scholar complains less. It calls get_publications, identifies which pubids have > 5 authors (the author string ends in "..."), runs get_complete_authors only on these, joins the results back to the original get_publications df, and cleans this up to the original 9 column names.

    get_pubs_all_authors <- function(author_id, delay = 0.8) {
      # All publications for this author; long author lists end in "..."
      df1 <- scholar::get_publications(author_id) |>
        dplyr::mutate(id = author_id)

      # Keep only publications whose author list is truncated
      df2 <- df1 |>
        dplyr::filter(stringr::str_detect(author, "\\.\\.\\."))

      # Fetch the complete author list for those publications only,
      # passing the delay argument through instead of hard-coding 0.8
      df3 <- df2 |>
        dplyr::mutate(complete_authors = purrr::map2(
          id, pubid, scholar::get_complete_authors,
          delay = delay, initials = TRUE
        ))

      df4 <- df3 |>
        dplyr::select(pubid, complete_authors) |>
        dplyr::mutate(complete_authors = base::unlist(complete_authors))

      # Join back and prefer the complete list where one was fetched
      df5 <- dplyr::left_join(df1, df4, by = dplyr::join_by(pubid)) |>
        dplyr::mutate(authors = dplyr::case_when(
          is.na(complete_authors) ~ author,
          .default = complete_authors
        )) |>
        dplyr::select(-author, -complete_authors) |>
        dplyr::rename(author = authors) |>
        dplyr::relocate(title, author)

      return(df5)
    }
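As a rough sketch of the next step towards the network diagram, the author column can be turned into a weighted co-author edge list. The example below uses base R only and toy data. The names pubs, edges and edge_weights are made up for illustration, and it assumes that authors in each string are separated by ", ", which may not hold for every record.

```r
# Toy data standing in for the `author` column of the returned data frame
# (hypothetical names; real values come from Google Scholar).
pubs <- data.frame(
  pubid  = c("p1", "p2"),
  author = c("A Smith, B Jones, C Lee", "A Smith, C Lee"),
  stringsAsFactors = FALSE
)

# Split each comma-separated author string into a character vector.
author_lists <- strsplit(pubs$author, ",\\s*")

# For every publication with at least two authors, list all unordered pairs.
pair_list <- lapply(author_lists, function(a) {
  a <- sort(unique(a))
  if (length(a) < 2) return(NULL)
  pairs <- t(combn(a, 2))
  data.frame(from = pairs[, 1], to = pairs[, 2], stringsAsFactors = FALSE)
})
edges <- do.call(rbind, pair_list)

# Count how many publications each pair of authors shares.
edge_weights <- aggregate(n ~ from + to,
                          data = transform(edges, n = 1L),
                          FUN = sum)
print(edge_weights)
```

The resulting from/to/n data frame is the kind of edge list that graph packages usually accept, though the exact input format depends on the package used.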

I hope that this is helpful.

Peter

jotech commented 5 months ago

Many thanks. This was what I was looking for!

I had to slightly change the regex in your str_detect() call to make it work for me: dplyr::filter(stringr::str_detect(author, "\\.\\.\\."))
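For anyone else hitting this, here is a small base R sketch of why the doubled backslashes matter. A single "\." is not a valid escape in an R string, and an unescaped "..." used as a regex matches any three characters. The authors vector is toy data made up for illustration, and grepl stands in for stringr::str_detect so the snippet has no dependencies.

```r
# Toy author strings: the first is truncated, the second is complete.
authors <- c("A Smith, B Jones, C Lee, ...", "A Smith, B Jones")

# "\\." in R source is the two-character regex \. and matches a literal dot.
truncated_regex <- grepl("\\.\\.\\.", authors)

# fixed = TRUE treats the pattern as a plain string, so no escaping is needed.
truncated_fixed <- grepl("...", authors, fixed = TRUE)

# Unescaped, "..." is a regex for any three characters and flags every row.
truncated_wrong <- grepl("...", authors)

print(data.frame(authors, truncated_regex, truncated_fixed, truncated_wrong))
```

With stringr, wrapping the pattern as stringr::fixed("...") should give the same literal match without any escaping.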