Closed andrewelamb closed 4 years ago
Thanks, Andrew. I have a break in meetings and will work on this now.
Thanks for the demo code. I like the idea of using description
to add uniqueness to the join — I ended up switching to grantName
, which resulted in the correct (129) number of matches. This also ticks one of the boxes in #25. 🙂
Here's the code, if you're curious (I reused some pieces from portal_etl.Rmd
, mostly because it was more familiar and easier to troubleshoot):
db_project_df <- get_table_df("syn21645147", cache = FALSE)
db_description_project <- get_table_df("syn21868407", cache = FALSE)
merged_project_tbl <- get_table_df("syn21868602", cache = FALSE)
merged_project_tmp <- db_project_df %>%
rename(projectId = id, projectName = name) %>%
left_join(
db_description_project, by = "projectId"
) %>%
left_join(
db_grant_df %>%
select(grantId = id, grantName = name, grantType, consortiumId),
by = "grantId"
) %>%
left_join(db_theme_grant, by = "grantId") %>%
left_join(
db_theme_df %>%
select(themeId = id, theme = displayName),
by = "themeId"
) %>%
left_join(
db_consortium_df %>%
select(consortiumId = id, consortium = displayName)
) %>%
select(projectId, projectName = displayName, grantName, projectType,
themeId, grantId, consortiumId) %>%
distinct()
merged_formatted_project_tmp <- merged_project_tmp %>%
filter(!is.na(grantId)) %>%
group_by(projectId, projectName, projectType) %>%
summarize(grantId = str_c(unique(grantId), collapse = ", "),
grantName = str_c(unique(grantName), collapse = ", "),
themeId = str_c(unique(themeId), collapse = ", "),
consortiumId = str_c(unique(consortiumId), collapse = ", ")) %>%
ungroup() %>%
mutate(projectName = str_trim(projectName)) %>%
distinct()
merged_project_update <- merged_project_tbl %>%
mutate(projectName = str_trim(projectName),
description = str_trim(description)) %>%
left_join(merged_formatted_project_tmp,
by = c( "projectName", "grantName")) %>%
select(projectId, projectName, projectType, description,
grantId, grantName, grantType,
themeId, theme, consortiumId, consortium) %>%
distinct()
merged_project_syntable <- update_table("syn21868602", merged_project_update)
The merged project table doesn't have project ids so to match them the following code is needed:
This results in 69 rows