nih-cfde / update-content-registry

Code and workflows for adding content to the content registry.
https://app-staging.nih-cfde.org/
BSD 3-Clause "New" or "Revised" License

updating gene pages #29

Open raynamharris opened 2 years ago

raynamharris commented 2 years ago

as per https://github.com/nih-cfde/update-content-registry/pull/28#issuecomment-1184390759

the GTEx, Transcripts, and UCSC browsers are displayed on only a few gene pages, as indicated by these files

the alias table and the Appyter widgets are displayed on nearly all gene pages. they are different for dev and for staging

Here's what I've done to create these

  1. Visit the Gene page on the CFDE Search Portal
  2. Export the search results to a csv
  3. Read the CSV file into R. Extract only the ENSG IDs. Save the list as a .txt doc
  4. Update the Snakefile with the new data. Attempt to submit. Get error messages about failed genes.
  5. Make a list of failed genes. Remove the failed genes from the list of ENSG IDs in the .txt doc. Update the Snakefile. Resubmit. Repeat until successful.


library(tidyverse)

# Genes that failed on submission (collected from the error messages)
failedgenes <- c("ENSG00000093134", "ENSG00000164393", "ENSG00000184293",
                 "ENSG00000203812", "ENSG00000188707", "ENSG00000221995",
                 "ENSG00000214534", "ENSG00000225932", "ENSG00000244693",
                 "ENSG00000256374", "ENSG00000263464", "ENSG00000105501",
                 "ENSG00000161133")

# read.table() without a header names the first column V1
missinggenes <- read.table("./data/inputs/missing.txt")

# Portal export: keep only IDs that are neither failed nor missing
df <- read.csv("./data/inputs/Gene.csv") %>%
  arrange(id) %>%
  filter(!id %in% failedgenes) %>%
  filter(!id %in% missinggenes$V1) %>%
  select(id)
head(df)

# Write one ID per line, without a header row
write.table(df,
            "./data/inputs/STAGING_PORTAL__available_genes__2022-07-13.txt",
            row.names = FALSE, col.names = FALSE, quote = FALSE)
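
Since step 5 gets repeated until the submission succeeds, the filtering could be wrapped in a small helper. This is only a sketch in base R, not part of the repo: the function name `filter_gene_ids` and the toy ID vectors are made up for illustration, and it works on character vectors rather than the files above.

```r
# Hypothetical helper (not in the repo): drop failed and missing genes
# from a character vector of ENSG IDs. Base R only.
filter_gene_ids <- function(ids, failed = character(), missing = character()) {
  excluded <- unique(c(failed, missing))
  # setdiff() also drops any duplicate entries in `ids`
  sort(setdiff(ids, excluded))
}

# Toy input for illustration; the real IDs come from the portal export
ids <- c("ENSG00000000003", "ENSG00000093134",
         "ENSG00000000005", "ENSG00000164393")
failed <- c("ENSG00000093134", "ENSG00000093134")  # duplicates are harmless
missing <- c("ENSG00000164393")                    # illustrative only

kept <- filter_gene_ids(ids, failed, missing)
print(kept)
```

After each failed submission, the new failures can be appended to `failed` and the helper rerun before writing the .txt file again.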