updating gene pages - Githubissues

as per https://github.com/nih-cfde/update-content-registry/pull/28#issuecomment-1184390759

The GTEx expression, transcript, and UCSC genome browser widgets are displayed on only a few gene pages, as indicated by these files:

data/input/gene_IDs_for_expression_widget.txt
data/input/gene_IDs_for_transcript_widget.txt
data/input/gene_IDs_for_UCSC_genome_browser_widget.txt

The alias table and the Appyter widgets are displayed on nearly all gene pages. The lists of available genes differ between dev and staging:

data/input/DEV_PORTAL__available_genes__2022-07-01.txt
data/input/STAGING_PORTAL__available_genes__2022-07-13.txt
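
To see how the dev and staging lists differ, a comparison along these lines may help. This is a minimal sketch, assuming each file holds one ENSG ID per line with no header; `read_ids` and `compare_gene_lists` are hypothetical helper names, not part of the repository.

```r
# Hypothetical helper: read one-ID-per-line file, dropping blanks and duplicates.
read_ids <- function(path) {
  ids <- trimws(readLines(path))
  unique(ids[nzchar(ids)])
}

# Hypothetical helper: report which IDs are unique to each list and which are shared.
compare_gene_lists <- function(dev_path, staging_path) {
  dev <- read_ids(dev_path)
  staging <- read_ids(staging_path)
  list(
    dev_only = setdiff(dev, staging),
    staging_only = setdiff(staging, dev),
    shared = intersect(dev, staging)
  )
}

dev_file <- "data/input/DEV_PORTAL__available_genes__2022-07-01.txt"
staging_file <- "data/input/STAGING_PORTAL__available_genes__2022-07-13.txt"

# Only run the comparison when both files are present.
if (file.exists(dev_file) && file.exists(staging_file)) {
  diffs <- compare_gene_lists(dev_file, staging_file)
  print(lengths(diffs))  # number of IDs in each group
}
```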

Here's what I've done to create these lists:

Visit the Gene page on the CFDE Search Portal
Export the search results to a csv
Read the CSV file into R. Extract only the ENSG IDs. Save the list as a .txt doc
Update the Snakefile with the new data. Attempt to submit. Get error messages about failed genes.
Make a list of the failed genes. Remove the failed genes from the list of ENSG IDs in the .txt doc. Update the Snakefile. Resubmit. Repeat until successful.


library(tidyverse)

# ENSG IDs reported as failed in the error messages
failedgenes <- c("ENSG00000093134", "ENSG00000164393", "ENSG00000184293",
                 "ENSG00000203812", "ENSG00000188707", "ENSG00000221995",
                 "ENSG00000214534", "ENSG00000225932", "ENSG00000244693",
                 "ENSG00000256374", "ENSG00000263464", "ENSG00000105501",
                 "ENSG00000161133")

# IDs listed in missing.txt are excluded as well (read into column V1)
missinggenes <- read.table("./data/inputs/missing.txt")

# keep only the ENSG IDs from the portal export that are not excluded
df <- read.csv("./data/inputs/Gene.csv") %>%
  arrange(id) %>%
  filter(!id %in% failedgenes) %>%
  filter(!id %in% missinggenes$V1) %>%
  select(id)
head(df)

# write one ID per line, with no header or row names
write.table(df,
            "./data/inputs/STAGING_PORTAL__available_genes__2022-07-13.txt",
            row.names = FALSE, col.names = FALSE, quote = FALSE)
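
The script above rebuilds the list from the CSV export on every round. As an alternative, one round of the "remove failed genes, resubmit" loop could edit the .txt list directly. This is only a sketch: `drop_failed_genes` is a hypothetical helper, and it assumes the file holds one ENSG ID per line with no header.

```r
# Hypothetical helper: remove failed IDs from a one-ID-per-line file, in place.
# Returns the IDs that remain (invisibly) so the caller can inspect them.
drop_failed_genes <- function(id_file, failed) {
  ids <- trimws(readLines(id_file))
  ids <- ids[nzchar(ids)]                 # ignore blank lines
  kept <- unique(ids[!ids %in% failed])   # drop failed IDs and duplicates
  writeLines(kept, id_file)
  invisible(kept)
}
```

After each failed submission, the newly reported IDs would be passed as `failed` and the Snakemake job resubmitted against the updated file.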

nih-cfde / update-content-registry

updating gene pages #29