yochannah / wizard-api-specs

https://yochannah.github.io/wizard-api-specs/.

look at files that make julie angry #4

Open yochannah opened 6 years ago

yochannah commented 6 years ago

For good validation samples, @julie-sullivan.

julie-sullivan commented 6 years ago

/micklem/data/human/GTEx/current

data parser

`GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_median_tpm.gct`: genes run down the left side and tissues across the top, and the value at each intersection is the expression score.
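A minimal sketch of reading that matrix, assuming the file follows the usual GCT 1.2 layout (a `#1.2` version line, a line with the row and column counts, a header row whose first two columns are the gene ID and description, then one row per gene). The function name `parse_gct` and the sample data are made up for illustration. Checking the declared counts against what was actually read is a cheap validation step.

```python
import csv
import io
from typing import Dict, TextIO


def parse_gct(handle: TextIO) -> Dict[str, Dict[str, float]]:
    """Read a GCT 1.2 matrix into {gene_id: {tissue: expression score}}."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unexpected GCT version line: {version!r}")
    # Second line holds the declared number of gene rows and tissue columns.
    n_rows, n_cols = (int(x) for x in handle.readline().split())

    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    tissues = header[2:]  # skip the first two columns (gene ID, description)
    if len(tissues) != n_cols:
        raise ValueError(f"header has {len(tissues)} tissues, expected {n_cols}")

    matrix: Dict[str, Dict[str, float]] = {}
    for row in reader:
        if not row:  # tolerate blank trailing lines
            continue
        if len(row) != len(header):
            raise ValueError(f"{row[0]}: {len(row)} fields, expected {len(header)}")
        matrix[row[0]] = {t: float(v) for t, v in zip(tissues, row[2:])}
    if len(matrix) != n_rows:  # also catches duplicated gene IDs
        raise ValueError(f"read {len(matrix)} genes, expected {n_rows}")
    return matrix


# Tiny made-up example in the same layout (gene IDs are placeholders).
sample = (
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tLiver\tLung\n"
    "GENE_A\tA\t1.5\t0\n"
    "GENE_B\tB\t2\t3.25\n"
)
matrix = parse_gct(io.StringIO(sample))
print(matrix["GENE_A"]["Liver"])  # 1.5
```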

The tissues really should match up to an anatomy ontology, but I don't think they do!

`Liver.v7.signif_variant_gene_pairs.txt`: each row gives a SNP and its associated gene, plus a p-value and the TSS distance. We discard the rest.
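A sketch of keeping only those four fields and dropping everything else. The column names (`variant_id`, `gene_id`, `pval_nominal`, `tss_distance`) are an assumption based on the GTEx v7 eQTL release and should be checked against the real header; the sample row is invented. Failing loudly when an expected column is missing makes a header change obvious.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class VariantGenePair(NamedTuple):
    variant_id: str
    gene_id: str
    pval_nominal: float
    tss_distance: int


# Assumed column names; verify against the actual file header.
KEPT_COLUMNS = ("variant_id", "gene_id", "pval_nominal", "tss_distance")


def parse_signif_pairs(handle: TextIO) -> List[VariantGenePair]:
    """Keep the SNP, gene, p-value and TSS distance; discard every other column."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in KEPT_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return [
        VariantGenePair(
            variant_id=row["variant_id"],
            gene_id=row["gene_id"],
            pval_nominal=float(row["pval_nominal"]),
            tss_distance=int(row["tss_distance"]),
        )
        for row in reader
    ]


# Invented one-row example; extra columns such as maf and slope are ignored.
sample = (
    "variant_id\tgene_id\ttss_distance\tmaf\tpval_nominal\tslope\n"
    "1_100_A_G_b37\tGENE_A\t-500\t0.1\t1e-08\t0.4\n"
)
pairs = parse_signif_pairs(io.StringIO(sample))
print(pairs[0].tss_distance)  # -500
```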

julie-sullivan commented 6 years ago

Things that have happened:

  1. `^@` (NUL bytes, as shown in caret notation)
  2. Mixed tabs and spaces (OMIM does this), so columns don't line up
  3. Columns that are all numbers but use a string to represent NULLs (e.g. N/A, n, a dash, etc.)
  4. Case and spelling inconsistencies, e.g. DNAse vs. DNase, dev vs. development
  5. Wrong data, e.g. UniProt had a protein in both TrEMBL and SwissProt. That shouldn't happen!
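Most of these can be caught with simple checks before loading. The sketch below is one possible approach, not an existing validator: `check_lines`, `case_variants`, `in_both` and the `NULL_TOKENS` set are all hypothetical. Mixed tabs and spaces are detected indirectly, as rows whose tab-separated width differs from the header. Case checks only find pure case differences like DNAse vs. DNase; abbreviations such as dev vs. development would need a curated synonym list. For item 5, "reviewed" stands for SwissProt accessions and "unreviewed" for TrEMBL.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical NULL placeholders; extend per data source.
NULL_TOKENS = {"N/A", "NA", "NULL", "n", "-", ""}


def check_lines(lines: Iterable[str], numeric_columns: Set[int]) -> List[str]:
    """Flag NUL bytes, ragged rows and string NULLs in a tab-separated file.

    The first line is treated as the header and fixes the expected width.
    """
    problems: List[str] = []
    expected_width = None
    for lineno, line in enumerate(lines, start=1):
        if "\x00" in line:  # item 1: ^@ is NUL in caret notation
            problems.append(f"line {lineno}: NUL byte (^@)")
        fields = line.rstrip("\r\n").split("\t")
        if expected_width is None:
            expected_width = len(fields)
            continue
        if len(fields) != expected_width:  # item 2: spaces where tabs belong
            problems.append(
                f"line {lineno}: found {len(fields)} field(s), expected {expected_width}"
            )
            continue
        for col in sorted(numeric_columns):  # item 3: NULLs spelled as strings
            value = fields[col].strip()
            if value in NULL_TOKENS:
                problems.append(f"line {lineno}, column {col}: NULL written as {value!r}")
                continue
            try:
                float(value)
            except ValueError:
                problems.append(f"line {lineno}, column {col}: not a number: {value!r}")
    return problems


def case_variants(values: Iterable[str]) -> Dict[str, Set[str]]:
    """Item 4: group spellings that differ only by case, e.g. DNAse vs. DNase."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for value in values:
        groups[value.casefold()].add(value)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


def in_both(reviewed: Iterable[str], unreviewed: Iterable[str]) -> Set[str]:
    """Item 5: accessions listed as both reviewed and unreviewed."""
    return set(reviewed) & set(unreviewed)


rows = [
    "gene\tscore\n",
    "GENE_A\t1.5\n",
    "GENE_B\tN/A\n",
    "GENE_C\x00\t2\n",
    "GENE_D  3\n",
]
for problem in check_lines(rows, numeric_columns={1}):
    print(problem)
print(case_variants(["DNAse", "DNase", "RNA"]))
```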