Closed by kevinschaper 2 years ago.
This is the command that makes the small version of the mapping file:
gzcat ./data/goa/uniprot_2_gene.tab.gz | awk 'BEGIN {OFS="\t"} ($7==10090 || $7==10116 || $7==162425 || $7==44689 || $7==6239 || $7==7227 || $7==7955 || $7==9031 || $7==9606 || $7==9615 || $7==9823 || $7==9913) {print $1,$3}' | pigz > ./data/goa/uniprot_2_entrez.tab.gz
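I think I messed up by not specifying the input field separator for awk. Only `OFS` is set to a tab. With the default `FS`, awk splits on runs of spaces and tabs, so an empty column collapses and every later field shifts left. That means `$3` and `$7` may not be the columns I intended. The `BEGIN` block should have set `FS` to a tab as well.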
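Here is a minimal sketch of the field-splitting difference. It assumes the mapping file is strictly tab-separated. The first two commands run awk on a made-up row whose middle column is empty. `filter_entrez` is a hypothetical wrapper around the original filter, with only `FS` added. Whether this alone explains IDs like `NCBIGene:UniRef100_A0A2R8QCY5` is an assumption and has not been checked against the real file.

```shell
# Made-up row: three tab-separated columns, the middle one empty.
# Default FS: blanks are collapsed, so the empty column vanishes and
# $2 holds the value that really sits in column 3.
printf 'P12345\t\tUniRef100_X\n' | awk '{ print $2 }'                        # UniRef100_X

# FS set to a single tab: the empty column is kept, so $2 is empty.
printf 'P12345\t\tUniRef100_X\n' | awk 'BEGIN { FS = "\t" } { print "[" $2 "]" }'   # []

# Hypothetical helper: same taxon filter as the original command, but
# splitting input on tabs only (FS) as well as joining output with tabs (OFS).
filter_entrez() {
  awk 'BEGIN { FS = OFS = "\t" }
       ($7==10090 || $7==10116 || $7==162425 || $7==44689 ||
        $7==6239  || $7==7227  || $7==7955   || $7==9031  ||
        $7==9606  || $7==9615  || $7==9823   || $7==9913) { print $1, $3 }'
}
```

In the original pipeline this would be used as `gzcat ./data/goa/uniprot_2_gene.tab.gz | filter_entrez | pigz > ./data/goa/uniprot_2_entrez.tab.gz`. Setting `FS` inside `BEGIN` rather than using `-F '\t'` sidesteps any differences in how individual awk implementations handle escapes in the `-F` argument.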
Nice to know what was going on, but with #192 we won't actually need this fix!
I was just looking at what's happening with mapping in the GO Annotation ingest, and I don't understand what I'm seeing. It seems like we may be doing something very weird.
is getting turned into
and I’m not sure what NCBIGene:UniRef100_A0A2R8QCY5 is, but it seems like we’re mapping wrong?
The big mapping file has: