Closed dustine32 closed 1 year ago
Remove `gene_symbol` and `gene_name` fields from `annotations.json`. These fields should only live in the `gene_info.json`.

Use case is when gene symbols or names differ between the two sources these are pulled from: upstream annotation GAFs or `gene.dat`. Some examples:

- UniProtKB:A0A1W2PRP0 - GAF symbol column = "A0A1W2PRP0"; `gene.dat` symbol column = ""
- UniProtKB:Q9NUQ7 - GAF name column = ""; `gene.dat` name column = "Ufm1-specific protease 2"

The result of these "annotation says vs. gene_info says" arguments is that duplicate gene entries appear in the data:

![image](https://user-images.githubusercontent.com/2678599/226657457-edec3333-9c96-4926-a3f0-a1ddda875bd9.png)

New code should attempt to always rescue blank values of these two fields by scavenging either GAF annotations or `gene.dat` for any non-blank value.
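
A minimal sketch of the rescue behavior described above, not the actual pipeline code. The function names (`rescue_field`, `build_gene_info_entry`), the record shapes, and the `"gene"` key are hypothetical. The issue does not say which source should win when both are non-blank, so preferring `gene.dat` here is an assumption.

```python
from typing import Dict, Optional


def rescue_field(gaf_value: Optional[str], gene_dat_value: Optional[str]) -> str:
    """Return the first non-blank value, or "" if both sources are blank.

    Assumption: gene.dat is checked first; the issue does not specify precedence.
    """
    for candidate in (gene_dat_value, gaf_value):
        if candidate is not None and candidate.strip():
            return candidate.strip()
    return ""


def build_gene_info_entry(
    gene_id: str,
    gaf_record: Dict[str, str],
    gene_dat_record: Dict[str, str],
) -> Dict[str, str]:
    """Build one gene_info.json entry (hypothetical shape) with rescued fields."""
    return {
        "gene": gene_id,
        "gene_symbol": rescue_field(
            gaf_record.get("symbol"), gene_dat_record.get("symbol")
        ),
        "gene_name": rescue_field(
            gaf_record.get("name"), gene_dat_record.get("name")
        ),
    }


# The two cases from the examples above (only the columns quoted in the issue).
symbol_case = build_gene_info_entry(
    "UniProtKB:A0A1W2PRP0",
    gaf_record={"symbol": "A0A1W2PRP0"},
    gene_dat_record={"symbol": ""},
)
name_case = build_gene_info_entry(
    "UniProtKB:Q9NUQ7",
    gaf_record={"name": ""},
    gene_dat_record={"name": "Ufm1-specific protease 2"},
)
```

With symbol and name resolved once per gene and stored only in `gene_info.json`, annotations would carry just the gene ID, so the two sources can no longer disagree and produce duplicate gene entries.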