Open WinterFor opened 3 years ago
Hello @WinterFor,
It is great that you found annotations for the macaque monkey. What do the names look like in your population file?
Hi, @dvklopfenstein
Do you mean the population file refers to the file containing background genes from NCBI?
I get this file use query "9541"[Taxonomy ID] AND alive[property] AND genetype protein coding[Properties]
link
And I import it by from genes_ncbi_9541_proteincoding import GENEID2NT as GeneID2nt_monkey
print(list(GeneID2nt_monkey.items())[:10])
[(7857739, ntncbi(tax_id=9541, Org_name='Macaca fascicularis', GeneID=7857739, CurrentID=0, Status='live', Symbol='COX1', Aliases=['KEG98_p11'], description='cytochrome c oxidase subunit I', other_designations='cytochrome c oxidase subunit I', map_location='', chromosome='MT', genomic_nucleotide_accession_version='NC_012670.1', start_position_on_the_genomic_accession=5858, end_position_on_the_genomic_accession=7396, orientation='plus', exon_count=0, OMIM=[], no_hdr0='')),
(7857740, ntncbi(tax_id=9541, Org_name='Macaca fascicularis', GeneID=7857740, CurrentID=0, Status='live', Symbol='COX2', Aliases=['KEG98_p10'], description='cytochrome c oxidase subunit II', other_designations='cytochrome c oxidase subunit II', map_location='', chromosome='MT', genomic_nucleotide_accession_version='NC_012670.1', start_position_on_the_genomic_accession=7540, end_position_on_the_genomic_accession=8223, orientation='plus', exon_count=0, OMIM=[], no_hdr0='')),
(7857741, ntncbi(tax_id=9541, Org_name='Macaca fascicularis', GeneID=7857741, CurrentID=0, Status='live', Symbol='ATP8', Aliases=['KEG98_p09'], description='ATP synthase F0 subunit 8', other_designations='ATP synthase F0 subunit 8', map_location='', chromosome='MT', genomic_nucleotide_accession_version='NC_012670.1', start_position_on_the_genomic_accession=8364, end_position_on_the_genomic_accession=8570, orientation='plus', exon_count=0, OMIM=[], no_hdr0='')),
(7857742, ntncbi(tax_id=9541, Org_name='Macaca fascicularis', GeneID=7857742, CurrentID=0, Status='live', Symbol='ATP6', Aliases=['KEG98_p08'], description='ATP synthase F0 subunit 6', other_designations='ATP synthase F0 subunit 6', map_location='', chromosome='MT', genomic_nucleotide_accession_version='NC_012670.1', start_position_on_the_genomic_accession=8525, end_position_on_the_genomic_accession=9205, orientation='plus', exon_count=0, OMIM=[], no_hdr0='')),
(7857743, ntncbi(tax_id=9541, Org_name='Macaca fascicularis', GeneID=7857743, CurrentID=0, Status='live', Symbol='COX3', Aliases=['KEG98_p07'], description='cytochrome c oxidase subunit III', other_designations='cytochrome c oxidase subunit III', map_location='', chromosome='MT', genomic_nucleotide_accession_version='NC_012670.1', start_position_on_the_genomic_accession=9205, end_position_on_the_genomic_accession=9988, orientation='plus', exon_count=0, OMIM=[], no_hdr0=''))]
Do you mean the "names" refer to "Gene ID" in the background file?
Hello @WinterFor ,
The gaf file contains IDs that are UniProt protein IDs such as A0A023JBW5.
But your population file contains NCBI GeneIDs. You will need to convert the NCBI GeneIDs such as 7857739 to protein IDs or the other way around. You could do this by writing a script which used the gene Symbol in GeneID2nt_monkey to match NCBI GeneIDs and then by using the DB_ID in the gaf file to match the UniProt protein IDs.
It is a bit of work to get your scripts set up for a species that is not ubiquitous in most GO databases. But once they are set up, then you can do Gene Ontology Enrichment Analyses on all kinds of species.
You are doing a great job with all this. I can't wait to read your papers on macaque monkies.
I just want to add your QuickGO search for others that may need to search for species:
The QuickGO search was:
https://www.ebi.ac.uk/QuickGO/annotations?downloadLimit=483104&taxonId=9541&taxonUsage=descendants
@dvklopfenstein I may find another time to finish this script. Thanks for your help all time.
Hi, After #218, I found the macaque monkey annotation in gaf format from QuickGo. the macaque monkey annotation
macaca_fascicularis_from_quick_go_annotations.gaf
DownloadThen run without error. But the result is confusing.
Why are all 0%? Is this a reasonable result? Thanks!