murphycj / AGFusion

Python package to annotate and visualize gene fusions.
https://www.agfusion.app
MIT License

missing domain in output? #21

Closed ghost closed 6 years ago

ghost commented 6 years ago

Hello, I noticed that there appears to be no domain information in the AGFusion DB for some genes, e.g. SSX2. Is that correct, and is there a way to get that into my AGFusion database?

murphycj commented 6 years ago

I tested it out on a made up fusion:

agfusion annotate -g5 SSX2 -j5 52698786 -g3 FGFR2 -j3 121564577 -db agfusion.homo_sapiens.87.db -o test

It does provide annotation for SSX2. See below:

enst00000336777-enst00000457416

It could be that your fusion does not contain any of the annotated protein domains. Could also be that you're using an older version of Ensembl that does not contain annotation information for SSX2.

Can you give me the command you're using?

murphycj commented 6 years ago

And to answer your question: yes, you could insert the domain information manually into the AGFusion DB. It just requires some SQLite skills. You can look at the "fetch_protein_annotation" function in the "database.py" file to see how I insert domain information into the database.
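
A minimal sketch of what such a manual insert could look like with Python's built-in sqlite3 module. The table name `custom_domains`, its columns, and the row values are hypothetical placeholders, not the actual AGFusion schema; check "fetch_protein_annotation" in "database.py" for the real table layout before touching a real database file. The sketch runs against an in-memory database so it cannot damage anything.

```python
import sqlite3

# HYPOTHETICAL table layout, for illustration only. The real AGFusion
# schema is defined in database.py and will differ from this.
SCHEMA = """
CREATE TABLE IF NOT EXISTS custom_domains (
    transcript_id TEXT NOT NULL,
    domain_name   TEXT NOT NULL,
    aa_start      INTEGER NOT NULL,
    aa_end        INTEGER NOT NULL
)
"""


def add_domain(conn, transcript_id, domain_name, aa_start, aa_end):
    """Insert one domain row, rejecting impossible coordinates."""
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("domain coordinates must satisfy 1 <= start <= end")
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO custom_domains VALUES (?, ?, ?, ?)",
            (transcript_id, domain_name, aa_start, aa_end),
        )


def domains_for(conn, transcript_id):
    """Return (name, start, end) tuples for a transcript, ordered by start."""
    rows = conn.execute(
        "SELECT domain_name, aa_start, aa_end FROM custom_domains "
        "WHERE transcript_id = ? ORDER BY aa_start",
        (transcript_id,),
    )
    return rows.fetchall()


# In-memory database; a real run would open the .db file instead.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Placeholder domain name and coordinates, not real annotation.
add_domain(conn, "ENST00000336777", "EXAMPLE_DOMAIN", 10, 50)
print(domains_for(conn, "ENST00000336777"))
```

Working on a copy of the database file, rather than the original, would make it easy to go back to the stock annotation.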

murphycj commented 6 years ago

I should also add that I don't want to make any changes to the current AGFusion databases unless the changes are pertinent to the respective Ensembl release. For example, if Ensembl has no protein domain information for SSX2 in release 80, then I don't want to add it manually.

ghost commented 6 years ago

Sorry, I should have included that. Here's what I'm using: agfusion annotate --gene5prime ATRX --gene3prime SSX2 --junction5prime 77785846 --junction3prime 52700543 -db agfusion.homo_sapiens.87.db -o ATRX-SSX2 --WT

Totally makes sense to me about not adding it if it isn't in the release.

If I look at the SSX2 wild-type generated by agfusion, it has the KRAB domain, but not the SSXRD domain/motif. Note that when agfusion produces the fusion or wild type for SSX1, it does have this domain, and from InterPro etc. I'm seeing that SSXRD is at the end of SSX2: http://www.ebi.ac.uk/interpro/protein/Q16385

murphycj commented 6 years ago

I see what you mean, but it seems the "SSXRD" domain/motif is just not listed in any of the protein annotation databases for Ensembl release 87. I am going to update AGFusion to support up to Ensembl release 91. I tested it out on 91 but still did not see SSXRD listed. If you look at the Ensembl page for SSX2, SSXRD is also not listed for the protein:

https://useast.ensembl.org/Homo_sapiens/Transcript/ProteinSummary?db=core;g=ENSG00000241476;r=X:52696896-52707189;t=ENST00000336777

I can consider adding a feature to AGFusion so the user can manually include protein annotations via a flag, if that would be useful for you. For example:

agfusion annotate \
--gene5prime ATRX \
--gene3prime SSX2 \
--junction5prime 77785846 \
--junction3prime 52700543 \
-db agfusion.homo_sapiens.87.db \
-o ATRX-SSX2 \
--WT \
--add-domain "SSX2:SSXRD:158-187"
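
To make the proposed value format concrete, here is a rough sketch of how a string like "SSX2:SSXRD:158-187" might be parsed. The --add-domain flag is only a proposal at this point, so everything here is an assumption: the reading of the three fields as gene, domain name, and a 1-based inclusive amino acid range, as well as the names `DomainSpec` and `parse_domain_spec`, are hypothetical and not part of AGFusion.

```python
from typing import NamedTuple


class DomainSpec(NamedTuple):
    """One user-supplied domain (hypothetical structure, not AGFusion's)."""
    gene: str
    domain: str
    start: int
    end: int


def parse_domain_spec(spec: str) -> DomainSpec:
    """Parse 'GENE:DOMAIN:START-END' into a DomainSpec.

    Assumes START and END are 1-based, inclusive positions.
    """
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected GENE:DOMAIN:START-END, got {spec!r}")
    gene, domain, coords = (p.strip() for p in parts)
    if not gene or not domain:
        raise ValueError(f"gene and domain name must be non-empty in {spec!r}")

    # Split the coordinate field on the first hyphen only.
    start_text, sep, end_text = coords.partition("-")
    if not sep:
        raise ValueError(f"expected START-END coordinates, got {coords!r}")
    try:
        start, end = int(start_text), int(end_text)
    except ValueError:
        raise ValueError(f"coordinates must be integers, got {coords!r}") from None
    if start < 1 or end < start:
        raise ValueError(f"need 1 <= START <= END, got {coords!r}")
    return DomainSpec(gene, domain, start, end)


print(parse_domain_spec("SSX2:SSXRD:158-187"))
# DomainSpec(gene='SSX2', domain='SSXRD', start=158, end=187)
```

Validating the string up front means a malformed value fails immediately with a readable message instead of surfacing later during annotation.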

murphycj commented 6 years ago

An additional note: I know I could have AGFusion incorporate more protein annotation sources, but I just don't have the time right now to do so.

ghost commented 6 years ago

Thank you for investigating. That feature would be great but is certainly not necessary; I completely understand about the time constraints.

murphycj commented 6 years ago

Will provide a feature to add custom domains in a future release.