ntamas / gfam

Automatic annotation of gene families

Is gfam compatible with InterProScan 5? #5

Open yjx1217 opened 8 years ago

yjx1217 commented 8 years ago

Hello Tamás,

I tried gfam in my project to group the S. cerevisiae reference proteome into gene families. It ran smoothly, but I found the final results a bit confusing. 1) Many functional annotations generated by InterProScan (e.g. the PANTHER annotations shown in the domain_architecture.tab file) were discarded from the final assigned_labels.txt file, so the annotation information in assigned_labels.txt is quite sparse. 2) I didn't see genes grouped into gene families as I expected. Where can I find this information?

Since there were major updates from InterProScan 4 to InterProScan 5 (including changes to the output format), I was wondering if this could be the reason. I can provide test data if needed. Thanks in advance!

ntamas commented 8 years ago

Hello,

I have checked the InterProScan output format here (https://github.com/ebi-pf-team/interproscan/wiki/OutputFormats), and it seems that not much has changed in the TSV format, so I'm leaning towards saying that the output of InterProScan 5 should be suitable for GFam. However, I cannot say for sure without running GFam on some test data and checking the output. Could you upload some test data somewhere, along with your GFam config file, so that I can try running the same thing on my machine?
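One quick way to see which analysis names a TSV file actually contains (and therefore whether it uses the old or the new source names) is to tally the analysis column. This is a minimal sketch, assuming the InterProScan 5 TSV layout in which the fourth tab-separated column holds the analysis name (e.g. Pfam, PANTHER); `count_sources` is a hypothetical helper, not part of GFam:

```python
from collections import Counter
from typing import Iterable


def count_sources(lines: Iterable[str]) -> Counter:
    """Count how often each analysis/source name appears in InterProScan TSV output.

    Assumption: InterProScan 5 TSV layout, where column 4 (index 3) is the
    analysis name. Blank or truncated lines are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # not a full annotation row
        counts[fields[3]] += 1
    return counts
```

For example, `count_sources(open("proteome.tsv"))` (hypothetical file name) returns a `Counter` mapping each analysis name to its number of hits, which makes it easy to spot names such as `Pfam` that an older tool might expect as `HMMPfam`.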

yjx1217 commented 8 years ago

Hi Tamas,

Please check the attachment for the test files. I also noticed that many InterProScan annotation sources have been renamed in the new TSV output, for example HMMPfam -> Pfam, HMMTigr -> TIGRFAM, etc. So, in a parallel run, I tried a modified version of the IPS 5.0 output with the old names restored, but I still cannot see any sign of clustering in the final GFam output.

Best, Jia-Xing

(In reply to Tamás Nepusz, Sun, Apr 3, 2016 at 11:16 PM.)

Jia-Xing Yue

Population Genomics and Complex Traits Group Tour Pasteur 8eme etage Faculté de Médecine Institute for Research on Cancer and Aging, Nice (IRCAN) CNRS UMR 7284 - INSERM U 1081 Université de Nice Sophia Antipolis 28 Avenue de Valombrose 06107 NICE Cedex 2 France

Lab website: http://ircan.org/index.php?Itemid=98 Personal website: http://www.iamphioxus.org/
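The workaround described above, putting the old InterProScan 4 source names back into the InterProScan 5 TSV output, can be scripted. This is a minimal sketch, assuming the analysis name is in the fourth TSV column and that GFam recognizes sources by their InterProScan 4 names; only the two renamings mentioned in the thread (Pfam to HMMPfam, TIGRFAM to HMMTigr) are included, and `restore_old_source_names` is a hypothetical helper:

```python
from typing import Dict, Iterable, Iterator

# InterProScan 5 name -> InterProScan 4 name, limited to the renamings
# mentioned in the thread. Any further entries are assumptions and should
# be checked against the InterProScan documentation before use.
IPS5_TO_IPS4: Dict[str, str] = {
    "Pfam": "HMMPfam",
    "TIGRFAM": "HMMTigr",
}


def restore_old_source_names(
    lines: Iterable[str], mapping: Dict[str, str] = IPS5_TO_IPS4
) -> Iterator[str]:
    """Yield TSV lines with the analysis column (assumed column 4) renamed.

    Names not in `mapping`, other columns, and short lines pass through unchanged.
    """
    for line in lines:
        newline = "\n" if line.endswith("\n") else ""
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            fields[3] = mapping.get(fields[3], fields[3])
        yield "\t".join(fields) + newline
```

As reported in the thread, restoring the names did not by itself make clustering appear in the GFam output, so this only removes the naming mismatch.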

ntamas commented 8 years ago

GitHub has stripped the attachments, so please upload them somewhere and send me a link; I'll download the files and check them tomorrow. (That said, you are probably right that the renaming of InterProScan annotation sources is at least part of the problem.)

yjx1217 commented 8 years ago

Hi Tamas,

You can download the files via this link: http://tempsend.com/AF42BF2E1D

Let me know if it does not work.

Thanks for the help!

Best, Jia-Xing

(In reply to Tamás Nepusz, Sun, Apr 3, 2016 at 11:33 PM.)
