reage / interproscan

Automatically exported from code.google.com/p/interproscan

[interhelp #25063] Missing the 6th field (Signature Description) in tsv output for SUPERFAMILY #47

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run InterProScan.
2. Check the 6th field of the TSV output for SUPERFAMILY hits.
3. The value is missing (although you can see it in the XML output).

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

InterProScan 5 on Linux

Please provide any additional information below.

Here is the description of the TSV format from your project home page:

The TSV format presents the match data in columns as follows:

    Protein Accession (e.g. P51587)
    Sequence MD5 digest (e.g. 14086411a2cdf1c4cba63020e1622579)
    Sequence Length (e.g. 3418)
    Analysis (e.g. Pfam / PRINTS / Gene3D)
    Signature Accession (e.g. PF09103 / G3DSA:2.40.50.140)
    Signature Description (e.g. BRCA2 repeat profile)
    Start location
    Stop location
    Score - is the e-value of the match reported by member database method (e.g. 3.1E-52)
    Status - is the status of the match (T: true)
    Date - is the date of the run
    (InterPro annotations - accession (e.g. IPR002093) - optional column; only displayed if -iprscan option is switched on)
    (InterPro annotations - description (e.g. BRCA2 repeat) - optional column; only displayed if -iprscan option is switched on)
    (GO annotations (e.g. GO:0005515) - optional column; only displayed if --goterms option is switched on)
    (Pathways annotations (e.g. REACT_71) - optional column; only displayed if --pathways option is switched on) 
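
To make the problem easier to check, here is a minimal Python sketch that reads TSV output laid out as in the column list above and reports the rows of a given analysis whose 6th field (Signature Description) is empty. The column identifiers, the function name `rows_missing_description`, and the sample rows (including the accession `SSF00000`, the coordinates and the date) are made up for illustration and are not taken from real InterProScan output.

```python
import csv
import io

# Names for the 11 mandatory columns, in the order given in the TSV description.
# The identifiers themselves are illustrative, not part of InterProScan.
TSV_COLUMNS = [
    "protein_accession", "sequence_md5", "sequence_length", "analysis",
    "signature_accession", "signature_description", "start", "stop",
    "score", "status", "date",
]


def rows_missing_description(handle, analysis="SUPERFAMILY"):
    """Return (protein_accession, signature_accession) pairs for rows of the
    given analysis whose 6th field (signature description) is empty."""
    missing = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Pad short rows so every mandatory column has a value; any optional
        # trailing columns (InterPro, GO, pathways) are ignored by zip().
        fields = fields + [""] * (len(TSV_COLUMNS) - len(fields))
        row = dict(zip(TSV_COLUMNS, fields))
        if row["analysis"] == analysis and not row["signature_description"].strip():
            missing.append((row["protein_accession"], row["signature_accession"]))
    return missing


# Made-up sample rows: one Pfam hit with a description, one SUPERFAMILY hit without.
SAMPLE = (
    "P51587\t14086411a2cdf1c4cba63020e1622579\t3418\tPfam\tPF09103\t"
    "BRCA2 repeat profile\t10\t40\t3.1E-52\tT\t07-08-2014\n"
    "P51587\t14086411a2cdf1c4cba63020e1622579\t3418\tSUPERFAMILY\tSSF00000\t"
    "\t50\t120\t1.0E-10\tT\t07-08-2014\n"
)

if __name__ == "__main__":
    for protein, signature in rows_missing_description(io.StringIO(SAMPLE)):
        print(protein, signature)
```

For a real result file, the same function can be given an open file handle instead of the `io.StringIO` sample.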

Original issue reported on code.google.com by llllg...@gmail.com on 7 Aug 2014 at 6:36

GoogleCodeExporter commented 9 years ago
Hi,

Thank you for the message. I looked into the code and data, and here is the
current status.

For some member databases we have signature accessions, names and descriptions 
available. But for SUPERFAMILY matches, we only populate the signature 
accession and name. The signature description is missing since we don't have 
one available, which explains what you are seeing.

The TSV format is more of a summary format. The XML output is the richest
output format and contains more information, including the signature
accession, name (where available) and description (where available).
Therefore, if the SUPERFAMILY signature name is of interest, you could
consider using the XML output instead.

https://code.google.com/p/interproscan/wiki/OutputFormats

I'm not sure of your use case, but does this help you proceed?

Regards,

Matthew

Original comment by Mr.Matth...@gmail.com on 11 Aug 2014 at 12:40

GoogleCodeExporter commented 9 years ago

Original comment by Maxim.Sc...@gmail.com on 30 Jan 2015 at 2:25