gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

-G option no longer exists #4

Closed Jeltje closed 7 years ago

Jeltje commented 8 years ago

This option appears in the current README.md as an example and also in Box 1 of the Nature Protocols paper (though with different meanings, it seems). However, gffcompare v0.9.7 does not list -G in its usage statement and does not accept it as input.

An explanation (and/or alternative) in the README would be appreciated!

Also (off topic) it's not instantly clear what's in the 6 different output files. A quick primer would save a new user a lot of time.

Apart from that - where has this tool been all my life?! So useful, thanks!

gpertea commented 8 years ago

You are correct, this was a recent change that has not been documented yet. I decided to make -G the default behavior, so I removed the option and added -E as a new option that invokes the old default behavior (i.e. what happened when -G was not provided). In short, -G is no longer needed because it is the default behavior, while -E was added as the opposite of -G.

Thank you for bringing this up. I should update the README to explain that by default gffcompare does not merge "contained" isoforms, even though they appear intron compatible and "redundant". Merging such isoforms is the default behavior of Cuffcompare, and thus of older versions of gffcompare, and it was based on the assumption that Cufflinks (at least back in the day) was not supposed to produce such "redundant" isoforms among its transfrags. Meanwhile, however, the concept of alternative TSS made us reconsider this, so I had to account for the possibility that "shorter" isoforms, although contained in and intron compatible with a "larger" isoform, may have their own biological identity and thus their own expression level, etc. (so they should no longer be considered "redundant" as some sort of assembly artifact).
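To make the notion of a "contained", intron-compatible isoform concrete, here is a minimal Python sketch. It is a simplified illustration only and not gffcompare's actual algorithm: the function names, the representation of a transcript as a sorted list of inclusive (start, end) exon coordinates on one strand, and the exact containment rule are all assumptions made for this example.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # inclusive (start, end); hypothetical representation


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return the introns implied by a sorted list of exons."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def is_contained(short: List[Exon], long: List[Exon]) -> bool:
    """Simplified test: is `short` contained in and intron compatible with `long`?

    Assumed rule: the intron chain of `short` must be a contiguous part of
    the intron chain of `long`, and both ends of `short` must stay inside
    the flanking exons of `long`.
    """
    s_int, l_int = introns(short), introns(long)
    s_start, s_end = short[0][0], short[-1][1]

    if not s_int:
        # Single-exon transfrag: it must lie within one exon of `long`.
        return any(e_start <= s_start and s_end <= e_end
                   for e_start, e_end in long)

    k = len(s_int)
    for i in range(len(l_int) - k + 1):
        if l_int[i:i + k] == s_int:
            # Exon i precedes the first shared intron; exon i + k follows the last.
            return long[i][0] <= s_start and s_end <= long[i + k][1]
    return False


# A three-exon isoform and a shorter one that starts inside its first exon,
# as might happen with an alternative TSS.
long_isoform = [(100, 200), (300, 400), (500, 600)]
short_isoform = [(150, 200), (300, 350)]
contained = is_contained(short_isoform, long_isoform)
```

Under the old behavior described above, a pair like this would have been merged as "redundant"; under the current default, both would be kept as distinct isoforms.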

As for the output files, they are somewhat documented in the cuffcompare documentation, which can be found on the Cufflinks (Tuxedo suite) web pages: http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/ As you probably noticed, gffcompare is essentially the same program as cuffcompare, which I no longer actively maintain (except for backporting bug fixes or adding Cufflinks-related features, if requested). I changed its name to make it more generic and to continue adding features while decoupling it from the Cufflinks code base (which can be a pain to build). I guess I should at least add a section to the gffcompare README briefly documenting the output files.

Jeltje commented 8 years ago

Thanks for the detailed explanations! I never used cuffcompare but I now see that I should have. The descriptions on that page are completely clear.

gpertea commented 7 years ago

Closing this old issue. The -G option is currently accepted (i.e. no error message) but silently ignored, as it is the default behavior. The -E option has been removed, and only full duplicates are discarded from the input file automatically (this cannot be changed). The -C option should be used if "contained" transfrags are of no interest to the analysis.
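As a rough illustration of the option handling summarized above, the following Python sketch assembles a gffcompare command line. The helper name, its parameters, and the file names are hypothetical; the sketch only assumes that -r names the reference annotation, -o sets the output prefix, and -C discards "contained" transfrags as described in this thread. It never emits -G or -E, since -G is ignored and -E has been removed.

```python
from typing import List


def build_gffcompare_cmd(reference: str,
                         queries: List[str],
                         out_prefix: str,
                         discard_contained: bool = False) -> List[str]:
    """Build an argument list for gffcompare (hypothetical helper).

    -G is deliberately never added: per the thread it is the default
    behavior and is silently ignored if passed.
    """
    cmd = ["gffcompare", "-r", reference]
    if discard_contained:
        # Use -C only when "contained" transfrags are of no interest.
        cmd.append("-C")
    cmd += ["-o", out_prefix]
    cmd += list(queries)
    return cmd


# File names below are placeholders, not taken from the thread.
cmd = build_gffcompare_cmd("reference.gtf", ["assembly.gtf"], "cmp",
                           discard_contained=True)
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where gffcompare is installed.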