unipept / FragGeneScanRs

Better and faster Rust implementation of the FragGeneScan gene prediction model for short and error-prone reads.
GNU General Public License v3.0

gff format output #2

Closed: jianshu93 closed this issue 3 years ago

jianshu93 commented 3 years ago

Hello FragGeneScanRs group,

I do not see a GFF output option, which both the original FragGeneScan and FGS+ provide. GFF output would be quite useful in many applications, for example for quantifying reads mapped to genes.

thanks,

Jianshu

ninewise commented 3 years ago

The GFF file is simply a reformatting of the "out" file and is only generated by the Perl wrapper around FGS(+), so it has not been included yet. You can create it with the following awk script (the same conversion used in the FGS Perl wrapper, line 76), called with the out file as input.

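# Convert a FragGeneScan "out" file (given as input) to GFF3 on stdout.
BEGIN { print "##gff-version 3"; }
{
    s = substr($1, 1, 1)
    if (s == ">") {
        # Header line ">seqid": remember the sequence ID for the lines below it.
        seqid = substr($1, 2)
    } else {
        # Prediction line, tab-separated: start, end, strand, frame, ...
        s = split($0, t, "\t")
        id = "ID=" seqid "_" t[1] "_" t[2] "_" t[3] ";product=predicted protein"
        # GFF3 columns: seqid, source, type, start, end, score, strand, phase (frame - 1), attributes
        print seqid "\tFGS\tCDS\t" t[1] "\t" t[2] "\t.\t" t[3] "\t" int(t[4] - 1) "\t" id
    }
}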

It would be easy to add this as an additional (optional) output format to the main code as well. I'll discuss this with the rest of the team tomorrow.
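For illustration only, here is a rough sketch of how the same conversion might look in Rust. It follows the logic of the awk script above and assumes the same input layout: a ">seqid" header line followed by tab-separated lines whose first four columns are start, end, strand and frame. The function name `out_to_gff` is hypothetical and is not part of the FragGeneScanRs code base; the actual implementation may differ.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Hypothetical helper: converts FragGeneScan "out" records into GFF3.
/// Assumes a ">seqid" header followed by tab-separated prediction lines
/// of the form "start  end  strand  frame  ...".
fn out_to_gff<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    writeln!(output, "##gff-version 3")?;
    let mut seqid = String::new();

    for line in input.lines() {
        let line = line?;

        // Header line: keep only the first whitespace-separated word as the ID.
        if let Some(header) = line.strip_prefix('>') {
            seqid = header.split_whitespace().next().unwrap_or("").to_string();
            continue;
        }

        let fields: Vec<&str> = line.split('\t').collect();
        if fields.len() < 4 {
            continue; // skip blank or malformed lines
        }
        let (start, end, strand) = (fields[0], fields[1], fields[2]);

        // FGS frames are 1-based; shift by one, as the awk script does.
        let frame: i64 = fields[3]
            .trim()
            .parse()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        let phase = frame - 1;

        writeln!(
            output,
            "{seqid}\tFGS\tCDS\t{start}\t{end}\t.\t{strand}\t{phase}\tID={seqid}_{start}_{end}_{strand};product=predicted protein"
        )?;
    }
    output.flush()
}

fn main() -> io::Result<()> {
    // Read an "out" file on stdin and write GFF3 to stdout.
    let stdin = io::stdin();
    let stdout = io::stdout();
    out_to_gff(stdin.lock(), BufWriter::new(stdout.lock()))
}
```

Taking generic `BufRead`/`Write` parameters keeps the conversion independent of where the data comes from, so the same function could be fed from a file, stdin, or an in-memory buffer.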

pdawyndt commented 3 years ago

👍 to support gff natively in FGSrs

ninewise commented 3 years ago

The FragGeneScanRs command will (once the linked PR is merged and released) output the GFF file by default when using the -o option, similar to running the perl wrapper of FGS(+). Alternatively, if you need only the gff file, you can run:

FragGeneScanRs -t 454_10 -s example/NC_000913-454.fna -g example/NC_000913-454.gff -w 0