Open mmokrejs opened 7 months ago
BTW, I had to add `Name=S` into the 9th column of the CDS line to make gofasta happy. Here is from your docs:
> For the purposes of annotating amino acids, `CDS` or `mature_protein_region_of_CDS` feature lines that have a `Name=something` tag,value pair in the attributes column (column 9) will be represented in the output.
Why doesn't it infer the `Name=` from the `Parent=`?
Can you provide the input files for the first comment in this issue as attachments, please, so that it is possible to replicate the error?
The help for `gofasta variants` and `gofasta sam variants` states:
-a, --annotation string Genbank or GFF3 format annotation file. Must have suffix .gb or .gff
I'll think about changing the file extension check, but it's more parsimonious for the user just to change their filename.
I'm not sure what you mean by "to make gofasta happy", but the purpose of the `Name=something` parsing convention was so that things with a `Name=x` tag would be reported in the output, whereas things without wouldn't be. In this way it is possible to annotate protein-coding features in your gff for the purposes of defining non-protein-coding (intronic, intergenic, synonymous) nucleotide changes, without having to also have every amino acid change in every gene in the output.
This usage was intended to be somewhat coherent with the gff version 3 specifications.
Does this also make it clearer why Name is not inherited from a Parent feature? (which isn't coherent with the gff spec, I don't think).
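To make the convention described above concrete, here is a minimal Python sketch of the filtering idea. It is not gofasta's actual code: the helper names (`parse_attributes`, `reported_name`) and the coordinates in the example lines are made up for illustration, and only the `CDS` / `mature_protein_region_of_CDS` types and the `Name=` tag come from the docs quoted in this thread.

```python
# Hypothetical sketch of the Name= convention; not gofasta's implementation.

# Feature types that the quoted docs say can be annotated at the amino acid level.
REPORTED_TYPES = {"CDS", "mature_protein_region_of_CDS"}


def parse_attributes(column9):
    """Split a GFF3 attributes column (column 9) into a dict of tag -> value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attrs[tag.strip()] = value.strip()
    return attrs


def reported_name(gff_line):
    """Return the Name of a feature that would be reported, or None otherwise."""
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] not in REPORTED_TYPES:
        return None
    # Name is read from the feature's own attributes; Parent is not consulted.
    return parse_attributes(fields[8]).get("Name")


# Illustrative lines only (coordinates are invented).
with_name = "7-WU-FF\t.\tCDS\t100\t400\t.\t+\t0\tID=cds1;Parent=gene1;Name=S"
without_name = "7-WU-FF\t.\tCDS\t100\t400\t.\t+\t0\tID=cds1;Parent=gene1"

print(reported_name(with_name))     # S
print(reported_name(without_name))  # None
```

Under this reading, a CDS without `Name=` can still define coding regions for classifying nucleotide changes, but its amino acid changes would simply not be listed.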
> Can you provide the input files for the first comment in this issue as attachments, please, so that it is possible to replicate the error?
testcases.aln.txt
testcases.sam.txt
7-WU-FF1.fasta.txt (or replace the `N`s with, say, `a`) for https://github.com/virus-evolution/gofasta/issues/46
7-WU-FF.gff.txt
> I'll think about changing the file extension check, but it's more parsimonious for the user just to change their filename.
Other tools happily accept `.gff3`, so that was exactly why I wanted to keep the filename as it is and avoid an extra symlink or renaming.
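For illustration, a relaxed suffix check could look roughly like the Python sketch below. This is an assumption about how such a check might be written, not how gofasta does it: the function name `annotation_format` and the suffix table are hypothetical, and only the `.gb` and `.gff` suffixes come from the help text quoted above, with `.gff3` being the requested addition.

```python
from pathlib import Path

# Hypothetical suffix table. ".gb" and ".gff" are the suffixes named in the
# help text; ".gff3" is the extra suffix requested in this issue.
ALLOWED_SUFFIXES = {".gb": "genbank", ".gff": "gff3", ".gff3": "gff3"}


def annotation_format(filename):
    """Map an annotation filename to a format label based on its suffix."""
    suffix = Path(filename).suffix
    if suffix not in ALLOWED_SUFFIXES:
        allowed = ", ".join(sorted(ALLOWED_SUFFIXES))
        raise ValueError(
            f"unsupported annotation suffix {suffix!r} in {filename!r}; "
            f"expected one of: {allowed}"
        )
    return ALLOWED_SUFFIXES[suffix]


print(annotation_format("7-WU-FF.gff3"))  # gff3
```

Naming the offending suffix and the accepted ones in the error message would also make it obvious that only the filename, not the file content, is the problem.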
> I'm not sure what you mean by "to make gofasta happy", but the purpose of the `Name=something` parsing convention was so that things with a `Name=x` tag would be reported in the output, whereas things without wouldn't be. In this way it is possible to annotate protein-coding features in your gff for the purposes of defining non-protein-coding (intronic, intergenic, synonymous) nucleotide changes, without having to also have every amino acid change in every gene in the output.
I am writing this from memory, but I think `gofasta` skipped reporting the output if the `Name=` was unset. So I had to edit my `.gff` file to get it working.
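The manual edit described above could be scripted along these lines. This is only a sketch of a possible workaround, not something gofasta provides: the function name `add_name_from_parent` is hypothetical, the example line uses invented coordinates and IDs, and copying the `Parent` value into `Name` is an assumption that only makes sense if the parent ID is a usable label (and is a single ID rather than a comma-separated list).

```python
def add_name_from_parent(gff_line):
    """Append Name=<Parent value> to a CDS line that has Parent but no Name.

    All other lines (comments, other feature types, CDS lines that already
    carry a Name) are returned unchanged, without the trailing newline.
    """
    line = gff_line.rstrip("\n")
    fields = line.split("\t")
    if len(fields) != 9 or fields[2] != "CDS":
        return line
    # Parse column 9 into tag -> value pairs.
    attrs = dict(pair.split("=", 1) for pair in fields[8].split(";") if "=" in pair)
    if "Name" in attrs or "Parent" not in attrs:
        return line
    fields[8] = fields[8].rstrip(";") + ";Name=" + attrs["Parent"]
    return "\t".join(fields)


# Illustrative line only (coordinates and IDs are invented).
example = "7-WU-FF\t.\tCDS\t100\t400\t.\t+\t0\tID=cds1;Parent=S\n"
print(add_name_from_parent(example))
```

Applied to every line of the annotation file, this leaves the header and non-CDS features alone and only touches CDS lines that would otherwise be skipped in the amino acid output.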
> This usage was intended to be somewhat coherent with the [gff version 3 specifications](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md). Does this also make it clearer why Name is not inherited from a Parent feature? (which isn't coherent with the gff spec, I don't think).
I will study that later, cannot tell straight away.
Hi, I wonder why `-` characters are not allowed in my reference. I had to use `_` to overcome this. Please improve the error message to make it clear what the source of the error is. The reference used to create the SAM file used `7-WU-FF`, and likewise the `gff3` contains `7-WU-FF`.
The `7-WU-FF1.gff` file contains:
BTW, I had to rename my file from `.gff3` to `.gff` to get rid of:
Please relax the check if this is just about the filename extension match. Thank you.