ncbi / vadr

Viral Annotation DefineR: classification and annotation of viral sequences based on RefSeq annotation

Understanding `misc_feature` with no qualifiers #51

Closed taltman closed 2 years ago

taltman commented 2 years ago

In the previous issue that I filed about double-entries, I noticed the following at the top of the fail.tbl file:

Feature NODE_1_length_19663_cov_257.252269
<1      12777   gene
                        gene    ORF1ab
<1      5687    misc_feature
5687    12777
                        note    similar to ORF1ab polyprotein
<1      5694    misc_feature
                        note    similar to ORF1a polyprotein

My interpretation of these lines is as follows:

I don't understand how to interpret the misc_feature from <1 to 5687, as it has no note qualifier that might explain what it is.

nawrockie commented 2 years ago

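The first misc_feature actually spans two 'segments': the first segment is <1..5687 and the second is 5687..12777 (position 5687 is repeated as part of the programmed ribosomal frameshift). The line `5687    12777` has no feature key because it continues the misc_feature started on the line above it. So the first note pertains to the first misc_feature (both segments), and the second note pertains to the second misc_feature.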

More on the format of the .tbl files is here: https://www.ncbi.nlm.nih.gov/genbank/feature_table/
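For anyone who wants to check this reading programmatically, the grouping rule above can be sketched in Python. This is a minimal illustration, not VADR's own parser: the function name `parse_feature_table` and the dictionary layout are hypothetical, and the sketch assumes that a location line carrying a feature key starts a new feature, a location line without one adds another interval to the current feature, and indented lines are qualifiers.

```python
def parse_feature_table(text):
    """Group interval lines and qualifier lines into features.

    Hypothetical helper for illustration only; it handles just the
    subset of the feature table format shown in this thread.
    """
    features = []
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw.lstrip(">").startswith("Feature"):
            continue  # header line naming the sequence
        if raw[0].isspace():
            # Indented line: a qualifier ("key value") for the current feature.
            parts = raw.split(None, 1)
            key = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
            if current is not None:
                current["qualifiers"].append((key, value))
            continue
        fields = raw.split()
        start, stop = fields[0], fields[1]
        if len(fields) >= 3:
            # Start, stop, and feature key: this line opens a new feature.
            current = {"key": fields[2], "intervals": [(start, stop)], "qualifiers": []}
            features.append(current)
        elif current is not None:
            # Start and stop only: an additional interval of the current feature.
            current["intervals"].append((start, stop))
    return features


# The excerpt from the top of this issue, written with tabs as column separators.
SAMPLE = """\
Feature NODE_1_length_19663_cov_257.252269
<1\t12777\tgene
\t\t\tgene\tORF1ab
<1\t5687\tmisc_feature
5687\t12777
\t\t\tnote\tsimilar to ORF1ab polyprotein
<1\t5694\tmisc_feature
\t\t\tnote\tsimilar to ORF1a polyprotein
"""

features = parse_feature_table(SAMPLE)
for feature in features:
    print(feature["key"], feature["intervals"], feature["qualifiers"])
```

Under these assumptions the excerpt yields three features, and the first misc_feature carries both intervals along with the "similar to ORF1ab polyprotein" note.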

taltman commented 2 years ago

Hi @nawrockie,

Thanks for explaining how this is the notation for multiple segments. I did look at that NCBI documentation page (as you kindly link to from the VADR docs), but I didn't see any discussion of this notation. Please let me know if I overlooked something.

nawrockie commented 2 years ago

@taltman I think the relevant line from https://www.ncbi.nlm.nih.gov/genbank/feature_table/ is:

If a feature contains multiple intervals, like the spliced tRNA-Phe or the Yip2p CDS, each interval is listed on a separate line by its start and stop position before subsequent qualifier lines

What I referred to as 'segments' are referred to above as 'intervals'. (I use 'segments' in the vadr code and documentation.)

taltman commented 2 years ago

Thanks @nawrockie, that is very helpful. Also, for anyone else who is trying to master the feature table, this doc is more exhaustive: https://www.insdc.org/documents/feature_table.html