taltman closed this issue 2 years ago.
The first `misc_feature` is actually in two 'segments': the first segment is <1..5687 and the second is 5687..12777 (5687 is repeated as part of the programmed ribosomal frameshift). So the first note pertains to the first `misc_feature` and the second note pertains to the second `misc_feature`.
More on the format of the .tbl files is here: https://www.ncbi.nlm.nih.gov/genbank/feature_table/
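To make the repeated coordinate concrete, here is a small Python sketch (a hypothetical helper, not VADR code). It assumes 1-based, inclusive, plus-strand coordinates as written in a `.tbl` file, with any `<`/`>` partial markers stripped. For the two segments above, the summed segment length is one more than the overall span, because position 5687 is counted in both segments.

```python
def segment_summary(segments):
    """Summarise the (start, stop) segments of one plus-strand feature.

    Hypothetical helper, not part of VADR. Coordinates are 1-based and
    inclusive, as in a .tbl file; '<' and '>' partial markers are stripped.
    Returns (summed segment length, overall span, overlap between each pair
    of consecutive segments). An overlap > 0 is the number of shared
    positions; a value <= 0 means the segments share no positions.
    """
    coords = [(int(start.lstrip("<>")), int(stop.lstrip("<>")))
              for start, stop in segments]
    summed_length = sum(stop - start + 1 for start, stop in coords)
    span_length = coords[-1][1] - coords[0][0] + 1
    overlaps = [prev_stop - next_start + 1
                for (_, prev_stop), (next_start, _) in zip(coords, coords[1:])]
    return summed_length, span_length, overlaps


print(segment_summary([("<1", "5687"), ("5687", "12777")]))
# (12778, 12777, [1]): position 5687 is counted in both segments
```

Minus-strand features (where the `.tbl` start is greater than the stop) are not handled by this sketch.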
Hi @nawrockie,
Thanks for explaining how this is the notation for multiple segments. I did look at that NCBI documentation page (which you kindly link to from the VADR docs), but I didn't see any discussion of this notation. Please let me know if I overlooked something.
@taltman I think the relevant line from https://www.ncbi.nlm.nih.gov/genbank/feature_table/ is:
> If a feature contains multiple intervals, like the spliced tRNA-Phe or the Yip2p CDS, each interval is listed on a separate line by its start and stop position before subsequent qualifier lines.
What I referred to as 'segments' are referred to above as 'intervals'. (I use 'segments' in the vadr code and documentation.)
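The rule quoted above can be sketched as a small parser for the 5-column `.tbl` layout. This is an illustrative sketch, not VADR's or NCBI's code, and it assumes tab-separated columns: a line with a feature key in column 3 starts a new feature, a line with only start and stop adds another interval (segment) to the current feature, and a line whose first three columns are empty is a qualifier. The example table, sequence name, and note texts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str                                        # e.g. "misc_feature"
    intervals: list = field(default_factory=list)   # (start, stop) strings; '<'/'>' kept
    qualifiers: list = field(default_factory=list)  # (name, value) pairs


def parse_tbl(text):
    """Group 5-column feature-table lines into features (assumed layout):

    start<TAB>stop<TAB>key        -> new feature with its first interval
    start<TAB>stop                -> extra interval for the current feature
    <TAB><TAB><TAB>name<TAB>value -> qualifier for the current feature
    """
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(">Feature"):
            continue  # skip blank lines and the ">Feature SeqId" header
        cols = line.split("\t")
        if cols[0] and len(cols) > 1 and cols[1]:
            if len(cols) > 2 and cols[2]:
                features.append(Feature(key=cols[2]))
            features[-1].intervals.append((cols[0], cols[1]))
        elif len(cols) > 3 and cols[3]:
            value = cols[4] if len(cols) > 4 else ""
            features[-1].qualifiers.append((cols[3], value))
    return features


# Hypothetical table: one two-interval misc_feature, then a one-interval one.
EXAMPLE = (
    ">Feature hypothetical_seq\n"
    "<1\t5687\tmisc_feature\n"
    "5687\t12777\n"
    "\t\t\tnote\thypothetical note A\n"
    "13001\t13500\tmisc_feature\n"
    "\t\t\tnote\thypothetical note B\n"
)

for feature in parse_tbl(EXAMPLE):
    print(feature.key, feature.intervals, feature.qualifiers)
```

Read this way, the `<1..5687` line is not a separate feature lacking a `note`; it is the first interval of a feature whose `note` follows its second interval.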
Thanks @nawrockie, that is very helpful. Also, for anyone else who is trying to master the feature table, this doc is more exhaustive: https://www.insdc.org/documents/feature_table.html
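The INSDC document writes a multi-interval feature as a single location using `join(...)`, which is how the same feature would appear in a GenBank flat file. The sketch below is a hypothetical helper that renders `.tbl`-style segments in that notation; it assumes plus-strand segments only (minus-strand features would additionally need `complement(...)`, which is not handled here).

```python
def to_insdc_location(segments):
    """Render plus-strand (start, stop) segments as an INSDC location string.

    Hypothetical helper: partial markers such as '<' and '>' are kept as-is,
    and more than one segment is wrapped in join(...).
    """
    parts = [f"{start}..{stop}" for start, stop in segments]
    if len(parts) == 1:
        return parts[0]
    return "join(" + ",".join(parts) + ")"


print(to_insdc_location([("<1", "5687"), ("5687", "12777")]))
# join(<1..5687,5687..12777)
```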
In the previous issue that I filed about double entries, I noticed the following at the top of the `fail.tbl` file:

My interpretation of these lines is as follows:

I don't understand how to interpret the `misc_feature` from <1 to 5687, as it has no `note` qualifier that might explain what it is.