Open apcamargo opened 2 years ago
Sorry for the late response here; I've only just had some time to revisit this project. The strand information is available in the GenBank files.
5' to 3' gene:
gene 9762..10592
3' to 5' gene:
gene complement(9762..10592)
Programs that run from the genome FASTA files generally create their own annotations and will have strand info available (and if not, it's their own fault).
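For anyone who wants to pull the strand out of those location strings directly, here is a minimal sketch. The function name `parse_location` is made up for illustration and is not part of this repository. It only handles simple `start..end` locations, optionally wrapped in `complement(...)`, and rejects compound ones such as `join(...)`.

```python
import re


def parse_location(location):
    """Return (start, end, strand) for a simple GenBank location string.

    strand is 1 for a plain location and -1 for complement(...).
    Hypothetical helper: compound locations (join, order) are not supported.
    """
    text = location.strip()
    strand = 1
    # A complement(...) wrapper marks a feature on the reverse strand.
    if text.startswith("complement(") and text.endswith(")"):
        strand = -1
        text = text[len("complement("):-1]
    # Accept "start..end", tolerating the "<" and ">" partial-feature markers.
    match = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", text)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    return int(match.group(1)), int(match.group(2)), strand


for loc in ("9762..10592", "complement(9762..10592)"):
    print(loc, parse_location(loc))
```

If Biopython is available, parsing the record with `SeqIO` and reading `feature.location.strand` is probably the more robust route, since it also copes with compound locations.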
Gene strand can be very useful to detect prophages, but it is currently lacking from the .gb files. Because of that, there's no way to benchmark a tool that leverages strandedness using proteins/ORFs extracted from this dataset's .gb files (using genbank2sequences.py, for example).