linsalrob / ProphagePredictionComparisons

Comparisons of multiple different prophage predictions
MIT License

[Suggestion] Add strand information to the gbk files #10

Open apcamargo opened 2 years ago

apcamargo commented 2 years ago

Gene strand can be very useful for detecting prophages, but it is currently missing from the .gb files. As a result, there is no way to benchmark a tool that relies on strandedness using proteins/ORFs extracted from this dataset's .gb files (using genbank2sequences.py, for example).

beardymcjohnface commented 2 years ago

Sorry for the late response here, I've only just had some time to revisit this project. The strand information is available in GenBank files: it is encoded in each feature's location.

Forward-strand (5' to 3') gene:

gene            9762..10592

Reverse-strand (3' to 5') gene, marked with complement():

gene            complement(9762..10592)

The programs that run from the genome FASTA-format files generally create their own annotations and will have strand info available (and if not it's their own fault).
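
For anyone who wants to pull strand out of the location strings shown above, here is a minimal sketch in plain Python. The function name `parse_feature_line` is made up for illustration and is not part of this repository or of genbank2sequences.py. It only handles simple `start..end` and `complement(start..end)` locations, not `join(...)` or fuzzy bounds such as `<` and `>`. A full GenBank parser such as Biopython is the safer choice for real files.

```python
import re

# Matches a feature line such as "gene            complement(9762..10592)".
# Assumption: simple locations only; join(...) and fuzzy bounds are not handled.
FEATURE_RE = re.compile(
    r"^\s*(?P<key>\S+)\s+"
    r"(?P<comp>complement\()?"
    r"(?P<start>\d+)\.\.(?P<end>\d+)"
    r"(?P<close>\))?\s*$"
)


def parse_feature_line(line):
    """Return (key, start, end, strand), or None if the line does not match.

    strand is 1 for a forward-strand feature and -1 for complement(...).
    """
    m = FEATURE_RE.match(line)
    if m is None:
        return None
    # Reject unbalanced input such as "complement(1..5" or "1..5)".
    if bool(m.group("comp")) != bool(m.group("close")):
        return None
    strand = -1 if m.group("comp") else 1
    return m.group("key"), int(m.group("start")), int(m.group("end")), strand


if __name__ == "__main__":
    for example in (
        "gene            9762..10592",
        "gene            complement(9762..10592)",
    ):
        print(parse_feature_line(example))
```

The two example lines from this comment give strand 1 and strand -1 respectively, with the same coordinates in both cases.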