mpievolbio-scicomp / rarefan

http://rarefan.evolbio.mpg.de
MIT License

GFF3 support #56

Closed CFGrote closed 1 year ago

CFGrote commented 1 year ago

This PR adds new functionality: identified REPIN and RAYT features are now reported in a gff3 annotation file.

CFGrote commented 1 year ago

@fredeBio, the generated gff3 files are not valid. I used http://genometools.org/cgi-bin/gff3validator.cgi for validation.

See also https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md for the gff3 specs.

Attached files were produced by running RAREFAN on the Nmen_2592.fas sequence.

Nmen_2594_0_3.gff3 rayt_Nmen_2594.gff3

You should run git pull --rebase in your branch before making any changes. Then, push to the same branch as before and the changes will appear here.

CFGrote commented 1 year ago

GFF3 validation passes now, great! I see some issues in the Python code caused by changes in the filename patterns. I'll address those. I'd also like to add a test that validates the gff3 file produced by a test run.
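
A minimal sketch of what such a test could check, assuming only the structural rules from the gff3 spec linked above (version header, nine tab-separated columns, integer coordinates with start <= end, valid strand and phase values). The function name gff3_errors and the sample lines are hypothetical and not taken from the RAREFAN code base. It is much less thorough than the genometools validator.

```python
from typing import List


def gff3_errors(text: str) -> List[str]:
    """Return a list of structural problems found in GFF3 text (empty if none)."""
    errors = []
    lines = text.splitlines()

    # The spec requires the version directive on the very first line.
    if not lines or not lines[0].startswith("##gff-version 3"):
        errors.append("line 1: missing '##gff-version 3' header")

    for number, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives

        columns = line.split("\t")
        if len(columns) != 9:
            errors.append(f"line {number}: expected 9 columns, found {len(columns)}")
            continue
        start, end, strand, phase = columns[3], columns[4], columns[6], columns[7]

        # Coordinates are 1-based integers with start <= end.
        try:
            if int(start) < 1 or int(start) > int(end):
                errors.append(f"line {number}: invalid range {start}-{end}")
        except ValueError:
            errors.append(f"line {number}: start/end are not integers")

        if strand not in {"+", "-", ".", "?"}:
            errors.append(f"line {number}: invalid strand '{strand}'")
        if phase not in {"0", "1", "2", "."}:
            errors.append(f"line {number}: invalid phase '{phase}'")

    return errors
```

In a test, the gff3 file written by the test run would be read and the returned list asserted to be empty; the path of that file depends on the run configuration and is not assumed here.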

CFGrote commented 1 year ago

Had to remove one check from the test script: previously, it checked that 48 REPINs are found, but this does not seem to hold for the test case (small Neisseria test dataset with NMAA_0237.faa as query rayt, see .github/workflows/ubuntu_lastest.yml). Should check at some point what exactly is going on there.