tsackton closed this issue 2 years ago
Test data has been added to the test-data folder on the develop branch. It includes mouse (mm10) chromosome 19 sequence and annotation files from three sources: UCSC, NCBI, and Ensembl. The current version of the code has been tested and runs on the NCBI GFF file with the following command:
python degenotate.py -a test-data/mm10/ncbi/GCF_000001635.27_GRCm39_NC_000085.7.gff.gz -g test-data/mm10/ncbi/GCF_000001635.27_GRCm39_NC_000085.7.fna.gz -o test-out
At some point we may want to add another chromosome or two, but this should be OK for now.
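If it helps, the command above could be wrapped in a small smoke test so it is easy to rerun after changes. This is only a sketch: `build_command` and `run_smoke_test` are hypothetical helper names, the paths are copied from the command in this thread, and it assumes degenotate.py exits with a non-zero status when something goes wrong.

```python
import subprocess
import sys
from pathlib import Path

# Paths copied from the command in this thread; adjust if the layout changes.
NCBI_DIR = Path("test-data/mm10/ncbi")
ANNOTATION = NCBI_DIR / "GCF_000001635.27_GRCm39_NC_000085.7.gff.gz"
GENOME = NCBI_DIR / "GCF_000001635.27_GRCm39_NC_000085.7.fna.gz"


def build_command(annotation, genome, outdir, script="degenotate.py"):
    """Return the argv list for the run shown above (hypothetical helper)."""
    return [
        sys.executable, script,
        "-a", str(annotation),
        "-g", str(genome),
        "-o", str(outdir),
    ]


def run_smoke_test(outdir="test-out"):
    """Run degenotate on the NCBI test data and return True on exit status 0.

    Assumption: a zero exit status means the run succeeded. This does not
    check the contents of the output directory.
    """
    if not (ANNOTATION.exists() and GENOME.exists()):
        raise FileNotFoundError("test data not found; run from the repo root")
    result = subprocess.run(
        build_command(ANNOTATION, GENOME, outdir),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Show stderr so a failing run is easy to diagnose.
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0
```

Calling `run_smoke_test()` from the repository root would run the same command as above and report only pass or fail, which is probably enough until there are real output checks.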
It would be great to find a small, fast test dataset that we can use to make sure the code is working, to do some casual benchmarking of potential approaches, and eventually to form the basis of a test suite (unit tests or similar).
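For the casual benchmarking part, something as simple as the following might be enough to start with. It is a rough sketch, not a proper benchmark harness: `benchmark` is a hypothetical helper, and it just times repeated calls to whatever function is passed in using wall-clock time.

```python
import statistics
import time


def benchmark(func, *args, repeats=3, **kwargs):
    """Time `func(*args, **kwargs)` `repeats` times; return summary stats in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

Usage would look like `benchmark(some_function, some_input, repeats=5)`, comparing the median across candidate approaches on the same test dataset.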