Open vidal-adrien opened 1 year ago
Perhaps parsing a JSON file into a Python dictionary in the same format as those in GenomeData.py
would be a simple way to provide a custom genome.
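
A rough sketch of that idea, assuming the JSON file is a single object mapping chromosome names to lengths (the exact dictionary layout in GenomeData.py is not reproduced here, and `load_chrom_lengths_json` is a hypothetical name):

```python
import json


def load_chrom_lengths_json(path):
    """Load a {chromosome_name: length} mapping from a JSON file.

    Assumed file layout: {"chr1": 248956422, "chr2": 242193529, ...}
    """
    with open(path) as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping chromosome names to lengths")
    # Coerce lengths to int so that values stored as strings also work.
    return {str(name): int(length) for name, length in data.items()}
```

The returned dictionary could then stand in for a preset entry wherever the code looks up chromosome lengths.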
Even JSON seems needlessly complicated. There are only two variables: chromosome name and length, one column fewer than a BED file. As far as I can tell, these are the only values retrieved from the species argument.
A parser that reads only the first two columns of a file could even take samtools FASTA indexes (.fai) as input.
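
Something along these lines might be enough. This is only a sketch: `read_chrom_sizes` is a made-up name, and it assumes whitespace-separated columns with the chromosome name first and the length second, which is how the first two columns of a .fai index are laid out:

```python
def read_chrom_sizes(path):
    """Read chromosome names and lengths from the first two columns of a file.

    Any further columns (e.g. the offset columns of a .fai index) are ignored.
    """
    chrom_lengths = {}
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            fields = line.split()
            if len(fields) < 2:
                raise ValueError(
                    "line %d: expected at least two columns" % line_number
                )
            chrom_lengths[fields[0]] = int(fields[1])
    return chrom_lengths
```

Because everything past the second column is dropped, the same function would accept a plain two-column table or an index file without any extra option.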
Even easier for the user would be to take the FASTA itself as the input file and parse it to get the chromosome names and lengths.
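
A minimal sketch of what that parsing could look like, assuming an uncompressed FASTA where the sequence name is the first word of each `>` header line (`chrom_lengths_from_fasta` is a hypothetical helper, not something the tool provides):

```python
def chrom_lengths_from_fasta(path):
    """Return {sequence_name: length} by scanning a plain-text FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                header = line[1:].split()
                if not header:
                    raise ValueError("FASTA header without a sequence name")
                # Keep only the first word, dropping any description text.
                name = header[0]
                lengths[name] = 0
            elif name is not None:
                # Sequence lines may be wrapped; add up the pieces.
                lengths[name] += len(line)
    return lengths
```

This reads the whole genome once per run, so for large genomes the two-column table or an index would likely stay the faster option.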
The preset species could also be kept for compatibility: one file each, stored in a folder inside the library. The tool would first look in this folder for speciesName.tab and, if it is not found, treat the argument as a path to a custom species table.
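
The lookup order could be sketched like this, assuming the presets live in some directory passed in as `preset_dir` (both `resolve_genome_table` and `preset_dir` are illustrative names, not existing parts of the tool):

```python
import os


def resolve_genome_table(species, preset_dir):
    """Return the path of the genome table to use for a species argument.

    A bundled preset named <species>.tab takes priority; otherwise the
    argument itself is treated as a path to a custom table.
    """
    preset_path = os.path.join(preset_dir, species + ".tab")
    if os.path.isfile(preset_path):
        return preset_path
    if os.path.isfile(species):
        return species
    raise FileNotFoundError(
        "%r is neither a preset species nor a path to a genome table" % species
    )
```

Checking the preset folder first keeps existing command lines working unchanged, since a species name that already has a preset resolves exactly as before.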
In case anybody wants that feature, this project does it and more: https://github.com/biocore-ntnu/epic2
Rather than having a species argument that refers to genome data in the code, why not let this argument take a two-column file (chromosome name, chromosome length) and parse it, so that users can process data from any genome?
In its current state the tool is limited to the set of species defined in GenomeData.py, so processing any other species requires editing the source code.
Cordially, Adrien V.