zanglab / SICER2

MIT License

Why are species and their chromosomes hardcoded? #23

Open vidal-adrien opened 1 year ago

vidal-adrien commented 1 year ago

Rather than having a species argument that refers to genome data in the code, why not let this argument take a two-column file (chromosome name, chromosome length) and parse it, so that users can process data from any genome?

In its current state, the tool is limited to the set of species defined in GenomeData.py, so handling any other species requires editing the source code.

Cordially, Adrien V.

kelly-sovacool commented 1 year ago

Perhaps parsing a JSON file into a Python dictionary in the same format as those in GenomeData.py would be a simple way to provide a custom genome.
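A minimal sketch of that idea, assuming the JSON file is a flat object mapping chromosome names to lengths. The function name `load_genome_json` is hypothetical, and the exact dictionary layout used in GenomeData.py is not reproduced here:

```python
import json


def load_genome_json(path):
    """Return {chromosome_name: length} read from a JSON object.

    Hypothetical helper: assumes a flat object such as
    {"chr1": 1000, "chr2": 500}.
    """
    with open(path) as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError(
            "expected a JSON object mapping chromosome names to lengths"
        )
    # Coerce lengths to int so that "1000" and 1000 are treated alike.
    return {str(name): int(length) for name, length in data.items()}
```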

vidal-adrien commented 1 year ago

Even JSON seems needlessly complicated. There are only two variables: chromosome name and length, one column fewer than a BED file. From what I can tell, these are the only variables retrieved via the species argument.

A parser that reads the first two columns of a file would even be able to take samtools FASTA indexes (.fai files) as input.
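Such a parser could look roughly like the sketch below. The name `read_chrom_sizes` is hypothetical, and the sketch assumes whitespace-separated columns, chromosome names without spaces, and that blank lines and lines starting with `#` can be skipped:

```python
def read_chrom_sizes(path):
    """Return {chromosome_name: length} from the first two columns of a file.

    Any further columns (for example the offset columns of a .fai index)
    are ignored.
    """
    sizes = {}
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            # Skip blank lines and comment lines (an assumption of this sketch).
            if not line or line.startswith("#"):
                continue
            fields = line.split()
            if len(fields) < 2:
                raise ValueError(
                    f"{path}:{line_number}: expected at least two columns"
                )
            # int() raises ValueError if the second column is not a number.
            sizes[fields[0]] = int(fields[1])
    return sizes
```

Because only the first two columns are read, the same function would accept a plain two-column table and a five-column .fai index alike.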

Even easier for the user would be to take the FASTA itself as the input file and parse it to get the chromosome names and lengths.
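For illustration, getting names and lengths straight from a FASTA file might be sketched as follows. `fasta_chrom_sizes` is a hypothetical name, and the sketch assumes an uncompressed, plain-text FASTA in which the sequence name is the first whitespace-delimited token of each header line:

```python
def fasta_chrom_sizes(path):
    """Return {sequence_name: length} by scanning a plain-text FASTA file."""
    sizes = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:].split()
                if not header:
                    raise ValueError("FASTA header line without a name")
                # Keep only the first token, e.g. "chr1" from ">chr1 some text".
                name = header[0]
                sizes[name] = 0
            elif name is not None:
                # Sequence line: add its length to the current record.
                sizes[name] += len(line)
    return sizes
```

This reads the whole genome once, so for large genomes a precomputed index would likely be faster than rescanning the FASTA on every run.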

The preset species could also be kept for compatibility, with one file each in a folder in the library. The tool would first look in this folder for speciesName.tab and, if it is not found, treat the argument as a path to a custom species table.
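The lookup order described above could be sketched like this. The function name `resolve_species_table` and the `preset_dir` parameter are hypothetical; nothing here reflects how SICER2 is actually organised:

```python
import os


def resolve_species_table(species, preset_dir):
    """Return the path of the chromosome table to use for `species`.

    A bundled preset named <species>.tab in `preset_dir` takes priority;
    otherwise `species` is treated as a path to a custom table.
    """
    preset = os.path.join(preset_dir, species + ".tab")
    if os.path.isfile(preset):
        return preset
    if os.path.isfile(species):
        return species
    raise FileNotFoundError(
        f"'{species}' is neither a preset species nor an existing file"
    )
```

Checking the preset folder first keeps existing command lines such as a bare species name working, while any other value falls through to the custom-table path.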

vidal-adrien commented 12 months ago

In case anybody wants that feature, this project does it and more: https://github.com/biocore-ntnu/epic2