refgenie / refgenconf

A Python object for standardized reference genome assets.
http://refgenie.databio.org
BSD 2-Clause "Simplified" License

Populate of refgenie registry paths doesn't allow dashes in the genome name #141

Closed mirpedrol closed 1 year ago

mirpedrol commented 2 years ago

Hi!

On the reference genome server that we use in nf-core, genome names contain dashes (e.g. sus_scrofa-ucsc-susscr2). The regular expression used to extract genome names when populating refgenie registry paths does not allow dashes, so the command fails.

Would it be possible to allow dashes in the genome names?

Thank you!

test_igenomes.config.tpl

params {
    genomes {
        'homo_sapiens-ucsc-hg38' {
            fasta       = "refgenie://homo_sapiens-ucsc-hg38/fasta"
            bwa         = "refgenie://homo_sapiens-ucsc-hg38/bwa_index:0.7.17"
        }
        'homo_sapiens-ensembl-grch37' {
            fasta       = "refgenie://homo_sapiens-ucsc-hg38/fasta"
            bwa         = "refgenie://homo_sapiens-ucsc-hg38/bwa_index:0.7.17"
        }

Command used: refgenie populater --file test_igenomes.config.tpl

Error message:

No local digest for genome alias: None
Setting 'None' identity with server: http://igenomes.databio.org/v3/genomes/genome_digest/None
Genome 'None' not available on any of the following servers: http://igenomes.databio.org
'refgenie://homo_sapiens' refgenie registry path not populated.

Output:

params:
    genomes:
        'homo_sapiens-ucsc-hg38':
            fasta: "refgenie://homo_sapiens-ucsc-hg38/fasta:samtools-1.10"
            bwa: "refgenie://homo_sapiens-ucsc-hg38/bwa_index:default"
        'homo_sapiens-ensembl-grch37':
            fasta: "refgenie://homo_sapiens-ucsc-hg38/fasta:default"
            bwa: "refgenie://homo_sapiens-ucsc-hg38/bwa_index:default"

Changing the regex to refgenie://([A-Za-z0-9_\-/\.\:]+)? outputs the expected file populated with remote links:

params {
    genomes {
        'homo_sapiens-ucsc-hg38' {
            fasta       = "http://awspds.refgenie.databio.org/aws_igenomes/2c7b4118332e4dd51a9a5ee3ec8c41dc13bdd234bf2a3882/fasta__samtools-1.10/2c7b4118332e4dd51a9a5ee3ec8c41dc13bdd234bf2a3882.fa"
            bwa         = "http://awspds.refgenie.databio.org/aws_igenomes/2c7b4118332e4dd51a9a5ee3ec8c41dc13bdd234bf2a3882/bwa_index__0.7.17/2c7b4118332e4dd51a9a5ee3ec8c41dc13bdd234bf2a3882.fa"
        }
        'homo_sapiens-ensembl-grch37' {
            fasta       = "http://awspds.refgenie.databio.org/aws_igenomes/2c7b4118332e4dd51a9a5ee3ec8c41dc13bdd234bf2a3882/fasta__samtools-1.10/2c7b4118332e4dd51a9a5ee3ec8c41dc13bdd234bf2a3882.fa"
            bwa         = "http://awspds.refgenie.databio.org/aws_igenomes/2c7b4118332e4dd51a9a5ee3ec8c41dc13bdd234bf2a3882/bwa_index__0.7.17/2c7b4118332e4dd51a9a5ee3ec8c41dc13bdd234bf2a3882.fa"
        }
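
For anyone reproducing this, here is a minimal sketch of why the match stops at the first dash. The "before" pattern is an assumption made for illustration (the proposed character class with the dash removed); it is not copied from refgenconf, and the helper name registry_path is hypothetical. It yields the same truncated 'refgenie://homo_sapiens' path that appears in the error message above.

```python
import re

# Hypothetical "before" pattern: the proposed character class minus the dash.
# This is an assumption for illustration, not the actual refgenconf source.
PATTERN_NO_DASH = re.compile(r"refgenie://([A-Za-z0-9_/\.\:]+)?")

# Pattern proposed in this issue: the dash is part of the character class.
PATTERN_WITH_DASH = re.compile(r"refgenie://([A-Za-z0-9_\-/\.\:]+)?")

# One line taken from test_igenomes.config.tpl
line = 'fasta       = "refgenie://homo_sapiens-ucsc-hg38/fasta"'


def registry_path(pattern, text):
    """Return the first registry path matched in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


# Without the dash, the match ends at the first '-' in the genome name.
print(registry_path(PATTERN_NO_DASH, line))

# With the dash, the whole genome/asset path is captured.
print(registry_path(PATTERN_WITH_DASH, line))
```

The first call prints refgenie://homo_sapiens and the second prints refgenie://homo_sapiens-ucsc-hg38/fasta. Whether a truncated match is the full cause of the 'None' genome alias in the error message is not confirmed here.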

This issue is related to: https://github.com/nf-core/tools/issues/1084

mirpedrol commented 1 year ago

Closing, as the PR fixing this was merged into the dev branch of refgenconf.