tomasbruna / miniprot-boundary-scorer

Miniprot boundary scorer parses introns, starts, stops and exons from miniprot's alignment output and scores them based on local alignment quality

bioconda recipe #4

Open yjk-bertrand opened 3 months ago

yjk-bertrand commented 3 months ago

Hello, Thank you for providing miniprot-boundary-scorer to the community; the tool is really nice. We wish to include it in our Snakemake pipeline. Having to compile the software means that we would need to create a container, which is not part of our initial plan. Would it be possible to make a bioconda recipe? Cheers, Yann

tomasbruna commented 3 months ago

Hi Yann,

Thanks for your interest. miniprot-boundary-scorer is already available as part of the GALBA container image (see https://github.com/Gaius-Augustus/GALBA?tab=readme-ov-file#singularity-image). You can invoke it with:

singularity exec galba.sif miniprot_boundary_scorer

Would using it like that work for you?

If not, I'll look into making a bioconda recipe.

Best, Tomas

yjk-bertrand commented 3 months ago

Hi Tomáš, Thanks for your prompt answer. Since you are offering, I will gladly take you up on making a bioconda recipe. In the context of our Snakemake pipeline, it is not convenient to pull that docker image everywhere it needs to run. Cheers, Yann

KatharinaHoff commented 3 months ago

It is super easy to pull and run containers in snakemake. I do it all the time. Writing from my phone, I can send you an example rule from my computer next week. Really easy!

yjk-bertrand commented 3 months ago

In that case, we can give it a try. Looking forward to your tips. Thanks!

KatharinaHoff commented 3 months ago

This is a random rule from one of my snakefiles. You see how I specify the container source under "singularity". Everything executed in the shell will be run with that container. miniprot-boundary-scorer resides in the GALBA container, not in the braker container as shown here. (This particular rule requires snakemake-executor-plugin-slurm because I execute it via SLURM, but you may not need that part.)

rule run_sam_to_bam:
    input:
        fastqdump_lst = "data/checkpoints_dataprep/{taxon}_B06_rnaseq_for_fastqdump.lst",
        remove_bad_done = "data/checkpoints_dataprep/{taxon}_B06_remove_bad_libraries.done"
    output:
        done = "data/checkpoints_dataprep/{taxon}_B06_sam2bam.done"
    params:
        taxon=lambda wildcards: wildcards.taxon,
        threads = config['SLURM_ARGS']['cpus_per_task']
    wildcard_constraints:
        taxon="[^_]+"
    singularity:
        "docker://teambraker/braker3:latest"
    threads: int(config['SLURM_ARGS']['cpus_per_task'])
    resources:
        mem_mb=int(config['SLURM_ARGS']['mem_of_node']),
        runtime=int(config['SLURM_ARGS']['max_runtime'])
    shell:
        """
        export APPTAINER_BIND="${{PWD}}:${{PWD}}"; \
        log="data/checkpoints_dataprep/{params.taxon}_B06_sam2bam.log"
        echo "" > $log
        readarray -t lines < <(cat {input.fastqdump_lst})
        for line in "${{lines[@]}}"; do
            # Replace the first space with an underscore in the species name part of the line
            modified_line=$(echo "$line" | sed 's/\\([^\\t]*\\) /\\1_/')
            species=$(echo "$modified_line" | cut -f1)
            sra_ids=$(echo "$modified_line" | cut -f2)
            IFS=',' read -r -a sra_array <<< "$sra_ids"
            for sra_id in "${{sra_array[@]}}"; do
                if [ ! -f "data/species/$species/hisat2/${{sra_id}}.bam" ] && [ -f data/species/$species/hisat2/${{sra_id}}.sam ]; then
                    echo "samtools view --threads {params.threads} -bS data/species/$species/hisat2/${{sra_id}}.sam > data/species/$species/hisat2/${{sra_id}}.bam" &>> $log
                    samtools view --threads {params.threads} -bS data/species/$species/hisat2/${{sra_id}}.sam > data/species/$species/hisat2/${{sra_id}}.bam 2>> $log
                else
                    echo "data/species/$species/hisat2/${{sra_id}}.bam already exists" &>> $log
                fi
            done
        done
        touch {output.done}
        """

Importantly, Singularity needs bind mounts to access your data. I configure the bindings like this:

Create a file: ~/profile/apptainer/config.v8+.yaml

Add the following content to the file (adapt to your own working directory):

use-singularity: True
singularity-args: "\"--bind /home/xy/git/braker-snake:/home/xy/git/braker-snake --bind /home/xy/ncbi:/home/xy/ncbi\""

You have to adapt to your own directories, of course.

Finally, when you run the Snakefile, include the option --use-apptainer.
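
For readers who want a rule that targets miniprot-boundary-scorer directly rather than the BRAKER example above, here is a minimal sketch, not a tested workflow. It writes a small Snakefile whose rule runs inside a container. Several things here are assumptions that do not come from this thread: the image URI (taken to be the GALBA image on Docker Hub), the rule name, all file paths, the location of the scoring matrix, and the -o/-s flags with the alignment read from standard input. Please check the GALBA and miniprot-boundary-scorer READMEs before relying on any of them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory for the sketch; set WORKDIR to use your own.
workdir="${WORKDIR:-$(mktemp -d)}"

# Write a minimal Snakefile. The quoted 'EOF' keeps {sample} and other
# Snakemake placeholders from being expanded by the shell.
cat > "$workdir/Snakefile" <<'EOF'
# ASSUMPTIONS: image URI, paths, matrix location and the -o/-s flags are
# illustrative and were not confirmed in the thread.
rule score_boundaries:
    input:
        aln="results/{sample}.miniprot.aln",
        matrix="resources/blosum62.csv"
    output:
        gff="results/{sample}.boundaries.gff"
    singularity:
        "docker://katharinahoff/galba-notebook:latest"
    shell:
        "miniprot_boundary_scorer -o {output.gff} -s {input.matrix} < {input.aln}"
EOF

echo "Snakefile written to $workdir/Snakefile"
echo "Example invocation (requires snakemake and apptainer):"
echo "  snakemake --snakefile $workdir/Snakefile --use-apptainer --cores 1 results/demo.boundaries.gff"
```

The script only writes the file and prints a suggested command, so it can be tried without Snakemake or Apptainer installed. Inputs that live outside the working directory would still need the bind arguments described above.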

KatharinaHoff commented 3 months ago

Additional information: when you execute any Snakemake workflow like this, pulling the container takes time on the first run. Subsequent runs re-use the already pulled containers.

sivico26 commented 3 months ago

Hi @KatharinaHoff and @tomasbruna,

I am Simón and I work with Yann in the development of the pipeline.

I appreciate the tips about singularity usage. They may come in handy in the future.

Our pipeline currently relies only on pip (and Python packages) and conda/mamba as dependencies. I would be reluctant to add Singularity as an extra dependency if we can avoid it. I prefer to put a bit of extra effort onto the developers' shoulders to make it easier for the user.

Making a bioconda recipe is just as easy. I am willing to help set it up if you prefer. From what I can see in the guidelines (which complement the instructions of the previous link), the really important thing we are missing is a stable URL (i.e. a tarball).

@tomasbruna, could you please make a GitHub release of miniprot-boundary-scorer?

The second thing that is not clear to me is the dependencies. Is it just make and a C++ compiler, or are there other libraries needed that I missed?

Thank you for the help.

tomasbruna commented 3 months ago

Hi @sivico26,

I've added the release. I've removed the test folder from the release tarball, so it's pretty compact.

If you could set up the recipe with that, it would be great.

Is it just make and the C++ compatible compiler?

Correct, there are no other dependencies.

sivico26 commented 3 months ago

Happy to inform you that my PR adding the package to bioconda was merged earlier today. Hence, I think the package for miniprot-boundary-scorer will be available in bioconda soon.
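
Once the package is available, a conda-based rule could look roughly like the sketch below, which writes an environment file and a Snakefile that uses it. The bioconda package name is assumed to match the repository name (miniprot-boundary-scorer). The rule name, file paths, matrix location, and the -o/-s flags are also assumptions for illustration and were not confirmed in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory for the sketch; set WORKDIR to use your own.
workdir="${WORKDIR:-$(mktemp -d)}"
mkdir -p "$workdir/envs"

# Conda environment for the rule. ASSUMPTION: the package name on bioconda
# is the same as the repository name.
cat > "$workdir/envs/boundary_scorer.yaml" <<'EOF'
channels:
  - conda-forge
  - bioconda
dependencies:
  - miniprot-boundary-scorer
EOF

# Snakefile whose rule runs in that environment. The conda path is resolved
# relative to the Snakefile. Paths and flags are illustrative assumptions.
cat > "$workdir/Snakefile" <<'EOF'
rule score_boundaries:
    input:
        aln="results/{sample}.miniprot.aln",
        matrix="resources/blosum62.csv"
    output:
        gff="results/{sample}.boundaries.gff"
    conda:
        "envs/boundary_scorer.yaml"
    shell:
        "miniprot_boundary_scorer -o {output.gff} -s {input.matrix} < {input.aln}"
EOF

echo "Files written to $workdir"
echo "Example invocation (requires snakemake and conda/mamba):"
echo "  snakemake --snakefile $workdir/Snakefile --use-conda --cores 1 results/demo.boundaries.gff"
```

This keeps the pipeline's dependencies limited to pip and conda/mamba, which is the constraint described earlier in the thread.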

tomasbruna commented 3 months ago

Very cool, thanks!