nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

Sentieon bundle file #1399

Open gianfilippo opened 9 months ago

gianfilippo commented 9 months ago

Description of the bug

Hi, I am unable to load the Sentieon DNAscope model using the command below. SENTIEONDNASCOPEMODEL points to the location of the Sentieon model, $PATH/DNAscopeIlluminaWES2.0.bundle/dnascope.model.

I think the bundle file is not recognised by the installed version of Sentieon.

Command used and terminal output

nextflow run nf-core/sarek \
    -r 3.4.0 \
    -c "$inDIR/my.config" \
    -profile apptainer \
    -work-dir $workDIR \
    --input $inDIR/samplesheet.csv \
    --outdir $outPATH/$outDIR \
    --wes \
    --intervals $WESINTERVAL \
    --tools sentieon_dnascope,sentieon_dedup,mutect2,snpeff,vep,merge \
    --trim_fastq True \
    --aligner sentieon-bwamem \
    --joint_germline True \
    --joint_mutect2 True \
    --igenomes_ignore \
    --genome null \
    --fasta $GENOME \
    --fasta_fai $GENOMEIDX \
    --sentieon_dnascope_emit_mode variant,gvcf \
    --sentieon_dnascope_pcr_indel_model CONSERVATIVE \
    --dbsnp $DBSNP \
    --dbsnp_tbi $DBSNPTBI \
    --known_snps $SNPS \
    --known_snps_tbi $SNPSTBI \
    --known_snps_vqsr $SNPSVSQ \
    --known_indels $INDELS \
    --known_indels_tbi $INDELSTBI \
    --germline_resource $GERMLINERES \
    --germline_resource_tbi $GERMLINERESTBI \
    --sentieon_dnascope_model $SENTIEONDNASCOPEMODEL \
    --pon $PON \
    --pon_tbi $PONTBI \
    --snpeff_db 105 \
    --snpeff_genome 'GRCh38' \
    --vep_cache_version 110 \
    --vep_genome 'GRCh38' \
    --vep_species 'homo_sapiens' -resume

Relevant files

No response

System information

Nextflow version: 23.04.2
Hardware: HPC
Executor: Slurm or local
Container engine: Apptainer
nf-core/sarek version: 3.4.0

asp8200 commented 9 months ago

What error are you getting? Can you paste the error message here?

Did you set the nextflow secret for sentieon?

Sarek v3.4 runs Sentieon:202112.06. However, the version of Sarek on the dev-branch runs Sentieon:202308.01. Perhaps you could try your nf-command on the dev-branch?

I see that Sentieon just released this new model:

Illumina whole genome

Tomorrow I'll check if that model plays nicely with Sarek v3.4.

gianfilippo commented 9 months ago

Hi, thanks for the prompt reply!!

I did set the nextflow secret for sentieon. I have a trial license.

I will try the dev-branch.

I forgot to post the error message, sorry about that. It is attached below.

Best

Feb-08 16:56:17.850 [main] ERROR nextflow.cli.Launcher - Unable to read script: '$HOME/.nextflow/assets/nf-core/sarek/./workflows/sarek.nf' -- cause: $PATH/Sentieon/sentieon-genomics-202112.07/models/DNAscopeIlluminaWES2.0.bundle/dnascope.model
nextflow.exception.ScriptCompilationException: Unable to read script: '$HOME/.nextflow/assets/nf-core/sarek/./workflows/sarek.nf' -- cause: $PATH/Sentieon/sentieon-genomics-202112.07/models/DNAscopeIlluminaWES2.0.bundle/dnascope.model
    at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:208)
    at nextflow.script.IncludeDef.memoizedMethodPriv$loadModule0PathMapSession(IncludeDef.groovy:151)
    at nextflow.script.IncludeDef.access$0(IncludeDef.groovy)
    at nextflow.script.IncludeDef$clinitclosure2.doCall(IncludeDef.groovy)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
    at groovy.lang.Closure.call(Closure.java:412)
    at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.lambda$call$0(Memoize.java:137)
    at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:137)
    at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:113)
    at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.call(Memoize.java:136)
    at nextflow.script.IncludeDef.loadModule0(IncludeDef.groovy)
    at nextflow.script.IncludeDef.load0(IncludeDef.groovy:123)
    at nextflow.script.IncludeDef$load0$1.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
    at Script_615ff6f3.runScript(Script_615ff6f3:103)
    at nextflow.script.BaseScript.run0(BaseScript.groovy:145)
    at nextflow.script.BaseScript.run(BaseScript.groovy:192)
    at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:229)
    at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:224)
    at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:130)
    at nextflow.cli.CmdRun.run(CmdRun.groovy:368)
    at nextflow.cli.Launcher.run(Launcher.groovy:494)
    at nextflow.cli.Launcher.main(Launcher.groovy:653)
Caused by: java.nio.file.NoSuchFileException: $PATH/Sentieon/sentieon-genomics-202112.07/models/DNAscopeIlluminaWES2.0.bundle/dnascope.model
    at nextflow.file.FileHelper.checkIfExists(FileHelper.groovy:1068)
    at nextflow.file.FileHelper$checkIfExists$2.call(Unknown Source)
    at nextflow.Nextflow.file(Nextflow.groovy:112)
    at nextflow.Nextflow$file$0.callStatic(Unknown Source)
    at Script_f43b7d6c.runScript(Script_f43b7d6c:77)
    at nextflow.script.BaseScript.run0(BaseScript.groovy:145)
    at nextflow.script.BaseScript.run(BaseScript.groovy:192)
    at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:229)
    at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:215)
    at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:205)
    ... 31 common frames omitted

gianfilippo commented 9 months ago

Hi, I just tried the nf-command on the dev-branch and the error is the same.

asp8200 commented 9 months ago

It turns out that the new .bundle format was introduced in Sentieon 202308. Sarek v3.4 uses an earlier version of Sentieon, namely 202112.06. The current dev version of Sarek uses Sentieon 202308.01, but even there the .bundle format is not supported. We will try to add support for that as well.

If you're in a setup where you can "locally" edit the source code for the dev version of Sarek, then it is possible to simply hardcode the path to the dnascope-model in the nf-module for dnascope, that is, replace this line

https://github.com/nf-core/sarek/blob/8fba43d7eef4a2e8e80dd3ea682f8249a2ccd7fb/modules/nf-core/sentieon/dnascope/main.nf#L49

with something like

 def model_cmd = " --model /path/to/DNAscopeIlluminaWGS2.0.bundle/dnascope.model"

Before running that with your long nextflow command, I suggest testing it with a shorter test command, say,

nextflow run main.nf -profile test,singularity --outdir foo --tools sentieon_dnascope --sentieon_dnascope_emit_mode variant --skip_tools haplotyper_filter

You're welcome to join us on the nf-core/sarek slack channel. Sometimes it is faster and easier to get help there.

FYI: The problem for Nextflow and Sarek here is that /path/to/DNAscopeIlluminaWGS2.0.bundle/dnascope.model is not an actual file on disk, so Nextflow's check that the file exists fails (hence the NoSuchFileException in the log above).
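Which of the two cases a given --sentieon_dnascope_model value falls into can be checked on the command line. This is a minimal sketch, not part of Sarek: the function name describe_model_path is hypothetical, and it only assumes, as described in this thread, that the .bundle itself is a regular file and that the .model path "inside" it does not exist on the filesystem.

```shell
# Hypothetical helper: report whether a model path exists on disk,
# or only refers to an entry inside a .bundle file.
describe_model_path() {
  model_path="$1"
  parent="${model_path%/*}"        # strip the last path component

  if [ -e "$model_path" ]; then
    echo "exists"                  # plain model file: an existence check passes
  elif [ -f "$parent" ] && [ "${parent%.bundle}" != "$parent" ]; then
    echo "inside-bundle"           # parent is a .bundle file; the model path itself is not on disk
  else
    echo "missing"
  fi
}
```

For example, describe_model_path "$SENTIEONDNASCOPEMODEL" would be expected to print inside-bundle for a path like the one in the original report.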

P.S. I didn't know about the new .bundle format, so thanks for making me aware of it.

gianfilippo commented 9 months ago

Hi, thanks. This is very helpful. I will try to modify the code as you suggest and will let you know. Thank you for developing the pipeline. This is great work!

gianfilippo commented 9 months ago

Hi, it looks like I also have a license issue. I ran

  nextflow secrets set SENTIEON_LICENSE_BASE64 $(cat $LICPATH/sentieon_license_file.lic | base64 -w 0)

and checked SENTIEON_LICENSE_BASE64 with nextflow secrets get SENTIEON_LICENSE_BASE64, which returns a very long string. Yet, when I run

  nextflow run nf-core/sarek -r dev -profile test,apptainer --outdir foo --tools sentieon_dnascope --sentieon_dnascope_emit_mode variant --skip_tools haplotyper_filter

I get the following error:

-[nf-core/sarek] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SENTIEON_DNASCOPE:SENTIEON_DNASCOPE (test)'

Caused by: Process NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SENTIEON_DNASCOPE:SENTIEON_DNASCOPE (test) terminated with an error exit status (2)

Command executed:

if [ "${#SENTIEON_LICENSE_BASE64}" -lt "1500" ]; then  # If the string SENTIEON_LICENSE_BASE64 is short, then it is an encrypted url.
  export SENTIEON_LICENSE=$(echo -e "$SENTIEON_LICENSE_BASE64" | base64 -d)
else  # Localhost license file
  # The license file is stored as a nextflow variable like, for instance, this:
  # nextflow secrets set SENTIEON_LICENSE_BASE64 $(cat <sentieon_license_file.lic> | base64 -w 0)
  export SENTIEON_LICENSE=$(mktemp)
  echo -e "$SENTIEON_LICENSE_BASE64" | base64 -d > $SENTIEON_LICENSE
fi

if [ ] && [ ]; then
  # If sentieon_auth_mech_base64 and sentieon_auth_data_base64 are non-empty strings, then Sentieon is mostly likely being run with some test-license.
  export SENTIEON_AUTH_MECH=$(echo -n "" | base64 -d)
  export SENTIEON_AUTH_DATA=$(echo -n "" | base64 -d)
  echo "Decoded and exported Sentieon test-license system environment variables"
fi

sentieon driver -r genome.fasta -t 2 -i test.recal.cram --interval chr22_1-40001.bed --algo DNAscope -d dbsnp_146.hg38.vcf.gz --model /path/to/DNAscopeIlluminaWGS2.0.bundle/dnascope.model --pcr_indel_model CONSERVATIVE --emit_mode variant test.dnascope.unfiltered.vcf.gz

cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SENTIEON_DNASCOPE:SENTIEON_DNASCOPE":
  sentieon: $(echo $(sentieon driver --version 2>&1) | sed -e "s/sentieon-genomics-//g")
END_VERSIONS

Command exit status: 2

Command output: (empty)

Command error:
/usr/local/share/sentieon-202308.01-0/libexec/licsrvr: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
License server could not be started
Please check /tmp/licsrvr.log for more information

Work dir: $PATH/NEXTFLOW/work/fc/ef22de40e40ba36d3b4cd2a4067d99

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

-- Check '.nextflow.log' file for details
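Independently of the container problem, one can check locally whether the value stored with nextflow secrets set decodes back to the original license file, and which branch of the license logic in the executed command above it will take. This is a minimal sketch, not part of Sarek: the function names are hypothetical, it assumes GNU coreutils base64 (for -w 0), and the 1500-character threshold is copied from the command executed above. It uses printf instead of the module's echo -e so that it behaves the same in any POSIX shell.

```shell
# Hypothetical helper: which branch of the quoted license logic a value takes.
# Mirrors: if [ "${#SENTIEON_LICENSE_BASE64}" -lt "1500" ] (short = encrypted url).
classify_license_secret() {
  secret="$1"
  if [ "${#secret}" -lt 1500 ]; then
    echo "url"
  else
    echo "file"
  fi
}

# Hypothetical helper: encode a license file as in
#   nextflow secrets set SENTIEON_LICENSE_BASE64 $(cat <file> | base64 -w 0)
# then decode it again and compare byte for byte with the original.
license_roundtrip_ok() {
  lic_file="$1"
  decoded=$(mktemp)
  printf '%s\n' "$(base64 -w 0 < "$lic_file")" | base64 -d > "$decoded"
  cmp -s "$lic_file" "$decoded"
  status=$?
  rm -f "$decoded"
  return "$status"
}
```

For example, license_roundtrip_ok $LICPATH/sentieon_license_file.lic && echo ok, and classify_license_secret "$(nextflow secrets get SENTIEON_LICENSE_BASE64)" should print file for a long, file-based secret like the one described above.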

asp8200 commented 9 months ago

That is a bug :-/ Thanks for uncovering it. It happens because your command uses Apptainer. We got a similar error with Singularity, so we had to introduce this kind of hack in the Sentieon modules:

https://github.com/nf-core/sarek/blob/8fba43d7eef4a2e8e80dd3ea682f8249a2ccd7fb/modules/nf-core/sentieon/dnascope/main.nf#L38-L42

I must admit I haven't tested dev-Sarek with Apptainer. If possible, could you change this line in your "local" copy of the code:

https://github.com/nf-core/sarek/blob/8fba43d7eef4a2e8e80dd3ea682f8249a2ccd7fb/modules/nf-core/sentieon/dnascope/main.nf#L38

to

if ((workflow.containerEngine == 'singularity') || (workflow.containerEngine == 'apptainer')) {
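To make it clearer what the condition guards, here is a shell rendering of the idea. It is a hypothetical sketch, not the module's Groovy code: it assumes the linked hack works by prepending the container's library directory to LD_LIBRARY_PATH so that licsrvr can find libstdc++.so.6, and both the directory /usr/local/lib and the CONTAINER_ENGINE variable (standing in for workflow.containerEngine) are assumptions for illustration.

```shell
# Hypothetical shell rendering of the engine check suggested above.
# CONTAINER_ENGINE stands in for Nextflow's workflow.containerEngine.
fix_ld_library_path() {
  case "$CONTAINER_ENGINE" in
    singularity|apptainer)
      # Assumed library location inside the Sentieon container; keep any
      # existing LD_LIBRARY_PATH entries after it.
      LD_LIBRARY_PATH="/usr/local/lib/${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
      export LD_LIBRARY_PATH
      ;;
    *)
      # Docker and other engines: leave the environment untouched.
      ;;
  esac
}
```

Matching both engine names in one case branch is the shell equivalent of the || in the Groovy condition.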

I was able to reproduce the above-mentioned error with Apptainer, and the fix worked for me. Hope it also works for you. Cheers

P.S. Just to check that your Sentieon license is OK: can you run that sentieon command directly on the command line? With Sentieon activated and licensed, you should be able to navigate to the work folder $PATH/NEXTFLOW/work/fc/ef22de40e40ba36d3b4cd2a4067d99, create the input files needed by the sentieon command (look for the ln commands in the staging section of .command.run and run them directly in the terminal), and then run the sentieon command.
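The staging step of that manual check could be scripted roughly as follows. This is a hedged sketch: it assumes the staging section of .command.run contains plain ln -s lines (as the comment suggests), and the function name stage_task_inputs is hypothetical. Existing links are not overwritten, so it is meant for a work directory that has not been staged yet.

```shell
# Hypothetical helper: recreate the input symlinks of a Nextflow task by
# replaying the 'ln -s' lines from the staging section of its .command.run.
stage_task_inputs() {
  (
    cd "$1" || exit 1
    # Assumes staging lines look like: ln -s /abs/path/to/input input
    grep -E '^[[:space:]]*ln -s ' .command.run | while IFS= read -r stage_cmd; do
      eval "$stage_cmd"
    done
  )
}
```

For the failing task above this would be stage_task_inputs $PATH/NEXTFLOW/work/fc/ef22de40e40ba36d3b4cd2a4067d99, after which the sentieon driver line from .command.sh can be run by hand in that directory.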

maxulysse commented 9 months ago

Sounds like a good fix.

Shall we patch all modules then?

asp8200 commented 9 months ago

@maxulysse: The bug related to Apptainer is an easy fix (as I'm sure you'll agree), but we need to fix it in the modules repo. AFAIK, the CI tests don't run Apptainer, but we still need to run the tests for Docker, and those require the license server to be up and running. (Also, we may have to rewrite the CI tests as nf-tests; not sure about that though.)

Any idea how we could add support for the new .bundle files? The problem is that Sentieon needs the model specified in an option as /path/to/<some_bundle>/<some_name>.model, but /path/to/<foo>.bundle/<bar>.model is not a file, while /path/to/<foo>.bundle is. One way of handling it would be to introduce extra parameters, like --sentieon_dnascope_bundle and --sentieon_dnascope_model_in_bundle. Terrible parameter names no doubt, but you get the (bad) idea. One could also pass the whole string /path/to/<foo>.bundle/<bar>.model in one new option and then do some string manipulation to separate the path to the bundle from the name of the model, but that doesn't seem like a very robust solution.
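The string-manipulation variant could be sketched with shell parameter expansion. This is hypothetical, not what Sarek implements: the function name split_bundle_model is made up, and the sanity checks (the directory part must end in .bundle, the last part in .model) are an assumption about what would make such a split less fragile.

```shell
# Hypothetical helper: split '/path/to/<foo>.bundle/<bar>.model' into the
# bundle path (first output line) and the model name (second output line).
split_bundle_model() {
  spec="$1"
  bundle="${spec%/*}"     # everything before the last '/'
  model="${spec##*/}"     # everything after the last '/'

  # Reject anything that does not look like <something>.bundle/<something>.model.
  case "$bundle" in *.bundle) ;; *) return 1 ;; esac
  case "$model" in ?*.model) ;; *) return 1 ;; esac

  printf '%s\n%s\n' "$bundle" "$model"
}
```

For example, split_bundle_model "$SENTIEONDNASCOPEMODEL" prints the bundle path and the model name on separate lines and returns non-zero for any other shape of input, which is where it is more defensive than a bare split.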

maxulysse commented 9 months ago

The bundle file will be solved separately.

asp8200 commented 9 months ago

I don't suppose the CI-tests of Sentieon will pass:

https://github.com/nf-core/modules/pull/4893

FriederikeHanssen commented 9 months ago

> I don't suppose the CI-tests of Sentieon will pass:
>
> nf-core/modules#4893

Server is still down AFAIK. If @maxulysse agrees, it would be OK with me for you to run all tests locally on your side for now.

maxulysse commented 9 months ago

Ok for me for now

asp8200 commented 9 months ago

I ran the pytests for Sentieon locally with singularity:

TMPDIR=/ngc/projects/bio/people/andped/tmp  PROFILE=singularity pytest --tag sentieon  --symlink --kwdof --color=yes
...
=== 270 passed, 1424 skipped, 1 warning in 1303.74s (0:21:43) ===

I can't run the tests with Docker on NGC's dev-server, and that is where I have the prod-license for Sentieon and internet access.

The pytests are not set up to work with Apptainer, but I tried adding the following snippet

else if ("$PROFILE" == "apptainer") {
    apptainer.enabled      = true
    apptainer.autoMounts   = true
    charliecloud.enabled   = false
    conda.enabled          = false
    docker.enabled         = false
    podman.enabled         = false
    shifter.enabled        = false
    singularity.enabled    = false
}

around here. However, I got the following error message when trying to run this pytest with Apptainer:

TMPDIR=/ngc/projects/bio/people/andped/tmp  PROFILE=apptainer pytest --tag sentieon/dnascope  --symlink --kwdof --color=yes

Error msg:

apptainer pull  --name biocontainers-sentieon-202308.01--h43eeafb_0.img.pulling.1707583786207 docker://biocontainers/sentieon:202308.01--h43eeafb_0
INFO:    Environment variable SINGULARITY_TMPDIR is set, but APPTAINER_TMPDIR is preferred
INFO:    Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred
FATAL:   While making image from oci registry: error fetching image to cache: failed to get checksum for docker://biocontainers/sentieon:202308.01--h43eeafb_0: reading manifest 202308.01--h43eeafb_0 in docker.io/biocontainers/sentieon: requested access to the resource is denied

Anyway, I made the same change to the dnascope module locally in the dev Sarek pipeline, and was able to run the following test command, which uses Apptainer:

nextflow run main.nf -profile test_cache,targeted,apptainer --input ./tests/csv/3.0/mapped_single_bam.csv --tools sentieon_dnascope --step variant_calling --skip_tools dnascope_filter --outdir results 

N.B. Does one of you have rights to merge https://github.com/nf-core/modules/pull/4893 with the CI-tests passing?

gianfilippo commented 9 months ago

Hi,

I updated to the latest dev version and it seems OK now.

I am assuming I should not use the "--sentieon_dnascope_model" option, as this is hard-coded for the moment. Correct?

Thanks

asp8200 commented 9 months ago

Hi,

> I updated to the latest dev version and it seems OK now.
>
> I am assuming I should not use the "--sentieon_dnascope_model" option, as this is hard-coded for the moment. Correct?
>
> Thanks

No, the hardcoded path to the dnascope model is not on the dev branch. We're currently working on a proper solution. If you need the dev branch to work with the bundle, then for the moment you still have to hardcode the path to the dnascope model in the dnascope module, as mentioned above. Sorry for the inconvenience.