Open fernanarr opened 2 years ago
Thanks for writing @fernanarr, sorry for the delay. I was able to fix the issue by using full paths for the --basecalling_dir and --base_calling_summary_file options. Let me know if that fixes your issue; I'll update the docs as well.
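One way to avoid relative-path problems is to resolve each path to its absolute form before handing it to Nextflow. The following is a minimal sketch, assuming GNU `realpath` is available; the `abs_path` helper and the example paths are illustrative and not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Circuitseq): print the absolute form of a path.
# Assumes `realpath` from GNU coreutils; -m tolerates paths that do not exist yet.
abs_path() {
    realpath -m -- "$1"
}

# Example: resolve the two arguments mentioned above before building the command.
basecalling_dir=$(abs_path "example_data/fastq/pass")
summary_file=$(abs_path "example_data/fastq/sequencing_summary.txt")

echo "--basecalling_dir ${basecalling_dir}"
echo "--base_calling_summary_file ${summary_file}"
```

The printed values can be pasted into run_nf.sh, so the run no longer depends on the directory it is launched from.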
Hi!
Thanks a lot for your answer. I should have used full paths from the very beginning, as you suggested in the comments.
Now my run_nf.sh looks like this:
NXF_VER=21.10.6 nextflow run /home/fer/Circuitseq/pipelines/CircuitSeq.nf \
--GPU ON \
-c /home/fer/Circuitseq/pipelines/nextflow.config \
-with-singularity /home/fer/Circuitseq/entorno_singularity/plasmidassembly.sif \
--samplesheet /home/fer/Circuitseq/example_data/example_samplesheet.tsv \
--use_existing_basecalls true \
--fast5 "none" \
--basecalling_dir /home/fer/Circuitseq/example_data/fastq/pass/ \
--base_calling_summary_file /home/fer/Circuitseq/example_data/fastq/sequencing_summary.txt \
--barcodes plasmidseq/barcodes/v2/ \
--barcode_kit "MY-CUSTOM-BARCODES" \
--guppy_model dna_r9.4.1_450bps_sup.cfg \
--medaka_model r941_min_sup_g507 \
--gpu_slot cuda:0 \
--barcode_min_score 65 \
--quality_control_processes true \
-resume
When I run it, I get this result:
N E X T F L O W ~ version 21.10.6
Launching `/home/fer/Circuitseq/pipelines/CircuitSeq.nf` [tender_visvesvaraya] - revision: 2a20792072
Methylation calling: false
Quality control output: true
Project : /home/fer/Circuitseq/pipelines
Git info: null - null [null]
Cmd line: nextflow run /home/fer/Circuitseq/pipelines/CircuitSeq.nf --GPU ON -c /home/fer/Circuitseq/pipelines/nextflow.config -with-singularity /home/fer/Circuitseq/entorno_singularity/plasmidassembly.sif --samplesheet /home/fer/Circuitseq/example_data/example_samplesheet.tsv --use_existing_basecalls true --fast5 none --basecalling_dir /home/fer/Circuitseq/example_data/fastq/pass/ --base_calling_summary_file /home/fer/Circuitseq/example_data/fastq/sequencing_summary.txt --barcodes plasmidseq/barcodes/v2/ --barcode_kit MY-CUSTOM-BARCODES --guppy_model dna_r9.4.1_450bps_sup.cfg --medaka_model r941_min_sup_g507 --gpu_slot 'cuda:0' --barcode_min_score 65 --quality_control_processes true -resume
Manifest's pipeline version: null
executor > local (3)
[- ] process > GuppyBaseCalling -
[- ] process > GuppyDemultiplex -
[35/3bcbe1] process > GuppyDemultiplexExisting [100%] 1 of 1, failed: 1 ✔
[- ] process > pycoQC -
[- ] process > AlignReadsPre -
[- ] process > Porechop -
[- ] process > AlignReadsPostLengthFilter -
[- ] process > FilterReads -
[- ] process > CanuCorrect -
[- ] process > Flye -
[- ] process > Miniasm -
[- ] process > ConvertGraph -
[- ] process > AssessAssemblyApproach -
[- ] process > MedakaConsensus -
[- ] process > LCPCorrectionFlye -
[- ] process > Rotate -
[- ] process > MedakaConsensusLCP -
[- ] process > MedakaPolish -
[- ] process > MedakaPolish2 -
[- ] process > Fast5Subset -
[- ] process > OGMethylationCalling -
[- ] process > ReferenceCopy -
[- ] process > AlignReads -
[- ] process > AlignReadsNanofilter -
[- ] process > AssessContamination -
[6b/ba568e] process > ContaminationAggregation [100%] 1 of 1 ✔
[- ] process > PlasmidComparison -
[90/fc98c4] process > PlasmidComparisonCollection [100%] 1 of 1 ✔
[35/3bcbe1] NOTE: Process `GuppyDemultiplexExisting` terminated with an error exit status (255) -- Error is ignored
In the results folder I get two new folders, "aggregate_assembly_assessment" and "aggregate_contamination", but I can't open the files "all_nextpolish2.stats" and "all_contamination_stats.txt" (both 1 KB) inside them.
Am I doing something wrong again? Sorry for my insistence, but I can't figure out what to do to solve this on my own.
Thanks again for your help
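For anyone hitting the same "Error is ignored" note: Nextflow keeps the script, output, and exit code of each task in its work directory, under a folder whose name starts with the short hash shown in the progress output (35/3bcbe1 here). A minimal sketch for printing those files follows; `show_task_logs` is a hypothetical helper, and the work directory location is assumed to be the default `work` folder next to where the run was launched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the files Nextflow captured for one task, given the
# work directory and the short hash from the progress output (e.g. 35/3bcbe1).
show_task_logs() {
    local work_dir=$1 short_hash=$2 task_dir file
    # The short hash is a prefix of the real directory name, so expand it with a glob.
    for task_dir in "${work_dir}/${short_hash}"*/; do
        [ -d "$task_dir" ] || continue
        for file in .exitcode .command.sh .command.err .command.log; do
            if [ -f "${task_dir}${file}" ]; then
                echo "== ${task_dir}${file}"
                cat "${task_dir}${file}"
                echo
            fi
        done
    done
}

# Usage (paths are examples): show_task_logs work 35/3bcbe1
```

The `.command.err` file usually holds the actual error message from the failed tool, which the summary line does not show.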
Hello @fernanarr two quick thoughts.
Hi Francesco,
Of course, here is my .nextflow.log file.
I cloned the code and downloaded the Singularity image on September 7th, so both are quite recent.
Thanks to all for your help. Regards
As a sanity check I just cloned the GitHub repository again and set it up from scratch, and it's running fine on my end.
I think there may be a Singularity problem on your end. Based on your path I assume you're running this on a local machine? Have you run pipelines with Singularity successfully in the past?
What output do you get if you run:
singularity inspect /home/fer/Circuitseq/entorno_singularity/plasmidassembly.sif
Another possibility is a GPU problem: do you have GPUs that work with guppy? We currently have guppy_barcoder set up to use GPUs as well for speed, and I've never tried running it on a machine without GPUs.
-Francesco
Hi Francesco,
Our GPU works fine with Guppy. We have an Nvidia GTX 3070TI. We also have CUDA installed.
Running singularity inspect /home/fer/Circuitseq/entorno_singularity/plasmidassembly.sif
we get this output
maintainer: NVIDIA CORPORATION <cudatools@nvidia.com>
org.label-schema.build-arch: amd64
org.label-schema.build-date: Wednesday_7_September_2022_16:51:37_CEST
org.label-schema.schema-version: 1.0
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: femiliani/circuitseq:allbarcodes
org.label-schema.usage.singularity.version: 3.10.0
We are running Ubuntu 20.04 in WSL2, and this is the first time we have tried to run a pipeline with Singularity.
Thanks again for your help.
I'm running into this same error, and am similarly having difficulty troubleshooting it. I'm still not sure that my barcode specification is correct, so I tried to run guppy_basecaller inside the Singularity container to figure out the predefined values. I got an error that makes me think that the CUDA libraries aren't included in the container:
$ singularity shell plasmidassembly.sif
Singularity plasmidassembly.sif:~> which guppy_barcoder
/ont/ont-guppy/bin//guppy_barcoder
Singularity plasmidassembly.sif:~> guppy_barcoder --help
guppy_barcoder: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory
Singularity plasmidassembly.sif:~> guppy_basecaller --help
guppy_basecaller: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory
I'm not terribly familiar with Singularity, so it's possible that my test is missing something. But I think this could explain the mystery failures.
$ singularity inspect plasmidassembly.sif
{
"org.label-schema.build-date": "Tuesday_10_January_2023_16:29:9_EST",
"org.label-schema.schema-version": "1.0",
"org.label-schema.usage.singularity.deffile.bootstrap": "docker",
"org.label-schema.usage.singularity.deffile.from": "aaronmck/plasmidassembly:1_0_1",
"org.label-schema.usage.singularity.version": "3.1.0"
}
Sorry for dropping the ball on this. @fernanarr, Aaron recently updated the Singularity container, so it might be worth trying the new one. I can't make promises, but on my end I wasn't able to reproduce that error, so I really have no other explanation. I know of at least one other person at another institution who used the latest container on their own system and had it work starting from the barcoding step.
@pdexheimer, I have just learned a neat little trick about Singularity from Aaron. To let Singularity use the computer's GPUs you have to add --nv to the run command, e.g.:
Without --nv I get the same error:
(medaka) [f002sd4@barbara testing]$ singularity exec --bind /analysis/2022_11_23_circuitseq_debug/Circuitseq/example_data/fastq/:/mnt /analysis/2022_11_23_circuitseq_debug/plasmidassembly.sif guppy_barcoder -h | head
/ont/ont-guppy/bin/guppy_barcoder: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory
With --nv the error goes away:
(medaka) [f002sd4@barbara testing]$ singularity exec --nv --bind /analysis/2022_11_23_circuitseq_debug/Circuitseq/example_data/fastq/:/mnt /analysis/2022_11_23_circuitseq_debug/plasmidassembly.sif guppy_barcoder -h | head
guppy_barcoder, part of Guppy basecalling suite, (C) Oxford Nanopore Technologies, Limited. Version 5.0.16+b9fcd7b
Usage:
guppy_barcoder -i <input fastq path> -s <save path>
With kit name:
guppy_barcoder -i <input fastq path> -s <save path> --barcode_kits <kit name>
List supported barcoding kits:
guppy_barcoder --print_kits
And just for good measure, here is guppy_barcoder working on our example data:
(medaka) [f002sd4@barbara testing]$ singularity exec --nv --bind /analysis/2022_11_23_circuitseq_debug/Circuitseq/example_data/fastq/:/mnt /analysis/2022_11_23_circuitseq_debug/plasmidassembly.sif guppy_barcoder --input_path /mnt --save_path demux --data_path /plasmidseq/barcodes/v2/ --barcode_kits "MY-CUSTOM-BARCODES"
ONT Guppy barcoding software version 5.0.16+b9fcd7b
input path: /mnt
save path: demux
arrangement files: barcode_arrs_plasmid96.cfg
lamp arr. files:
min. score front: 60
min. score rear: 60
Found 0 input files.
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Done in 9 ms.
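When the pipeline is launched through Nextflow rather than by hand, the same --nv flag has to reach the singularity call that Nextflow builds. A sketch of one way to do this, assuming the pipeline's own nextflow.config does not already set it: write a small supplementary config that uses Nextflow's `singularity.runOptions` setting and pass it with an additional `-c`. The file name `gpu.config` is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Write a small supplementary Nextflow config (the file name is arbitrary).
# singularity.runOptions holds extra options for the singularity command Nextflow runs.
config_file=${1:-gpu.config}

cat > "$config_file" <<'EOF'
singularity {
    runOptions = '--nv'
}
EOF

# Then add it to the existing launch command, for example:
#   nextflow run /path/to/CircuitSeq.nf -c /path/to/nextflow.config -c gpu.config ...
echo "wrote ${config_file}"
```

Alternatively, the same `singularity { ... }` block can be merged into the existing nextflow.config by hand.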
Another thing to keep in mind: if you are using custom barcode sequences, note that ours are stored inside the Singularity container. To test your own, you have to mount them into the container. I gave an example above of how I mounted some fastqs, using --bind /source/directory:/location/in/container. You would then use /location/in/container instead of our path for the parameter --barcodes /plasmidseq/barcodes/v2/
I would recommend first checking that your barcode configs work outside of the container, since the Singularity step just adds more complexity. Once it's working outside, it should work inside.
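As a sketch of the --bind approach for custom barcodes, the command below mounts a host barcode directory next to the fastq directory and points guppy_barcoder at the mounted location. All paths are placeholders, and `/custom_barcodes` is an arbitrary mount point chosen for illustration; the guppy_barcoder options are the ones used in the example above. The script only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: build the singularity command for testing custom barcodes.
# Every path below is a placeholder to be replaced with your own.
host_barcodes=/path/to/my_barcodes      # host directory holding your barcode config files
container_barcodes=/custom_barcodes     # arbitrary mount point inside the container
host_fastq=/path/to/fastq
sif=/path/to/plasmidassembly.sif

cmd=(singularity exec --nv
     --bind "${host_fastq}:/mnt"
     --bind "${host_barcodes}:${container_barcodes}"
     "$sif"
     guppy_barcoder --input_path /mnt --save_path demux
     --data_path "$container_barcodes" --barcode_kits "MY-CUSTOM-BARCODES")

# Print the command for review; run it with: "${cmd[@]}"
printf '%q ' "${cmd[@]}"; echo
```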
I'm trying to run Circuit-seq using the example data (the run starting with fastqs), and I'm having a problem. After completing the setup (GPU with CUDA, Singularity, Docker, Nextflow), I created a copy of the run_nf.sh shell script and modified it following your guidelines. However, whenever I run bash run_nf.sh at the terminal, I get this error:
ambl@ambl-H97M-D3H:~/Circuitseq/example_data/run_starting_with_fastqs$ bash exrun_nf.sh
N E X T F L O W ~ version 22.10.7
Launching `/home/ambl/Circuitseq/pipelines/CircuitSeq.nf` [distracted_woese] DSL1 - revision: 3ad8260c0a
Methylation calling: false
Quality control output: true
Project : /home/ambl/Circuitseq/pipelines
Git info: null - null [null]
Cmd line: nextflow run /home/ambl/Circuitseq/pipelines/CircuitSeq.nf --GPU ON -c /home/ambl/Circuitseq/pipelines/nextflow.config -with-singularity /home/ambl/Circuitseq/plasmidassembly_1_0_1.sif --samplesheet /home/ambl/Circuitseq/example_data/example_samplesheet.tsv --use_existing_basecalls true --fast5 none --basecalling_dir /home/ambl/Circuitseq/example_data/fastq/ --base_calling_summary_file /home/ambl/Circuitseq/example_data/fastq/sequencing_summary.txt --barcodes /home/ambl/Circuitseq/barcodes/v2/ --barcode_kit MY-CUSTOM-BARCODES --guppy_model dna_r9.4.1_450bps_sup.cfg --medaka_model r941_min_sup_g507 --gpu_slot 'cuda:0' --barcode_min_score 65 --quality_control_processes true -resume
Manifest's pipeline version: null
executor > local (1)
[- ] process > GuppyBaseCalling -
[- ] process > GuppyDemultiplex -
[0e/ba04dd] process > GuppyDemultiplexExisting [ 0%] 0 of 1
[- ] process > pycoQC -
[- ] process > AlignReadsPre -
[- ] process > Porechop -
[- ] process > FilterReads -
executor > local (3)
[- ] process > GuppyBaseCalling -
[- ] process > GuppyDemultiplex -
[0e/ba04dd] process > GuppyDemultiplexExisting [100%] 1 of 1, failed: 1 ✔
[- ] process > pycoQC -
[- ] process > AlignReadsPre -
[- ] process > Porechop -
[- ] process > AlignReadsPostLengthFilter -
[- ] process > FilterReads -
[- ] process > CanuCorrect -
[- ] process > Flye -
[- ] process > Miniasm -
[- ] process > ConvertGraph -
[- ] process > AssessAssemblyApproach -
[- ] process > MedakaConsensus -
[- ] process > LCPCorrectionFlye -
[- ] process > Rotate -
[- ] process > MedakaConsensusLCP -
[- ] process > MedakaPolish -
[- ] process > MedakaPolish2 -
[- ] process > Fast5Subset -
[- ] process > OGMethylationCalling -
[- ] process > ReferenceCopy -
[- ] process > AlignReads -
[- ] process > AlignReadsNanofilter -
[- ] process > AssessContamination -
[92/abc256] process > ContaminationAggregation [100%] 1 of 1 ✔
[- ] process > PlasmidComparison -
[29/b445ee] process > PlasmidComparisonCollection [100%] 1 of 1 ✔
[0e/ba04dd] NOTE: Process `GuppyDemultiplexExisting` terminated with an error exit status (139) -- Error is ignored
After the run, there are two folders in the run_starting_with_fastqs directory, named "results" and "work", but the run produced no .nextflow.log file. The "results" folder contains the aggregate_assembly_assessment and aggregate_contamination folders, and each holds one text file. The contents of all_nextpolish2.stats are:
assembly replicon_name length contig_name contiguity identity max_indel
and the contents of all_contamination_stats are:
assembly contamination
This is my run_nf.sh
#It is safest to use absolute paths
nextflow run /home/ambl/Circuitseq/pipelines/CircuitSeq.nf \
--GPU ON \
-c /home/ambl/Circuitseq/pipelines/nextflow.config \
-with-singularity /home/ambl/Circuitseq/plasmidassembly_1_0_1.sif \
--samplesheet /home/ambl/Circuitseq/example_data/example_samplesheet.tsv \
--use_existing_basecalls true \
--fast5 "none" \
--basecalling_dir /home/ambl/Circuitseq/example_data/fastq/ \
--base_calling_summary_file /home/ambl/Circuitseq/example_data/fastq/sequencing_summary.txt \
--barcodes /home/ambl/Circuitseq/barcodes/v2/ \
--barcode_kit "MY-CUSTOM-BARCODES" \
--guppy_model dna_r9.4.1_450bps_sup.cfg \
--medaka_model r941_min_sup_g507 \
--gpu_slot cuda:0 \
--barcode_min_score 65 \
--quality_control_processes true \
-resume
Am I doing something wrong? I'm so sorry but I can't figure out how to solve this problem on my own.
Thanks very much for your help.
Sincerely.
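A note on reading the exit status in the log above: by shell convention, a status between 129 and 192 means the process was terminated by a signal, with the signal number equal to the status minus 128. Status 139 therefore corresponds to signal 11, which is SIGSEGV (a segmentation fault) on Linux. The helper below is a hypothetical sketch of that arithmetic and says nothing about why the process crashed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe an exit status using the shell convention that
# values from 129 to 192 mean "terminated by signal (status - 128)".
describe_exit_status() {
    local status=$1 sig
    if [ "$status" -gt 128 ] && [ "$status" -le 192 ]; then
        sig=$((status - 128))
        # `kill -l N` prints the name of signal number N.
        echo "killed by signal ${sig} ($(kill -l "$sig"))"
    else
        echo "exited with status ${status}"
    fi
}

describe_exit_status 139
describe_exit_status 255
```

A status such as 255 falls outside the signal range, so it is reported as an ordinary error exit from the program itself.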
Hi @AMBL-CW,
This looks like a sample sheet issue? I put a new Python script in the base folder called pre_flight_check.py. You can run it like:
python pre_flight_check.py -s your_samplesheet.tsv
I've tried to enumerate some of the common setup errors, and as we find more, I'll put them in. Could you try running this on your sample sheet and let me know what it says? If it passes, I'd love to learn more about your setup. Thanks!
-Aaron
Thank you for the reply, Aaron! I ran pre_flight_check.py with your command, and these are the results:
ambl@ambl-H97M-D3H:~/Circuitseq$ python pre_flight_check.py -s example_samplesheet.tsv
Fri Mar 3 09:57:52 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01 Driver Version: 525.78.01 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 33% 29C P8 1W / 38W | 332MiB / 1024MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2112 G /usr/lib/xorg/Xorg 57MiB |
| 0 N/A N/A 2250 G /usr/bin/gnome-shell 122MiB |
| 0 N/A N/A 2775 G ...1/usr/lib/firefox/firefox 145MiB |
+-----------------------------------------------------------------------------+
nvidia-smi ran successfully
Checking sample sheet...
Your reference doesn't exist: references/lentiCas9_Blast.fa, on line: 06 lentiCas9_Blast references/lentiCas9_Blast.fa (line number 2)
Traceback (most recent call last):
File "/home/ambl/Circuitseq/pre_flight_check.py", line 83, in
Is there any problem with example_samplesheet.tsv? Actually, I used the original file without any modification. Should I modify something within example_samplesheet.tsv?
Thanks again for your help.
Sincerely.
hi @AMBL-CW,
The references described in your sample sheet should be set to the path of any references you have for your known plasmids (and should be NA if a map doesn't exist). Moreover, you do need to fill in your sample sheet with any samples you might have loaded on the run, one row for each.
There's a lot in the configuration, so I made a simple example that should run with the repository and the Singularity sif file. In the run_with_basecalling directory there's now a script called run_nf_with_example_data.sh that runs the whole pipeline for a single plasmid. All it requires is the absolute path to the sif file. If you run that (from that directory), does it work for you? We get assembly of the target plasmid in a fresh install of the pipeline. The next step would be to update the samplesheet with your samples and reads and try again, with the hope of eliminating the errors you've hit.
All the best,
Aaron
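The reference rule described above (each reference is either NA or a path to an existing file) can also be checked by hand. The sketch below is separate from pre_flight_check.py and rests on two assumptions that should be verified against your own file: the sample sheet is tab-separated with a header on the first line, and the reference path is in the third column. `check_references` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check (separate from pre_flight_check.py): report reference paths
# in a tab-separated sample sheet that are neither "NA" nor an existing file.
# ASSUMPTIONS: line 1 is a header; the reference column defaults to column 3.
check_references() {
    local sheet=$1 column=${2:-3} line_no=0 missing=0 line ref
    while IFS= read -r line || [ -n "$line" ]; do
        line_no=$((line_no + 1))
        [ "$line_no" -eq 1 ] && continue          # skip the header line
        ref=$(printf '%s\n' "$line" | cut -f "$column")
        [ -z "$ref" ] && continue                 # ignore blank entries
        [ "$ref" = "NA" ] && continue             # NA means "no known reference"
        if [ ! -f "$ref" ]; then
            echo "line ${line_no}: missing reference ${ref}"
            missing=1
        fi
    done < "$sheet"
    return "$missing"
}

# Usage (file name is an example): check_references example_samplesheet.tsv
```

Relative paths in the sheet are resolved against the directory the check is run from, so it is worth running it from the same place the pipeline is launched, or using absolute paths in the sheet.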
Good morning,
I'm trying to run Circuit-seq using the example data before trying it with our own, and I'm having a problem that I can't solve.
Every time I run bash Circuitseq/pipelines/run_nf.sh, I get this error:
This is my run_nf.sh
I've tried to modify the path value for --basecalling_dir several times without success. I'm not able to figure out what might be happening.
Perhaps there is an error in my run_nf.sh that I can't see. Could you please check it?
Thanks a lot in advance.
Regards