Closed dmgie closed 4 months ago
I encountered this problem before, but have not had time to fix it yet. The problem is that the iGenomes STAR index is not compatible with the STAR version used in the pipeline. A workaround is to manually set star = null, which prevents usage of the iGenomes index and thus forces the pipeline to build its own index. Opened #91 for this.
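To see which STAR version an existing index was built with before deciding on the workaround, something like the following could help. It is only a sketch: it assumes the index directory contains a genomeParameters.txt file with a versionGenome line, and read_index_version is a hypothetical helper name, not part of the pipeline. It reports the recorded version only and does not decide compatibility, which STAR determines itself at run time.

```python
from pathlib import Path
from typing import Optional


def read_index_version(index_dir: str) -> Optional[str]:
    """Return the versionGenome value recorded in a STAR index, if present.

    Assumption: the index directory holds a genomeParameters.txt file whose
    lines are whitespace-separated key/value pairs.
    """
    params = Path(index_dir) / "genomeParameters.txt"
    if not params.is_file():
        return None
    for line in params.read_text().splitlines():
        fields = line.split()
        # Look for the line of the form: versionGenome <version>
        if len(fields) >= 2 and fields[0] == "versionGenome":
            return fields[1]
    return None


if __name__ == "__main__":
    import sys

    # Pass the index directory as the first argument (defaults to ".").
    index_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    print(read_index_version(index_dir))
```

Comparing the printed value with the output of STAR --version inside the container shows whether the two differ.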
This is most probably due to missing escape characters in the Nextflow scripts; it will be fixed via #83.
Could potentially also be fixed via #83
I expect #83 to be merged into dev within the next few days. If you want to try it sooner, you can use the caching branch of the pipeline (nextflow run -r caching ...). Keep me posted in case something works better then. If the problems (especially with DCC) persist, you could also try to fix the scripts yourself and open a PR.
Hi, just letting you know I ran with the merged bug fix #83, and also got a similar DCC error.
ValueError: invalid literal for int() with base 10: '2"'
details:
-[nf-core/circrna] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC (control_1)'
Caused by:
Process `NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC (control_1)` terminated with an error exit status (1)
Command executed:
sed -i 's/^chr//g' gencode.v44.chr_patch_hapl_scaff.annotation.gtf
mkdir control_1 && mv control_1.Chimeric.out.junction control_1 && printf "control_1/control_1.Chimeric.out.junction" > samplesheet
mkdir control_1_mate1 && mv control_1_mate1.Chimeric.out.junction control_1_mate1 && printf "control_1_mate1/control_1_mate1.Chimeric.out.junction" > mate1file
mkdir control_1_mate2 && mv control_1_mate2.Chimeric.out.junction control_1_mate2 && printf "control_1_mate2/control_1_mate2.Chimeric.out.junction" > mate2file
DCC @samplesheet -mt1 @mate1file -mt2 @mate2file -D -an gencode.v44.chr_patch_hapl_scaff.annotation.gtf -Pi -ss -F -M -Nr 1 1 -fg -A GRCh38.p14.genome.fa -N -T 12
awk '{print $6}' CircCoordinates >> strand
paste CircRNACount strand | tail -n +2 | awk -v OFS="\t" '{print $1,$2,$3,$5,$4}' >> control_1.txt
cat <<-END_VERSIONS > versions.yml
"NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC":
dcc: $(DCC --version)
END_VERSIONS
Command exit status:
1
Command output:
Output folder ./ already exists, reusing
DCC 0.5.0 started
24 CPU cores available, using 12
WARNING: non-stranded data, the strand of circRNAs guessed from the strand of host genes
Please make sure that the read pairs have been mapped both, combined and on a per mate basis
Collecting chimera information from mates-separate mapping
Combining individual circRNA read counts
Using files _tmp_DCC/tmp_circCount and _tmp_DCC/tmp_coordinates for filtering
Filtering by read counts
Remove ChrM
Count CircSkip junctions
started circRNA detection from file _tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G
=> separating duplicates [_tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G]
=> locating small circRNAs [_tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G]
=> locating circRNAs (unstranded mode) [_tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G]
=> merging circRNAs [_tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G]
=> sorting circRNAs (unstranded mode) [_tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G]
finished circRNA detection from file _tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G
Command error:
Unable to find image 'quay.io/biocontainers/circtools:1.2.1--pyh7cba7a3_0' locally
1.2.1--pyh7cba7a3_0: Pulling from biocontainers/circtools
73349e34840e: Already exists
acab339ca1e8: Already exists
425fd6205dc3: Pulling fs layer
425fd6205dc3: Download complete
425fd6205dc3: Pull complete
Digest: sha256:7317627874031c4c9924d40b76602662a2d400c9ee4c1c626998c287e5e7bd65
Status: Downloaded newer image for quay.io/biocontainers/circtools:1.2.1--pyh7cba7a3_0
Output folder ./ already exists, reusing
DCC 0.5.0 started
24 CPU cores available, using 12
Traceback (most recent call last):
File "/usr/local/bin/DCC", line 10, in <module>
WARNING: non-stranded data, the strand of circRNAs guessed from the strand of host genes
Please make sure that the read pairs have been mapped both, combined and on a per mate basis
Collecting chimera information from mates-separate mapping
Combining individual circRNA read counts
Using files _tmp_DCC/tmp_circCount and _tmp_DCC/tmp_coordinates for filtering
Filtering by read counts
Remove ChrM
Count CircSkip junctions
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/DCC/main.py", line 490, in main
CircSkipfiles = findCircSkipJunction(output_coordinates, options.tmp_dir,
File "/usr/local/lib/python3.10/site-packages/DCC/main.py", line 679, in findCircSkipJunction
circStartAdjacentExons, circStartAdjacentExonsIv = CCEM.findcircAdjacent(circStartExons, Custom_exon_id2Iv,
File "/usr/local/lib/python3.10/site-packages/DCC/Circ_nonCirc_Exon_Match.py", line 281, in findcircAdjacent
interval = Custom_exon_id2Iv[self.getAdjacent(ids, start=start)]
File "/usr/local/lib/python3.10/site-packages/DCC/Circ_nonCirc_Exon_Match.py", line 222, in getAdjacent
exon_number = int(custom_exon_id.split(':')[1]) - 1
ValueError: invalid literal for int() with base 10: '2"'
started circRNA detection from file _tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G
=> separating duplicates [_tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G]
=> locating small circRNAs [_tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G]
=> locating circRNAs (unstranded mode) [_tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G]
=> merging circRNAs [_tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G]
=> sorting circRNAs (unstranded mode) [_tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G]
finished circRNA detection from file _tmp_DCC/control_1.Chimeric.out.junction.4ZYC4G
Work dir:
/Users/marieke/Documents/work/66/1152ceffd34d8324ab17adcbacdcf6
Hey, @dmgie, could you maybe check whether your errors still occur with the latest version of the pipeline? Several PRs with fixes were merged in the meantime, and I am not sure whether they also addressed your problems.
The DCC issue looks like there is an additional " somewhere; we might be able to clean this away.
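The traceback points at int(custom_exon_id.split(':')[1]) receiving the string '2"', which is consistent with a stray trailing double quote in the exon ID. The sketch below reproduces that failure and shows one possible cleanup. It is not DCC's code: parse_exon_number is a hypothetical helper, and the example ID is made up for illustration.

```python
def parse_exon_number(custom_exon_id: str) -> int:
    """Hypothetical helper: return the zero-based exon index from an ID
    shaped like '<transcript>:<exon_number>', tolerating stray quotes."""
    # Same split as in the traceback, but remove surrounding whitespace
    # and quote characters before converting to int.
    raw = custom_exon_id.split(":")[1]
    cleaned = raw.strip().strip("\"'")
    return int(cleaned) - 1


# Made-up ID with a trailing quote, mimicking the value in the error.
bad_id = 'ENST00000000001:2"'

try:
    # The unmodified conversion from the traceback fails on this input.
    int(bad_id.split(":")[1]) - 1
except ValueError as err:
    print(f"ValueError: {err}")

# With the quote stripped, the conversion succeeds.
print(parse_exon_number(bad_id))
```

Where the extra quote originates (the GTF itself, or a preprocessing step in the pipeline) is not shown in the log, so stripping it at parse time is only one of the possible places to clean it away.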
Hi @nictru, sorry for the late reply! I would've been happy to test it, but sadly I no longer have access to the data I initially ran the pipeline with (which had led to the errors mentioned in the first post). I could possibly test it with some other data, but I'm currently not working on circRNA-related analysis anymore, so I can't promise to be able to test it anytime soon. If I do, I'll report back here in case there are any changes. Would you want me to close the issue in the meantime, or should it be left open?
Hey, no problem - I will close the issue in this case, feel free to open a new one if you encounter new problems some time in the future :)
Description of the bug
Hiya, thank you for the work on the pipeline! Currently, when I try to run the pipeline on my own (paired-end) data, it fails and exits at a few steps. With the test profile (i.e. nextflow run nf-core/circrna -c ./hpc.config -profile test,singularity -r dev -ansi-log false -resume) it seems to work fine and the pipeline completes.

The first issue that arose concerns STAR. With the genome: GRCh37 parameter, which from what I understand obtains the necessary files/indices from iGenomes, the mapping step prior to DCC fails due to Genome & STAR version incompatibility (STAR output below). The image used for this step seems to contain STAR version 2.7.10a, whereas the Genome was generated with 2.7.4a, so the image might need to be downgraded to an older STAR version? [*1]

Alternatively, I saw that I can provide my own fasta/gtf (and the required species) parameters, so I tried that using the files from Ensembl (https://grch37.ensembl.org/Homo_sapiens/Info/Index). This seemed to work fine, but DCC's execution then results in a ValueError: invalid literal for int() with base 10: '4"' error (more details below). From what I have found so far, the GTF doesn't get parsed correctly by the Circ_nonCirc_Exon_Match.py functions of DCC/circtools. Installing and running circtools detect/DCC with the same files seems to work fine.

I ran into another error when trying to add ciriquant as a tool, which errored out with CIRIquant.utils.PipelineError: Empty hisat2 bam generated, please re-run CIRIquant with -v and check the fastq and hisat2-index. Re-running this via bash .command.run results in the same error. If, on the other hand, I launch the singularity image myself and run the commands directly, it works fine and runs.

I have copied the errors to the box below. The command that was run (which produced the errors) is: nextflow run nf-core/circrna -c ./hpc.config -params-file ./params.yaml -profile singularity -r dev -ansi-log false -resume. Do let me know if there is anything I can help with.

On a side note: in the targetscan_format.sh script, a comment mentions "Subset mature.fa according to the species provided by user to '--genome'", but from briefly looking around I wasn't able to find where this might be included in the pipeline?

[*1] Tried using a custom image with a downgraded STAR version, still get the same error
Command used and terminal output
STAR
CIRIquant
DCC (own fasta/gtf)
Relevant files
No response
System information
Nextflow Version: 23.10.0
Hardware: HPC/Cluster
Executor: Slurm
Container: Singularity
OS: Ubuntu
nf-core/circrna version: dev