Closed yshen16 closed 4 months ago
Hey - thanks for the issue. The error occurs in a relatively new pipeline section which probably was not there when you last executed the pipeline. It was tested a bit already, but there are always edge cases that have not been considered yet.
TRANSCRIPTOME
process tries to extract the sequences of the detected circRNAs into a FASTA file based on an input GTF file. This GTF file is pipeline-internal and not provided by you. gffread
command is used for this and receives the FASTA file (genome.fa
) and the transcriptome GTF file (transcriptome.filtered.gtf
) as input. The differences between GTF and GFF should not be relevant here, works in other test cases.Warning: couldn't find fasta record for 'GL000191.1'!
. This suggests that there are entries in the GTF file that are located on a chromosome that are not present in the FASTA file. So now the question arises: How can we detect circRNAs that are not present in the provided reference genome sequence?QUANTIFICATION
subworkflow, a common quantification of the circular and linear transcriptome is performed (the goal is to allow unbiased correlation analyses). So before the GFFREAD
step, the detected circRNAs are merged with all the linear transcript locations you provide in the input GTF file. Most likely, the problematic regions originate from the input GTF file. The GTF file you provide as input most likely contains entries that are not present in the input FASTA file.
This is something that the pipeline should be able to handle, and it is not complicated to implement. Can probably do this later this week. If you need this more urgently, you could manually remove the problematic entries from the GTF file.
As the pipeline is still under a lot of development, please make sure you use the latest version (nextflow pull nf-core/circrna
). Recently, the module
parameter has been deprecated (still present in your CLI call).
Thank you very much for taking time to write me back. Really appreciate it. I wonder if you can provide me some programable way to exclude those transcriptome entries in gtf file that are not in genome.fa? I feel there must be some tools to handle it. I'll look for such tool myself as well. And I'll look forward to hearing the good news from you that the pipeline is fixed.
Hey, could you check out the branch associated with this issue and give it a try? You can do this by adding -r 151-error-no-genomic-sequence-available-check-g-option-error-in-gffread-command-for-task-nfcore_circrnacircrnaquantificationtranscriptome-error
to the nextflow run command. Sorry for the long name
Hey, could you check out the branch associated with this issue and give it a try? You can do this by adding
-r 151-error-no-genomic-sequence-available-check-g-option-error-in-gffread-command-for-task-nfcore_circrnacircrnaquantificationtranscriptome-error
to the nextflow run command. Sorry for the long name
Thank you, Nictru, I will test it out and let you know how it goes.
Any updates?
Sorry for the delay, and I ran the new dev version of the pipeline and here are the results:
nextflow run nf-core/circrna -profile test,singularity --outdir test_afterfix -r 151-error-no-genomic-sequenc e-available-check-g-option-error-in-gffread-command-for-task-nfcore_circrnacircrnaquantificationtranscriptome- error
Okay great! Will create a PR
However, the actual pipeline run with the user data and configuration failed at the 'trimgalore', or more specifically, 'cutadapt' stage: error info: `ERROR: Traceback (most recent call last): File "/usr/local/lib/python3.9/site-packages/cutadapt/pipeline.py", line 626, in run raise e EOFError: Compressed file ended before the end-of-stream marker was reached
cutadapt: error: Compressed file ended before the end-of-stream marker was reached
Cutadapt terminated with exit signal: '256'. Terminating Trim Galore run, please check error message(s) to get an idea what went wrong... forDebug.zip ` I attached a zip file which contains the .command.sh and .command.err in it. The actual sequence files are too big to attach as a file here. If needed I can send it to you some other way.
Hey, could you verify that the corresponding input fastq file has not been corrupted? Maybe a download problem or a problem during compression?
Yes, you are right. It turned out the region2 read file was truncated. I will try to find if I can get the good sequence files and rerun the test. Thank you for pointing this out to me.
Description of the bug
recently I tried to help a user to run the nf-core/circRNA pipeline, but met tons of errors. And this is one of the latest after I downloaded the reference genome from aws, and rerun everything from scratch, that is, I removed all the residual ~/.nextflow, ~/.singularity and all that in my working directory when I run the command. Here is the command line I ran:
module purge module load java/17.0.8 module load nextflow/23.04.3 # old 21.10.6 doesn't work
#Set the new Singularity cache directory path export SING_DIR=/projectnb/zhangh/Arshiya_Saxena/HumanBrain_circRNA/nf-core_circrna_pipeline-NAC/yun_test3
# Export the NXF_SINGULARITY_CACHEDIR environment variable to point to the new cache directory export NXF_SINGULARITY_CACHEDIR=$SING_DIR/.singularity
# Create the new Singularity cache directory if it doesn't already exist mkdir -p $NXF_SINGULARITY_CACHEDIR
# Remove the existing ~/.singularity symbolic link rm -rf ~/.singularity
# Create a new symbolic link for ~/.singularity that points to the new cache directory ln -s $NXF_SINGULARITY_CACHEDIR ~/.singularity
cd $SING_DIR
# Yun Test_rerun on 06/16/2024: nextflow run nf-core/circrna -profile singularity --genome GRCh37 --igenomes_base /projectnb/zhangh/Arshiya_Saxena/HumanBrain_circRNA/ref/references --input samples.csv --module 'circrna_discovery' --outdir test3_out -r dev
And here is my local references folder structure: ./Homo_sapiens ./Homo_sapiens/Ensembl ./Homo_sapiens/Ensembl/GRCh37 ./Homo_sapiens/Ensembl/GRCh37/Annotation ./Homo_sapiens/Ensembl/GRCh37/Annotation/SmallRNA ./Homo_sapiens/Ensembl/GRCh37/Annotation/SmallRNA/mature.fa ./Homo_sapiens/Ensembl/GRCh37/Annotation/SmallRNA/hairpin.fa ./Homo_sapiens/Ensembl/GRCh37/Annotation/Genes ./Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf ./Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed ./Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/refFlat.txt.gz ./Homo_sapiens/Ensembl/GRCh37/Sequence ./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta ./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.dict ./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa ./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.index ./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.fai ./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/GenomeSize.xml ./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex ./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.rev.2.ebwt ./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.2.ebwt ./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.1.ebwt ./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.fa ./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.3.ebwt ./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.4.ebwt ./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.rev.1.ebwt ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/chrName.txt ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/exonInfo.tab ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/SAindex ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/Genome ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/geneInfo.tab ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/sjdbList.out.tab ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/sjdbInfo.txt ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/transcriptInfo.tab ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/sjdbList.fromGTF.out.tab ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/genome.fa ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/chrLength.txt ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/Log.out ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/genes.gtf ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/genomeParameters.txt ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/chrStart.txt ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/SA ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/chrNameLength.txt ./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/exonGeTrInfo.tab ./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index ./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.3.bt2 ./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.1.bt2 ./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.rev.1.bt2 ./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.fa ./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.4.bt2 ./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.2.bt2 ./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.rev.2.bt2 ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa.pac ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa.ann ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0 ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa.pac ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa.ann ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa.sa ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa.amb ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa.bwt ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.pac ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.ann ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.rpac ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.rsa ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.sa ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.amb ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.rbwt ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.bwt ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa.sa ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa.amb ./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa.bwt
and the error I met is as below:
Jun-17 17:41:27.965 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 11; name: NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME (transcriptome); status: COMPLETED; exit: 1; error: -; workDir: /projectnb/zhangh/Arshiya_Saxena/HumanBrain_circRNA/nf-core_circrna_pipeline-NAC/yun_test3/work/82/1063296a24d1d4c319a6bd85bdeb20] Jun-17 17:41:27.970 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for task: name=NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME (transcriptome); work-dir=/projectnb/zhangh/Arshiya_Saxena/HumanBrain_circRNA/nf-core_circrna_pipeline-NAC/yun_test3/work/82/1063296a24d1d4c319a6bd85bdeb20 error [nextflow.exception.ProcessFailedException]: Process
NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME (transcriptome)
terminated with an error exit status (1) Jun-17 17:41:27.987 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME (transcriptome)'Caused by: Process
NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME (transcriptome)
terminated with an error exit status (1)Command executed:
gffread \ -g genome.fa \ -w transcriptome.fasta \ \ transcriptome.filtered.gtf
cat <<-END_VERSIONS > versions.yml "NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME": gffread: $(gffread --version 2>&1) END_VERSIONS
Command exit status: 1
Command output: (empty)
Command error: FASTA index file genome.fa.fai created. Warning: couldn't find fasta record for 'GL000191.1'! Error: no genomic sequence available (check -g option!).
Work dir: /projectnb/zhangh/Arshiya_Saxena/HumanBrain_circRNA/nf-core_circrna_pipeline-NAC/yun_test3/work/82/1063296a24d1d4c319a6bd85bdeb20
Looking into a big details about 'gffread' command, I think it is expecting a 'gff' file, while the pipeline provided a 'gtf' as an input file (transcriptome.filtered.gtf) and would this be the cause of the error? if so, how could this happen? is there anything missing in my command line? We used to be able to run the pipeline around Oct 2023, or maybe even this April, but now everything seems broken.
Please help. Thank you very much!!!