Closed AGunnell77 closed 2 years ago
Hi @AGunnell77, thanks for reporting this issue!
Can you post the full command you used when running into the error and attach the full log file (.nextflow.log in the project directory), please?
Thanks in advance!
Hi, is this OK? Not sure how to use the project directory.
nextflow run hoelzer-lab/rnaflow --reads ~/path/RNAflow_sample_sheet.csv --autodownload hsa --pathway hsa --strand 2 -with-tower --condaCacheDir ~/path/condacache/ --skip_sortmerna -resume
Thank you @AGunnell77, that is already very helpful!
According to the log file, your sample names resolve to null instead of the real sample name. Would you mind attaching your RNAflow_sample_sheet.csv, or maybe just the first few lines? Maybe the issue is already there and the pipeline doesn't read the sample names in correctly :)
Cheers!
Hi This is my sample sheet.
sample,R1,R2,Condition,Source,strandedness
PARENTAL_rep1,/path/K010001_Parental_1_S37_R1_001.fastq.gz,/path/K010001_Parental_1_S37_R2_001.fastq.gz,Parental,,2
PARENTAL_rep2,/path/K010002_Parental_2_S38_R1_001.fastq.gz,/path/K010002_Parental_2_S38_R2_001.fastq.gz,Parental,,2
PARENTAL_rep3,/path/K010003_Parental_3_S39_R1_001.fastq.gz,/path/K010003_Parental_3_S39_R2_001.fastq.gz,Parental,,2
KO_rep1,/path/K010004_KnO_1_S40_R1_001.fastq.gz,path/K010004_KnO_1_S40_R2_001.fastq.gz,TKO,,2
KO_rep2,/path/K010005_KnO_2_S41_R1_001.fastq.gz,/path/K010005_KnO_2_S41_R2_001.fastq.gz,TKO,,2
KO_rep3,/path/K010006/K010006_KnO_3_S42_R1_001.fastq.gz,path/K010006_KnO_3_S42_R2_001.fastq.gz,TKO,,2
Thanks
Many thanks @AGunnell77! It seems the header of your input.csv file is slightly incorrect. Your header is:
sample,R1,R2,Condition,Source,strandedness
but the pipeline expects a header of this kind:
Sample,R1,R2,Condition,Source,Strandedness
The column names in the input.csv are case sensitive, so sample will not be recognized but Sample will.
Maybe we should change this to accept both lowercase and uppercase column names :)
Just correct the lowercase column names in your input.csv header and you should be good to go! Or just copy the correct header I posted above. Then you should also see that the processes show which sample they are currently processing, e.g. like this:
process > preprocess_illumina:fastqcPre (PARENTAL_rep1)
instead of
process > preprocess_illumina:fastqcPre (null)
Let me know if this fixed the issue!
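For anyone hitting the same problem, here is a minimal sketch of a header check that can be run on a sample sheet before starting the pipeline. It is not part of RNAflow; the function names `check_header` and `fix_header_case` are hypothetical, and the expected header is simply the one quoted above. It only compares the columns it finds position by position, so a sheet with missing columns would need an extra length check.

```python
import csv
import io

# Header expected by the pipeline, as quoted in this thread.
EXPECTED = ["Sample", "R1", "R2", "Condition", "Source", "Strandedness"]


def check_header(csv_text):
    """Return (found, expected) pairs for header columns that do not match exactly."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [(found, expected)
            for found, expected in zip(header, EXPECTED)
            if found != expected]


def fix_header_case(csv_text):
    """Return the sheet with header columns respelled where they match case-insensitively."""
    lines = csv_text.splitlines()
    by_lower = {name.lower(): name for name in EXPECTED}
    # Columns that are not known at all are left untouched.
    fixed = [by_lower.get(col.strip().lower(), col) for col in lines[0].split(",")]
    return "\n".join([",".join(fixed)] + lines[1:])
```

Calling `check_header` on the sheet posted above would report the `sample` and `strandedness` columns; `fix_header_case` returns the same sheet with only the header line changed.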
Many Thanks! I'll give it a go. Andrea
Hi, the change in the sample sheet got me a bit further, but the process then stopped with an error report as attached (error_report_nauseous_wilson). I tried resuming, but the deseq2 stage is still not proceeding (nextflowlogserene_galileo).
nextflowlogserene_galileo.txt
error_report_nauseous_wilson.txt
Thanks
Andrea
Hi @AGunnell77 !
It seems the deseq2 script is crashing for some reason. To better understand what's going on, could you please attach the deseq2 log file?
You can find the file in the working directory of the process:
/data/scratch/DGE/DUDGE/MOPOPGEN/agunnell/RNAflowrevcorrect/work/d6/6b49f615bbb881b603cce213e059e8/deseq2.Rout
Here is the file. It looks like there are no counts due to it not being recognised as paired-end data and excluding all the paired-end alignments? deseq2.Rout.txt
Status PARENTAL_rep2.sorted.bam
Assigned 0
Unassigned_Unmapped 3356569
Unassigned_Read_Type 255871819
Unassigned_Singleton 0
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 0
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 0
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 0
|| Process BAM file PARENTAL_rep2.sorted.bam... ||
|| Strand specific : reversely stranded ||
|| WARNING: Paired-end reads were found and excluded. ||
|| Total alignments : 259228388 ||
|| Successfully assigned alignments : 0 (0.0%) ||
|| Running time : 3.99 minutes ||
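The symptom in this summary (zero assigned alignments, almost everything under Unassigned_Read_Type) can be spotted automatically. Below is a rough sketch that parses a featureCounts `.summary` file and flags that pattern. The helper names `parse_summary` and `looks_like_read_type_problem` are hypothetical, and the check is only a heuristic based on what was reported in this thread, not an official featureCounts diagnostic.

```python
def parse_summary(text):
    """Parse the two-column featureCounts summary into a {status: count} dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the "Status <bam file>" header row.
        if len(parts) != 2 or parts[0] == "Status":
            continue
        counts[parts[0]] = int(parts[1])
    return counts


def looks_like_read_type_problem(counts):
    """Heuristic: nothing was assigned, and reads were rejected because of their read type."""
    return (counts.get("Assigned", 0) == 0
            and counts.get("Unassigned_Read_Type", 0) > 0)
```

If the check fires for a paired-end sample, it is worth looking at the featureCounts call in the `.command.sh` of that work directory to see whether it was run in paired-end mode.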
@AGunnell77 that looks odd indeed. However, the first log file .nextflow.log4 suggested that the pipeline detected the read mode correctly as paired-end. Can you check if the .bam files really contain any alignments? I'm not quite sure where the log file that you posted above is from?
Hi, they were from here:
/work/08/523f900061b92f7d277e0692829ce0/.command.log
/work/08/523f900061b92f7d277e0692829ce0/PARENTAL_rep2.counts.tsv.summary
I had this previously, prior to running into the input file name collision. When I added -p to the command in .command.sh (path below) and ran it within this work directory, the featureCounts summary then showed all the reads aligned... but then I had the file name collision and started from scratch with the updated input csv.
/data/scratch/DGE/DUDGE/MOPOPGEN/agunnell/RNAflowrevcorrect/work/08/523f900061b92f7d277e0692829ce0/.command.sh
i.e. featureCounts -p -T 1 -s 2 -a annotation.gtf -o PARENTAL_rep2.counts.tsv -t exon -g gene_id PARENTAL_rep2.sorted.bam
@AGunnell77 So after fixing the input.csv you ran the pipeline from scratch, am I understanding correctly?
Or did you run only part of the pipeline, e.g. because you restarted it with -resume?
I'm currently testing on some paired-end data, but this issue never occurred to me before.
If the pipeline detects paired-end reads from the input.csv, featureCounts should usually be run with the -p parameter.
If you didn't run the pipeline from scratch (so without -resume) with the corrected input.csv, could you please do so and tell me if the featureCounts summary still reports the same issue?
Cheers
Hi, I started from scratch after changing the input file. All the work and results files in this directory were deleted. I resumed once at the point of the deseq2 error in case it was a glitch, but that was after the input file was corrected.
Hi, so I manually added the -p and it has all aligned OK, but I now have this error: deseq2.R.out_2.txt. Any ideas? It has actually carried out DESeq2 and I can see the volcano plot, MA plot, heatmap and Excel results in /work/97/6b0a537e47d5c7e8ae0b3a863d3e7c/, but the pathway analysis has not been carried out and the DESeq2 results are not in the results folder or the final MultiQC.
I feel I am close! Thanks so much for all your help so far! Andrea
Hi @AGunnell77, great news that deseq2 is running now!
From the log file you attached, it seems that your disk quota is full:
cannot create dir '/home/agunnell/.cache/biomaRt', reason 'Disk quota exceeded'
and that is why the script crashes. Maybe you can delete some files to free up some space in your home directory and try again?
... or don't start the pipeline in your home directory! Or point the work directory (-w) and the results folder (--output) to another path with enough disk space.
(thx @fischer-hub for the troubleshooting!)
Hmmm, that's strange. I'm not running from my home directory and I have my cache set elsewhere too. I also added the -w and --output elsewhere on the wrapper but for this stage it still seemed to want to use the home directory as when I cleared space there it has run. Anyway. It's all completed successfully now. Thank you so much for all your help! Andrea
Great that it ran through now!
Yes, the biomaRt R package has a cache directory that is apparently placed under the home directory by default (here /home/agunnell/.cache/biomaRt), so this issue was independent of the output and work directories!
Best wishes!
Hi, I have come across this issue while running the pipeline:
Error executing process > 'expression_reference_based:tpm_filter'
Caused by:
Process expression_reference_based:tpm_filter input file name collision -- There are multiple input files for each of the following file names: null.counts.tsv
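As a small illustration of why unresolved sample names lead to this collision: if every sample name comes out as null, every per-sample count file gets the same name. The sketch below is only a toy model in Python, not the pipeline's code; `colliding_names` is a hypothetical helper, and the `.counts.tsv` suffix is taken from the error message above.

```python
from collections import Counter


def colliding_names(sample_names, suffix=".counts.tsv"):
    """Return the output file names that more than one sample would produce."""
    # A missing sample name is rendered as "null", as in the error message above.
    file_names = [("null" if name is None else name) + suffix
                  for name in sample_names]
    return sorted(name for name, n in Counter(file_names).items() if n > 1)
```

With three unresolved names, the only file name produced is `null.counts.tsv`, three times over; with distinct sample names such as PARENTAL_rep1 and KO_rep1, nothing collides.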