Closed hanguojun007 closed 12 months ago
Hi, today I found that the parameter forward_stranded: false may be right for me, so I set it in data.yaml. I also set speedy_mapping: true, but I ran into the problem below and can't figure it out. Did I set the parameters in the wrong place?
I also don't understand whether I should set my own adapter, and if so, how?
[Tue Nov 7 14:35:08 2023] rule reverse_reads: input: .tmp/trimmed_reads/WT-Testis-2-IP_run1_cut.fq.gz output: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz jobid: 30 reason: Missing output files: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz; Input files updated by another job: .tmp/trimmed_reads/WT-Testis-2-IP_run1_cut.fq.gz wildcards: sample=WT-Testis-2-IP, rn=run1 resources: tmpdir=/tmp
Building DAG of jobs... Using shell: /bin/bash Provided cores: 20 Rules claiming more threads will be scaled down. Select jobs to execute... [Tue Nov 7 14:35:08 2023] Finished job 30. 9 of 111 steps (8%) done Removing temporary output .tmp/trimmed_reads/WT-Testis-2-IP_run1_cut.fq.gz. Select jobs to execute... WorkflowError: File .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz seems to be a broken symlink.
Hi @hanguojun007, you ran the code correctly. It is an internal error of the pipeline. The temporary file is being removed too quickly before the next step starts. I have not thoroughly tested reversed libraries. Thank you for pointing this out. I have just fixed it in the latest version. Could you please run the apptainer command again? You do not need to remove the previous output, as the pipeline will restart from this step automatically.
Thanks, it runs!
I have a question: what does reason: Missing output files mean? Also, my terminal shows that the running process is bowtie2-align-s. Does that mean rRNA mapping is enabled? But I set speedy_mapping: true!
rule map_to_genes_by_bowtie2: input: .tmp/reversed_reads/mESCWT-rep1-input_run1.fq.gz, internal_files/mapping_index/genes.1.bt2 output: .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.bam, .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.fq, report_reads/mapping/mESCWT-rep1-input_run1_genes.report jobid: 10 reason: Missing output files: report_reads/mapping/mESCWT-rep1-input_run1_genes.report, .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.bam, .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.fq; Input files updated by another job: .tmp/reversed_reads/mESCWT-rep1-input_run1.fq.gz wildcards: sample=mESCWT-rep1-input, rn=run1 threads: 20 resources: tmpdir=/tmp
Yes. "Missing output files" is not an error. The pipeline checks whether the output of each step exists and reruns the step if an output file is missing.
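To illustrate the idea behind the "Missing output files" reason, here is a minimal sketch in Python. It is a simplified model, not Snakemake's actual implementation (the real scheduler also considers things like updated inputs, as the "Input files updated by another job" part of the log shows). The function names `missing_outputs` and `needs_run` are hypothetical.

```python
from pathlib import Path


def missing_outputs(outputs):
    """Return the declared output paths that do not exist on disk.

    Path.exists() follows symlinks, so a broken symlink counts as missing.
    """
    return [p for p in outputs if not Path(p).exists()]


def needs_run(outputs):
    """In this simplified model, a step runs when any declared output is missing."""
    return len(missing_outputs(outputs)) > 0
```

For example, calling `needs_run` with the three output paths listed for map_to_genes_by_bowtie2 would return True until the .bam, .fq and .report files all exist.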
Hi, I have kept this pipeline running for two days, but it is still in the rule map_to_genes_by_bowtie2 for WT-Testis-2-IP_run1.fq.gz and has produced only a very small file. Could you tell me why, and how to deal with it?
Thanks!
[Tue Nov 7 23:59:45 2023] rule map_to_genes_by_bowtie2: input: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz, internal_files/mapping_index/genes.1.bt2 output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, report_reads/mapping/WT-Testis-2-IP_run1_genes.report jobid: 29 reason: Missing output files: report_reads/mapping/WT-Testis-2-IP_run1_genes.report, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq; Input files updated by another job: internal_files/mapping_index/genes.1.bt2, .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz wildcards: sample=WT-Testis-2-IP, rn=run1 threads: 20 resources: tmpdir=/tmp
Could you attach the reference FASTA you used for the genes mapping? This step is for masking ribosomal reads, and it should not take such a long time.
Sorry, it's not an rRNA FASTA! I was wrong.
Thank you for the information. Yes, this step is for masking rRNA or tRNA reads only. Mapping the whole transcriptome using this setting would take an extremely long time.
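A quick sanity check before starting a run is to count how many sequences and bases the "genes" reference contains, since a set of rRNA and tRNA sequences is far smaller than a whole transcriptome. The sketch below is only an illustration; the function name `fasta_summary` is hypothetical and is not part of the pipeline.

```python
import gzip


def fasta_summary(path):
    """Count sequences and total bases in a plain or gzipped FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    n_seqs, n_bases = 0, 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                n_seqs += 1  # header line starts a new record
            else:
                n_bases += len(line)  # sequence line
    return n_seqs, n_bases
```

If the counts look like a full transcriptome rather than a handful of rRNA/tRNA entries, the wrong FASTA was probably supplied for this step.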
I have a bug with Perl. My machine has Perl 5.16, and I run apptainer in a mamba env, but the sif container shows perl: symbol lookup error: /root/perl5/lib/perl5/x86_64-linux-thread-multi/auto/Cwd/Cwd.so: undefined symbol: Perl_xs_version_bootcheck when I run bowtie2.
Thanks again!
Hi @hanguojun007. The docker env won't be affected by the Perl on your host machine. I am not sure whether apptainer or the pipeline triggered this error. Could you send me the full log for debugging?
I used apptainer pull docker://y9ch/bidseq to get bidseq_latest.sif, then ran apptainer run -B /workplace bidseq_latest.sif
Error: BamOpen { target: "-" } [main_samview] fail to read the header from "-". [Thu Nov 9 17:11:02 2023] Error in rule map_to_genes_by_bowtie2: jobid: 29 output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, report_reads/mapping/WT-Testis-2-IP_run1_genes.report shell:
export LC_ALL=C
/pipeline/micromamba/bin/bowtie2 -p 20 --end-to-end --ma 0 --score-min L,4,-0.5 -D 20 -R 3 -L 8 -N 1 -i S,1,0.5 --mp 6,3 --rdg 1,2 --rfg 6,3 --norc -a --no-unal --un .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq -x internal_files/mapping_index/genes -U .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz 2>report_reads/mapping/WT-Testis-2-IP_run1_genes.report | /pipeline/bin/samFilter | /pipeline/micromamba/bin/samtools view -O BAM -o .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job map_to_genes_by_bowtie2 since they might be corrupted: report_reads/mapping/WT-Testis-2-IP_run1_genes.report Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2023-11-09T163310.814984.snakemake.log
EXITING because of FATAL ERROR: Genome version: 20201 is INCOMPATIBLE with running STAR version: 2.7.10b SOLUTION: please re-generate genome from scratch with running version of STAR, or with version: 2.7.4a
Nov 09 18:48:18 ...... FATAL ERROR, exiting [Thu Nov 9 18:48:18 2023] Error in rule map_to_genome_by_star: jobid: 31 output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam, discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz, report_reads/mapping/WT-Testis-2-IP_run1_genome.report, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.out, .tmp/star_mapping/WT-Testis-2-IP_run1_SJ.out.tab, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.progress.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.std.out shell:
ulimit -n 20000
rm -f .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1
mkfifo .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1
cat .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1 | gzip > discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz &
/pipeline/micromamba/bin/STAR --runThreadN 20 --genomeDir /workplace/database/mouse/ucsc_mm39/STAR_index --readFilesIn .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq,.tmp/mapping_rerun/WT-Testis-2-IP_run1_genes.fq --alignEndsType Local --scoreDelOpen -1 --scoreDelBase -1 --scoreInsOpen -2 --scoreInsBase -2 --outFilterMatchNmin 15 --outFilterMatchNminOverLread 0.8 --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.2 --outFilterIntronMotifs RemoveNoncanonicalUnannotated --alignSJDBoverhangMin 1 --alignSJoverhangMin 5 --chimSegmentMin 20 --chimOutType WithinBAM HardClip --chimJunctionOverhangMin 15 --chimScoreJunctionNonGTAG 0 --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 0 --outSAMmultNmax -1 --outMultimapperOrder Random --outReadsUnmapped Fastx --outSAMtype BAM Unsorted --outStd BAM_Unsorted --outSAMattrRGline ID:WT-Testis-2-IP SM:WT-Testis-2-IP LB:RNA PL:Illumina PU:SE --outSAMattributes NH HI AS nM NM MD jM jI MC ch --outFileNamePrefix .tmp/star_mapping/WT-Testis-2-IP_run1_ > .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam
mv .tmp/star_mapping/WT-Testis-2-IP_run1_Log.final.out report_reads/mapping/WT-Testis-2-IP_run1_genome.report
rm .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job map_to_genome_by_star since they might be corrupted: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam, discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.progress.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.std.out Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2023-11-09T171328.164488.snakemake.log
The pipeline has stayed at this step for hours, and top doesn't show any processes from the pipeline. But there is no error.
[Fri Nov 10 11:06:58 2023] rule reverse_reads: input: .tmp/trimmed_reads/WT-Testis-2-Input_run1_cut.fq.gz output: .tmp/reversed_reads/WT-Testis-2-Input_run1.fq.gz jobid: 18 reason: Missing output files: .tmp/reversed_reads/WT-Testis-2-Input_run1.fq.gz; Input files updated by another job: .tmp/trimmed_reads/WT-Testis-2-Input_run1_cut.fq.gz wildcards: sample=WT-Testis-2-Input, rn=run1 resources: tmpdir=/tmp
[Fri Nov 10 11:06:58 2023] rule gap_realign: input: .tmp/mapping_unsort/WT-Testis-1-IP_run1_genes.bam output: .tmp/mapping_realigned_unsorted/WT-Testis-1-IP_run1_genes.cram jobid: 28 reason: Missing output files: .tmp/mapping_realigned_unsorted/WT-Testis-1-IP_run1_genes.cram; Input files updated by another job: .tmp/mapping_unsort/WT-Testis-1-IP_run1_genes.bam wildcards: sample=WT-Testis-1-IP, rn=run1, reftype=genes resources: tmpdir=/tmp
Building DAG of jobs... Using shell: /bin/bash Provided cores: 20 Rules claiming more threads will be scaled down. Select jobs to execute... [Fri Nov 10 11:07:00 2023] Finished job 18. 27 of 90 steps (30%) done Removing temporary output .tmp/trimmed_reads/WT-Testis-2-Input_run1_cut.fq.gz. Select jobs to execute...
Hi @hanguojun007! Thank you for providing the debugging information. To build the STAR index correctly, make sure you're using the latest version of STAR. In the future, I plan to have the pipeline generate the index automatically. For now, you'll need to update the STAR index version on your end. When it comes to bowtie2 errors, they can be quite complex since they don't provide clear error logs. However, based on my experiments, most bowtie2 errors are caused by running out of memory.
It appears that the gap realigner step is taking longer than anticipated. If this step is taking too much time, it suggests that there are numerous reads with gaps in your dataset. However, I find it unlikely that the psu level is that high. I suspect that the adapter sequence isn't completely trimmed, which could result in artifacts in the alignment. To confirm this, you can check the bam file.
By the way, could you give more information about your library preparation method? Also, would you mind posting different bugs as new issues? This would be helpful for other users in finding the useful information they need.
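On the STAR index mismatch reported above ("Genome version: 20201 is INCOMPATIBLE with running STAR version: 2.7.10b"), one way to check an existing index before running is to read the version recorded in the index directory. The sketch below assumes the index contains a genomeParameters.txt file with a versionGenome line; that layout is an assumption here, and the function name `star_index_version` is hypothetical.

```python
from pathlib import Path


def star_index_version(genome_dir):
    """Return the versionGenome value recorded in a STAR index, or None.

    Assumes genome_dir contains genomeParameters.txt with whitespace-separated
    key/value lines, one of which starts with "versionGenome".
    """
    params = Path(genome_dir) / "genomeParameters.txt"
    for line in params.read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "versionGenome":
            return fields[1]
    return None
```

For the run in the log, this would be called on /workplace/database/mouse/ucsc_mm39/STAR_index. If the value does not match what the STAR binary in the container accepts, the index needs to be regenerated, as the error message suggests.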
Hello, I ran your docker image with apptainer run -B /workplace bidseq_latest.sif
and got this error: /pipeline/bin/rcFastq: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /pipeline/bin/rcFastq)
Could you update glibc?