y9c / pseudoU-BIDseq

🧪 New pipeline for detecting pseudouridine modification on RNA (BID-seq, etc)
https://bidseq.chuan.science/
GNU General Public License v3.0
14 stars, 4 forks

rcFastq #6

Closed hanguojun007 closed 12 months ago

hanguojun007 commented 1 year ago

Hello, I ran your Docker image with: apptainer run -B /workplace bidseq_latest.sif

It produced this error: /pipeline/bin/rcFastq: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /pipeline/bin/rcFastq)

Could you update glibc in the image?
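For readers who hit the same message: it says the rcFastq binary was linked against glibc 2.34, while the libc it found at run time is older. A minimal sketch of that comparison follows. The helper names are hypothetical and not part of the pipeline; `platform.libc_ver()` reports the C library of the Python interpreter that runs the script, so it would need to be run inside the container to be meaningful.

```python
import platform
import re


def parse_version(text):
    """Turn a version label such as 'GLIBC_2.34' or '2.31' into a tuple of ints."""
    return tuple(int(number) for number in re.findall(r"\d+", text))


def glibc_is_new_enough(found, required="GLIBC_2.34"):
    """Return True when the found glibc version is at least the required one."""
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    # On non-glibc systems libc_ver() returns empty strings, so guard against that.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(lib, version, "ok" if glibc_is_new_enough(version) else "too old")
    else:
        print("could not determine the glibc version")
```

Versions are compared as integer tuples, so 2.4 correctly sorts before 2.34, which a plain string comparison would get wrong.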

hanguojun007 commented 12 months ago

Hi, today I found that the parameter forward_stranded: false may be right for my data, so I set it in data.yaml. I also set speedy_mapping: true, but then I ran into the problem below and cannot figure it out. Did I set the parameter in the wrong place?

I also don't understand whether I should set my own adapter, and if so, how.

    [Tue Nov 7 14:35:08 2023]
    rule reverse_reads:
        input: .tmp/trimmed_reads/WT-Testis-2-IP_run1_cut.fq.gz
        output: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz
        jobid: 30
        reason: Missing output files: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz; Input files updated by another job: .tmp/trimmed_reads/WT-Testis-2-IP_run1_cut.fq.gz
        wildcards: sample=WT-Testis-2-IP, rn=run1
        resources: tmpdir=/tmp

    Building DAG of jobs...
    Using shell: /bin/bash
    Provided cores: 20
    Rules claiming more threads will be scaled down.
    Select jobs to execute...
    [Tue Nov 7 14:35:08 2023]
    Finished job 30.
    9 of 111 steps (8%) done
    Removing temporary output .tmp/trimmed_reads/WT-Testis-2-IP_run1_cut.fq.gz.
    Select jobs to execute...
    WorkflowError: File .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz seems to be a broken symlink.

y9c commented 12 months ago

Hi @hanguojun007, you ran the code correctly. It is an internal error of the pipeline. The temporary file is being removed too quickly before the next step starts. I have not thoroughly tested reversed libraries. Thank you for pointing this out. I have just fixed it in the latest version. Could you please run the apptainer command again? You do not need to remove the previous output, as the pipeline will restart from this step automatically.
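To illustrate the failure mode, here is a small sketch of how a symlink becomes "broken" once its target is deleted. It assumes, based only on the WorkflowError text, that the reversed-reads file was a link to the trimmed file; the file names and helper functions are hypothetical stand-ins, not pipeline code.

```python
import os
import tempfile


def is_broken_symlink(path):
    """Return True if path is a symlink whose target no longer exists."""
    # os.path.islink inspects the link itself; os.path.exists follows it to the target.
    return os.path.islink(path) and not os.path.exists(path)


def demo():
    """Create a link, delete its target, and report the state before and after."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "sample_cut.fq.gz")
        link = os.path.join(workdir, "sample.fq.gz")
        with open(target, "wb") as handle:
            handle.write(b"")
        os.symlink(target, link)
        before = is_broken_symlink(link)
        os.remove(target)  # mimics the temporary input being cleaned up too early
        after = is_broken_symlink(link)
        return before, after
```

Calling `demo()` shows the link is fine while the target exists and broken right after the target is removed, which matches the order of the "Removing temporary output" and "broken symlink" lines in the log.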

hanguojun007 commented 12 months ago

Thanks, it runs now! I have a question: what does "reason: Missing output files" mean? Also, my terminal shows that the running process is bowtie2-align-s. Does that mean the rRNA mapping step is being run? But I set speedy_mapping: true!

    rule map_to_genes_by_bowtie2:
        input: .tmp/reversed_reads/mESCWT-rep1-input_run1.fq.gz, internal_files/mapping_index/genes.1.bt2
        output: .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.bam, .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.fq, report_reads/mapping/mESCWT-rep1-input_run1_genes.report
        jobid: 10
        reason: Missing output files: report_reads/mapping/mESCWT-rep1-input_run1_genes.report, .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.bam, .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.fq; Input files updated by another job: .tmp/reversed_reads/mESCWT-rep1-input_run1.fq.gz
        wildcards: sample=mESCWT-rep1-input, rn=run1
        threads: 20
        resources: tmpdir=/tmp

y9c commented 12 months ago

Yes. "Missing output files" is not an error. The pipeline checks whether the output of each step exists and reruns a step if its output file does not exist.
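The idea can be sketched in a few lines. This is a simplified illustration of the check described above, not Snakemake's actual implementation; the function names are hypothetical, and the real scheduler also considers other triggers such as "Input files updated by another job".

```python
import os


def missing_outputs(outputs):
    """Return the declared output paths that do not exist on disk."""
    return [path for path in outputs if not os.path.exists(path)]


def rerun_reason(outputs):
    """Return a log-style reason string if a step must run, or None if it can be skipped."""
    missing = missing_outputs(outputs)
    if missing:
        return "Missing output files: " + ", ".join(missing)
    return None
```

A step whose outputs are all present returns None and is skipped, which is why an interrupted run can be restarted without deleting earlier results.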

hanguojun007 commented 12 months ago

Hi, I have kept this pipeline running for two days, but it is still in the rule map_to_genes_by_bowtie2 for WT-Testis-2-IP_run1.fq.gz and has produced only a very small file. Could you tell me why, and how to deal with it?

Thanks!! [image]

    [Tue Nov 7 23:59:45 2023]
    rule map_to_genes_by_bowtie2:
        input: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz, internal_files/mapping_index/genes.1.bt2
        output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, report_reads/mapping/WT-Testis-2-IP_run1_genes.report
        jobid: 29
        reason: Missing output files: report_reads/mapping/WT-Testis-2-IP_run1_genes.report, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq; Input files updated by another job: internal_files/mapping_index/genes.1.bt2, .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz
        wildcards: sample=WT-Testis-2-IP, rn=run1
        threads: 20
        resources: tmpdir=/tmp

y9c commented 12 months ago

Could you attach the reference FASTA you used for the genes mapping? This step is for masking ribosomal reads, and it should not take such a long time.

hanguojun007 commented 12 months ago

Mus_musculus.GRCm39.ncrna.zip

hanguojun007 commented 12 months ago

Sorry, it's not an rRNA FASTA! I was wrong.

y9c commented 12 months ago

Thank you for the information. Yes, this step is for masking rRNA or tRNA reads only. Mapping the whole transcriptome using this setting would take an extremely long time.
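One possible way to cut a full ncRNA FASTA down to a small masking reference is to keep only records whose header carries an rRNA or tRNA biotype. The sketch below assumes Ensembl-style headers with a `transcript_biotype:` field, and the biotype names in `wanted` are likewise an assumption; check both against the real file. The example records are made up for illustration.

```python
import io

# Made-up records that imitate Ensembl-style ncRNA headers.
EXAMPLE_FASTA = (
    ">T1 ncrna chromosome:GRCm39:1:1:4:1 gene:G1 gene_biotype:rRNA transcript_biotype:rRNA\n"
    "ACGT\n"
    ">T2 ncrna chromosome:GRCm39:1:9:12:1 gene:G2 gene_biotype:lncRNA transcript_biotype:lncRNA\n"
    "GGGG\n"
)


def filter_fasta_by_biotype(lines, wanted=("rRNA", "Mt_rRNA", "Mt_tRNA")):
    """Yield the FASTA lines of records whose transcript_biotype is in wanted."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Collect the value of every transcript_biotype:<value> field in the header.
            biotypes = [
                field.split(":", 1)[1]
                for field in line.split()
                if field.startswith("transcript_biotype:")
            ]
            keep = any(biotype in wanted for biotype in biotypes)
        if keep:
            yield line


if __name__ == "__main__":
    for kept_line in filter_fasta_by_biotype(io.StringIO(EXAMPLE_FASTA)):
        print(kept_line, end="")
```

On the example input only the T1 record is kept. For a real file, pass an open file handle instead of the `io.StringIO` object.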

hanguojun007 commented 12 months ago

I have a problem with Perl. My machine has perl 5.16, and I run apptainer in a mamba env, but the sif container shows perl: symbol lookup error: /root/perl5/lib/perl5/x86_64-linux-thread-multi/auto/Cwd/Cwd.so: undefined symbol: Perl_xs_version_bootcheck when I run bowtie2. Thanks again!

y9c commented 12 months ago

Hi @hanguojun007. The Docker environment won't be affected by the Perl on your host machine. I am not sure whether Apptainer or the pipeline triggered this error. Could you send me the full log for debugging?

hanguojun007 commented 12 months ago
  1. I ran apptainer pull docker://y9ch/bidseq to get bidseq_latest.sif.
  2. I ran apptainer run -B /workplace bidseq_latest.sif
  3. When the run reaches bowtie2, it reports:

    [Thu Nov 9 17:11:02 2023]
    rule map_to_genes_by_bowtie2:
        input: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz, internal_files/mapping_index/genes.1.bt2
        output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, report_reads/mapping/WT-Testis-2-IP_run1_genes.report
        jobid: 29
        reason: Missing output files: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, report_reads/mapping/WT-Testis-2-IP_run1_genes.report; Input files updated by another job: internal_files/mapping_index/genes.1.bt2, .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz
        wildcards: sample=WT-Testis-2-IP, rn=run1
        threads: 20
        resources: tmpdir=/tmp

    Error: BamOpen { target: "-" }
    [main_samview] fail to read the header from "-".
    [Thu Nov 9 17:11:02 2023]
    Error in rule map_to_genes_by_bowtie2:
        jobid: 29
        output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, report_reads/mapping/WT-Testis-2-IP_run1_genes.report
        shell:

    export LC_ALL=C
    /pipeline/micromamba/bin/bowtie2 -p 20             --end-to-end --ma 0 --score-min L,4,-0.5 -D 20 -R 3 -L 8 -N 1 -i S,1,0.5 --mp 6,3 --rdg 1,2 --rfg 6,3 --norc -a             --no-unal --un .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq -x internal_files/mapping_index/genes -U .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz 2>report_reads/mapping/WT-Testis-2-IP_run1_genes.report |             /pipeline/bin/samFilter |             /pipeline/micromamba/bin/samtools view -O BAM -o .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

    Removing output files of failed job map_to_genes_by_bowtie2 since they might be corrupted:
    report_reads/mapping/WT-Testis-2-IP_run1_genes.report
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: .snakemake/log/2023-11-09T163310.814984.snakemake.log

  4. Another question: does the STAR version need to be >= 2.7.10? My star_index was built with STAR 2.5.1, so the run reports:

    [Thu Nov 9 18:48:17 2023]
    rule map_to_genome_by_star:
        input: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, .tmp/mapping_rerun/WT-Testis-2-IP_run1_genes.fq
        output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam, discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz, report_reads/mapping/WT-Testis-2-IP_run1_genome.report, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.out, .tmp/star_mapping/WT-Testis-2-IP_run1_SJ.out.tab, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.progress.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.std.out
        jobid: 31
        reason: Missing output files: report_reads/mapping/WT-Testis-2-IP_run1_genome.report, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam; Input files updated by another job: .tmp/mapping_rerun/WT-Testis-2-IP_run1_genes.fq, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq
        wildcards: sample=WT-Testis-2-IP, rn=run1
        threads: 20
        resources: tmpdir=/tmp

    EXITING because of FATAL ERROR: Genome version: 20201 is INCOMPATIBLE with running STAR version: 2.7.10b
    SOLUTION: please re-generate genome from scratch with running version of STAR, or with version: 2.7.4a

    Nov 09 18:48:18 ...... FATAL ERROR, exiting
    [Thu Nov 9 18:48:18 2023]
    Error in rule map_to_genome_by_star:
        jobid: 31
        output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam, discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz, report_reads/mapping/WT-Testis-2-IP_run1_genome.report, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.out, .tmp/star_mapping/WT-Testis-2-IP_run1_SJ.out.tab, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.progress.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.std.out
        shell:

    ulimit -n 20000
    rm -f .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1
    mkfifo .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1
    cat .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1 | gzip > discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz &
    /pipeline/micromamba/bin/STAR           --runThreadN 20           --genomeDir /workplace/database/mouse/ucsc_mm39/STAR_index           --readFilesIn .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq,.tmp/mapping_rerun/WT-Testis-2-IP_run1_genes.fq           --alignEndsType Local           --scoreDelOpen -1           --scoreDelBase -1           --scoreInsOpen -2           --scoreInsBase -2           --outFilterMatchNmin 15           --outFilterMatchNminOverLread 0.8           --outFilterMismatchNmax 10           --outFilterMismatchNoverLmax 0.2           --outFilterIntronMotifs RemoveNoncanonicalUnannotated           --alignSJDBoverhangMin 1           --alignSJoverhangMin 5           --chimSegmentMin 20           --chimOutType WithinBAM HardClip           --chimJunctionOverhangMin 15           --chimScoreJunctionNonGTAG 0           --outFilterMultimapNmax 10           --outFilterMultimapScoreRange 0           --outSAMmultNmax -1           --outMultimapperOrder Random           --outReadsUnmapped Fastx           --outSAMtype BAM Unsorted           --outStd BAM_Unsorted           --outSAMattrRGline ID:WT-Testis-2-IP SM:WT-Testis-2-IP LB:RNA PL:Illumina PU:SE           --outSAMattributes NH HI AS nM NM MD jM jI MC ch           --outFileNamePrefix .tmp/star_mapping/WT-Testis-2-IP_run1_ > .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam
    mv .tmp/star_mapping/WT-Testis-2-IP_run1_Log.final.out report_reads/mapping/WT-Testis-2-IP_run1_genome.report
    rm .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

    Removing output files of failed job map_to_genome_by_star since they might be corrupted:
    .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam, discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.progress.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.std.out
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: .snakemake/log/2023-11-09T171328.164488.snakemake.log

hanguojun007 commented 12 months ago

The pipeline has stayed in this step for hours, and top does not show any process belonging to the pipeline. But there is no error.

    [Fri Nov 10 11:06:58 2023]
    rule reverse_reads:
        input: .tmp/trimmed_reads/WT-Testis-2-Input_run1_cut.fq.gz
        output: .tmp/reversed_reads/WT-Testis-2-Input_run1.fq.gz
        jobid: 18
        reason: Missing output files: .tmp/reversed_reads/WT-Testis-2-Input_run1.fq.gz; Input files updated by another job: .tmp/trimmed_reads/WT-Testis-2-Input_run1_cut.fq.gz
        wildcards: sample=WT-Testis-2-Input, rn=run1
        resources: tmpdir=/tmp

    [Fri Nov 10 11:06:58 2023]
    rule gap_realign:
        input: .tmp/mapping_unsort/WT-Testis-1-IP_run1_genes.bam
        output: .tmp/mapping_realigned_unsorted/WT-Testis-1-IP_run1_genes.cram
        jobid: 28
        reason: Missing output files: .tmp/mapping_realigned_unsorted/WT-Testis-1-IP_run1_genes.cram; Input files updated by another job: .tmp/mapping_unsort/WT-Testis-1-IP_run1_genes.bam
        wildcards: sample=WT-Testis-1-IP, rn=run1, reftype=genes
        resources: tmpdir=/tmp

    Building DAG of jobs...
    Using shell: /bin/bash
    Provided cores: 20
    Rules claiming more threads will be scaled down.
    Select jobs to execute...
    [Fri Nov 10 11:07:00 2023]
    Finished job 18.
    27 of 90 steps (30%) done
    Removing temporary output .tmp/trimmed_reads/WT-Testis-2-Input_run1_cut.fq.gz.
    Select jobs to execute...

y9c commented 11 months ago

Hi @hanguojun007! Thank you for providing the debugging information. To build the STAR index correctly, make sure you're using the latest version of STAR. In the future, I plan to have the pipeline generate the index automatically. For now, you'll need to update the STAR index version on your end. When it comes to bowtie2 errors, they can be quite complex since they don't provide clear error logs. However, based on my experiments, most bowtie2 errors are caused by running out of memory.
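Before launching a long run, the version stamped into an existing STAR index can be checked up front. The sketch below assumes the index directory contains a genomeParameters.txt file with a whitespace-separated versionGenome entry, which is how the "Genome version: 20201" value in the error above appears to be stored; treat that file layout as an assumption and verify it on your own index. The function names are hypothetical.

```python
import os


def read_genome_version(parameter_lines):
    """Return the value of the versionGenome entry, or None if it is absent."""
    for line in parameter_lines:
        if line.startswith("#"):  # skip comment lines at the top of the file
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "versionGenome":
            return fields[1]
    return None


def genome_version_of(genome_dir):
    """Read versionGenome from genomeParameters.txt inside a STAR index directory."""
    with open(os.path.join(genome_dir, "genomeParameters.txt")) as handle:
        return read_genome_version(handle)
```

If the reported value is an old one such as 20201, the index has to be regenerated with the STAR version that the pipeline runs, as the SOLUTION line in the error message says.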

It appears that the gap realigner step is taking longer than anticipated. If this step is taking too much time, it suggests that there are numerous reads with gaps in your dataset. However, I find it unlikely that the psu level is that high. I suspect that the adapter sequence isn't completely trimmed, which could result in artifacts in the alignment. To confirm this, you can check the bam file.
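One rough way to check the BAM file is to measure what fraction of mapped reads carry an insertion or deletion in their CIGAR string. The sketch below works on SAM text lines (for example, the output of samtools view piped into a script) and uses only the standard CIGAR column; the function name, the example records, and the choice of "fraction of gapped reads" as the indicator are illustrative assumptions, not the pipeline's own diagnostic.

```python
import re

# A CIGAR operation of type I (insertion) or D (deletion), e.g. "2D".
GAP_OPS = re.compile(r"\d+[ID]")

# Made-up SAM records: one ungapped read, one read with a deletion, one unmapped read.
EXAMPLE_SAM = [
    "@HD\tVN:1.6\n",
    "r1\t0\tref\t1\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
    "r2\t0\tref\t5\t42\t4M2D6M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
]


def gapped_fraction(sam_lines):
    """Return the fraction of reads with a CIGAR whose alignment contains an I or D."""
    mapped = 0
    gapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":  # no alignment recorded
            continue
        mapped += 1
        if GAP_OPS.search(fields[5]):
            gapped += 1
    return gapped / mapped if mapped else 0.0
```

For the example records the result is 0.5, since one of the two aligned reads contains a deletion. A very high fraction on real data would be consistent with the untrimmed-adapter suspicion, although inspecting a few reads by eye is still the more direct confirmation.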

y9c commented 11 months ago

By the way, could you give more information about your library preparation method? Also, would you mind posting different bugs as new issues? This would be helpful for other users in finding the useful information they need.