
y9c / pseudoU-BIDseq

🧪 New pipeline for detecting pseudouridine modification on RNA (BID-seq, etc)

https://bidseq.chuan.science/

GNU General Public License v3.0


Long running time #13

Closed: jingangdidi closed this issue 2 weeks ago

jingangdidi commented 3 weeks ago

Hi y9c, I ran the pipeline on 3 input samples (mESC-WT_polyA-RNA_Input_rep1/2/3.fastq) and 3 treated samples (mESC-WT_polyA-RNA_Treated_rep1/2/3.fastq) with 48 cores. bowtie2 is taking a long time in the mapping_unsort step: mESCWT-rep2-treated_run1_genes.fq has been running for 24 hours. Your paper says "sequencing data processing and mutation calling: ~4 h". Is this running time expected? Thanks a lot!

y9c commented 3 weeks ago

Yes. bowtie2 is extremely slow, especially when the number of rRNA reads is high. You can set speedy_mapping: true in your config file to speed it up.

jingangdidi commented 3 weeks ago

Hi y9c, the run ended with "Successfully finished all jobs.", but none of the TSV files in the "call_sites" and "filter_sites" folders contain the motif sequence. Your paper says "Each row in the table provides essential information about a specific Ψ site, including its genomic coordinates and the surrounding motif sequence.". How can I get the motif sequence? Thanks a lot!

y9c commented 3 weeks ago

Yes. I did not include gene annotation in the updated version, to make it more applicable to species beyond just humans and mice. You can add the gene annotation yourself using R or any other tool you prefer.
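
For the motif sequence specifically, one possible way to recover it outside the pipeline is to look up the bases around each site in the reference FASTA that was used for mapping. The following is only a minimal sketch, not part of pseudoU-BIDseq: the column names chrom, pos and strand, the 1-based position convention, and the 2-base flank are all assumptions, so check them against the header of your own call_sites / filter_sites TSV files before use.

```python
import csv
import io

# Complement table used for sites on the minus strand.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Parse a FASTA file handle into {sequence_name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]  # keep only the ID before any whitespace
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def motif_at(seqs, chrom, pos, strand, flank=2):
    """Return the bases from pos-flank to pos+flank (pos assumed 1-based).

    Minus-strand sites are reverse-complemented. Returns None if the
    window would run past either end of the sequence.
    """
    seq = seqs[chrom]
    start = pos - 1 - flank  # convert to a 0-based slice start
    end = pos + flank
    if start < 0 or end > len(seq):
        return None
    motif = seq[start:end].upper()
    if strand == "-":
        motif = motif.translate(COMPLEMENT)[::-1]
    return motif


def add_motifs(tsv_handle, seqs, flank=2):
    """Read a tab-separated site table and add a 'motif' field to each row.

    The column names 'chrom', 'pos' and 'strand' are hypothetical.
    """
    rows = []
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        row["motif"] = motif_at(
            seqs, row["chrom"], int(row["pos"]), row["strand"], flank
        )
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Tiny in-memory demo; replace with open("reference.fa") and open("sites.tsv").
    fasta = io.StringIO(">chr1 demo\nACGTT\nGCAGG\n")
    sites = io.StringIO("chrom\tpos\tstrand\nchr1\t4\t+\nchr1\t8\t-\n")
    for row in add_motifs(sites, read_fasta(fasta)):
        print(row["chrom"], row["pos"], row["strand"], row["motif"])
```

This loads the whole reference into memory, which is fine for a transcriptome or a small genome. For a large genome an indexed FASTA reader would be a better fit. Sequences in the FASTA are written with T rather than U, so the central base of a Ψ site will appear as T.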