Open sidizhao opened 1 year ago
Hi @sidizhao,
It looks like you ran Nanocompore on a genomic reference instead of a transcriptomic reference. Nanocompore can sometimes stall when the reference sequences are very long (greater than 50 kb), and this is likely why you're seeing such a long execution time. You can either kill the process and start it again to see if it gets through the stall that way (this sometimes works and we don't know why), or start the whole pipeline over again, aligning to a transcriptome reference fasta. Given you already have all the data together, it might be worth simply restarting it first, but I suspect that redoing the pipeline with a transcriptome reference is the better option. If you provide SampComp a bed file, it will do an internal liftover from transcriptome reference coordinates to genome reference coordinates.
I hope this helps, Logan
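A quick way to check whether a reference fasta contains very long sequences is to tabulate sequence lengths. The following is a minimal Python sketch, not part of Nanocompore; the function names are hypothetical, and the 50 kb default simply mirrors the rough figure mentioned above rather than a documented limit.

```python
from typing import Dict, Iterable


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} for FASTA-formatted lines.

    The name is the first whitespace-delimited token of each header line.
    """
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def long_references(lengths: Dict[str, int], threshold: int = 50_000) -> Dict[str, int]:
    """Keep only sequences longer than `threshold` bases (assumed cutoff)."""
    return {name: n for name, n in lengths.items() if n > threshold}
```

To use it on a file, pass an open text handle, for example `sequence_lengths(open("reference.fa"))` (or `gzip.open(path, "rt")` for a compressed file). A genome reference will typically report whole chromosomes here, while a transcriptome reference should report one entry per isoform.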
Hi,
Thank you for the prompt response. By transcriptomic reference, do you mean only the exonic regions of the fasta file? Or if I were to provide a bed file, what should the bed file contain? Just trying to clarify.
Hi @sidizhao,
Yes, by transcriptomic reference I mean a reference fasta file containing each contiguous transcript isoform with no introns present, and one reference sequence per isoform. This is the download link for the GENCODE transcriptome reference fasta: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.transcripts.fa.gz
Alternatively, you can create a reference transcriptome fasta from the reference genome and a gtf file using bedtools getfasta.
And by bed file, I mean a bed file that matches the transcriptome reference fasta file in genomic coordinates. So you can use something like bedparse (https://github.com/tleonardi/bedparse) to convert a gtf file to bed12 format. You can find the gencode reference gtf file on the gencode home page.
Does this make sense? Logan
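To illustrate what a bed12 record "in genomic coordinates" looks like and why it is enough for a liftover, here is a small Python sketch. It is not bedparse or Nanocompore code; the function names are hypothetical, it ignores coding-sequence information (thickStart and thickEnd are both set to the transcript start), and it assumes exon coordinates as they appear in a gtf file (1-based, inclusive).

```python
from typing import List, Sequence, Tuple


def exons_to_bed12(chrom: str, name: str, strand: str,
                   exons: Sequence[Tuple[int, int]]) -> str:
    """Build one BED12 line from GTF-style exon coordinates (1-based, inclusive)."""
    if not exons:
        raise ValueError("a transcript needs at least one exon")
    ordered = sorted(exons)
    tx_start = ordered[0][0] - 1              # BED starts are 0-based
    tx_end = max(end for _, end in ordered)   # BED ends are exclusive
    sizes = [end - start + 1 for start, end in ordered]
    starts = [start - 1 - tx_start for start, _ in ordered]
    fields: List[str] = [
        chrom, str(tx_start), str(tx_end), name, "0", strand,
        str(tx_start), str(tx_start),         # no CDS information (simplification)
        "0", str(len(ordered)),
        ",".join(map(str, sizes)), ",".join(map(str, starts)),
    ]
    return "\t".join(fields)


def transcript_to_genome(tx_pos: int, tx_start: int, block_sizes: Sequence[int],
                         block_starts: Sequence[int], strand: str) -> int:
    """Map a 0-based transcript position (5' to 3') to a 0-based genomic position."""
    total = sum(block_sizes)
    if not 0 <= tx_pos < total:
        raise ValueError("position outside the transcript")
    # On the minus strand, transcript position 0 is the last genomic base.
    offset = tx_pos if strand == "+" else total - 1 - tx_pos
    for size, start in zip(block_sizes, block_starts):
        if offset < size:
            return tx_start + start + offset
        offset -= size
    raise AssertionError("unreachable: offset was checked against total length")
```

For a two-exon transcript with exons at 101-200 and 301-350, transcript position 100 (the first base of the second exon on the plus strand) maps to genomic position 300 in 0-based coordinates. In practice a dedicated tool such as bedparse is the safer route for the gtf to bed12 conversion.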
Yes. Thank you. Would it work if I keep the current genomic fasta file but add a bed12 file of only the transcripts? Or do I necessarily need to download the transcripts only fasta?
You essentially need to start over from the minimap2 step using the transcriptome reference fasta instead of the genome reference fasta. This means you will also have to redo eventalign and eventalign collapse from the new bam file. Importantly, you do not want to align in splice-aware mode when using a transcriptome reference fasta, because the transcript sequences contain no introns.
You can find more detailed instructions here ( https://doi.org/10.1002/cpz1.683) or here (https://nanocompore.rna.rocks/) if you want a more in depth breakdown of the steps.
Let me know if you have more questions.
Logan
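As a rough illustration of the alignment step described above, the Python sketch below assembles a minimap2 command line that uses the `map-ont` preset rather than a spliced preset. The function name, file names, and thread count are placeholders; the exact options recommended for Nanocompore are in the linked protocol and documentation, so treat this as an assumption to check against them.

```python
import shlex
from typing import List


def minimap2_transcriptome_cmd(transcriptome_fasta: str, reads_fastq: str,
                               threads: int = 4) -> List[str]:
    """Build a minimap2 argument list for unspliced alignment to a transcriptome.

    -a       : write SAM output
    -x map-ont: Oxford Nanopore preset without spliced alignment
    -t       : number of threads
    """
    return [
        "minimap2", "-a", "-x", "map-ont", "-t", str(threads),
        transcriptome_fasta, reads_fastq,
    ]


def as_shell(cmd: List[str]) -> str:
    """Render the argument list as a shell-safe string for logging or a script."""
    return shlex.join(cmd)
```

The list form can be passed directly to `subprocess.run`, and the output still needs to be sorted and indexed (for example with samtools) before running eventalign.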
Oh wow, I see. That is going to take a while since these Direct RNA-seq files take a long time on nanopolish. I will come back with more questions if it still doesn't work. Thank you so much.
You can try using f5c instead of nanopolish. It is a C implementation of nanopolish and is roughly 10 times faster. There are a few flags you need to use that are unique to f5c and are not used by nanopolish. The protocol paper I posted earlier goes through all the necessary differences between f5c and nanopolish if you decide to give it a try.
Briefly, you need to use --rna --min-mapq=0 --secondary=yes in addition to all the normal nanopolish options.
But I'm doing this from memory, so double check the help messages to make sure I have the spelling correct!!!
Logan
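The flags above could be combined with the usual eventalign options roughly as in the following Python sketch. It only builds the argument list and does not run anything. The three f5c-specific flags come from the comment above; the remaining options (`-r`, `-b`, `-g`, `--samples`, `--print-read-names`, `--scale-events`) are an assumption about what "the normal nanopolish options" are for a Nanocompore run, and the function and file names are placeholders. Check everything against `f5c eventalign --help` and the protocol paper.

```python
from typing import List


def f5c_eventalign_cmd(reads_fastq: str, bam: str,
                       transcriptome_fasta: str) -> List[str]:
    """Build an f5c eventalign argument list for direct RNA data (a sketch)."""
    return [
        "f5c", "eventalign",
        # f5c-specific flags quoted in the thread (written from memory there)
        "--rna", "--min-mapq=0", "--secondary=yes",
        # inputs: reads, alignments, and the transcriptome reference
        "-r", reads_fastq, "-b", bam, "-g", transcriptome_fasta,
        # assumed "normal" eventalign options for Nanocompore input
        "--samples", "--print-read-names", "--scale-events",
    ]
```

The resulting list can be handed to `subprocess.run` with standard output redirected to the eventalign tsv file.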
Thank you so much! We’ve been using slow5tools to process the fast5 files first before nanopolish, and it’s been decently fast.
Describe the bug
Hi, I've been trying to run SampComp on 6 samples of ONT Direct RNA-seq from a METTL3 KD cell line for some time, and have yet to get past the "parse transcript" step. I requested 512G of RAM for this run and it just gets stuck for multiple days. Here's the log:
To Reproduce
I ran a bash script on our Linux computing cluster using the Docker image quay.io/biocontainers/nanocompore:1.0.4--pyhdfd78af_0. Here's the command:
Would you be able to help me? Thank you.