novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0
109 stars 31 forks source link

Problem with Epinano_Variants.py #116

Closed twocata closed 2 years ago

twocata commented 2 years ago

Hi, this is my first time using EpiNano, and I had a problem running the following command from run.sh: python Epinano_Variants.py -R ref.fa -b poke.sorted.bam -T t -s ./misc/sam2tsv.jar The result is as follows: /mnt/storage/pokemon/poke_rna/poke_rna_fastq/poke.sortedTMP already exists, will overwrite it /mnt/storage/pokemon/poke_rna/poke_rna_fastq/poke.sortedTMP already exists, will overwrite it <generator object stdin_stdout_gen at 0x7f6f4fd9a6d0> Process Process-2: Traceback (most recent call last): File "/home/pokemon/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap self.run() File "/home/pokemon/miniconda3/lib/python3.7/multiprocessing/process.py", line 99, in run self._target(*self._args, **self._kwargs) File "Epinano_Variants.py", line 46, in split_tsv_for_per_site_var_freq firstline = next(tsv) StopIteration How can I solve this problem? Thank you in advance.

Huanle commented 2 years ago

Hi @twocata, Apologies for my late reply. Very likely, ./misc/sam2tsv.jar did not process your bam file. Can you try:

samtools view -h -F 3860 poke.sorted.bam  | java -jar  /path/to/sam2tsv -r Ref.fa

see what you can get?

Alternatively, you could also pull the docker image and run it with your input files.

mnrusimh commented 2 years ago

I wonder if @twocata has ever found a resolution because I am having exactly the same issue as the one reported above.

minimap2/TE_Barcode_1.1_TMP_ already exists, will overwrite it Process Process-84: Traceback (most recent call last): File "/myhome/.conda/envs/nanoconda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap self.run() File "/myhome/.conda/envs/nanoconda/lib/python3.6/multiprocessing/process.py", line 93, in run self._target(*self._args, **self._kwargs) File "/myhome/Applications/Epinano1.2.0/Epinano_Variants.py", line 45, in split_tsv_for_per_site_var_freq firstline = next (tsv) StopIteration

Interestingly, the script worked without any issues when used with a different dataset.

Also, the suggestion from @Huanle seems to work without any issues.

samtools view -h -F 3860 minimap2/TE_Barcode_1.1.bam | java -jar ~/Applications/Epinano1.2.0/misc/sam2tsv.jar -r ../reference/TE12_notag.fa

The above command returns:

[INFO][Sam2Tsv]. Completed. N=129. That took:2 seconds

And below is the gist of the output it produced.

READ_NAME FLAG CHROM READ_POS BASE QUAL REF_POS REF OP

78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 0 G * 2325 g M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 1 A 8 2326 a M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 2 A > 2327 a M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 3 A 9 2328 a M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 4 T ; 2329 t M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 5 T 3 2330 t M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 6 G 2 2331 g M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 7 A / 2332 a M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 8 G : 2333 g M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 9 G ; 2334 g M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 10 C 7 2335 c M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 11 C 3 2336 c M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 12 G 6 2337 g M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 13 A 6 2338 a M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 14 C 1 2339 c M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 15 G $ 2340 g M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 16 T & . . I 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 17 T 0 2341 t M 78dc3581-f420-440e-851b-9524a2f5921a 0 TE12_notag 18 G 7 2342 g M

Any suggestions on how I can handle this issue?

Huanle commented 2 years ago

Hi @twocata, @mnrusimh

I wrote a new script in the misc folder - Epinano_Variants.dev.py for the same purpose and it does not rely on sam2tsv.

The help message is self-explainable.

Please give it a go and let me know how it works.

Best, Huanle

mnrusimh commented 2 years ago

Hi @Huanle , The new script ran without any errors. However, there was no output generated. I haven't gone through the source code of this new script yet, but I was expecting a .csv file in the folder containing the bam files. However, nothing seems to have happened. What and where should I be looking for?

/myhome/Applications/Epinano1.2.0/Epinano_Variants.dev.py -r ../reference/TE12_notag.fa -c 40 -b minimap2/TE_Barcode_1.1.bam

Huanle commented 2 years ago

Hi @mnrusimh ,

Do you mind sharing with me your bam (perhaps part of it) and reference file so that i can have a look at it?

Thanks a lot.

mnrusimh commented 2 years ago

Sure, how do I send them to you? Thanks.

Huanle commented 2 years ago

Can you @mnrusimh share through google drive or dropbox by sending a link to elzedliu@gmail.com? Thanks!

mnrusimh commented 2 years ago

@Huanle Please check your email.

Huanle commented 2 years ago

Yes. I will have a look at it asap.

On Sun, Jun 19, 2022 at 11:17 AM Ram Podicheti @.***> wrote:

@Huanle https://github.com/Huanle Please check your email.

— Reply to this email directly, view it on GitHub https://github.com/novoalab/EpiNano/issues/116#issuecomment-1159607679, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG57EMASIHZJNVYNZBZJ4DVP2GOFANCNFSM5THZPBYQ . You are receiving this because you were mentioned.Message ID: @.***>

--

mnrusimh commented 2 years ago

Hi Huanle, I am wondering if you were able to reproduce the issue I was having. Do you have any updates for me? Thanks.

Huanle commented 2 years ago

Hi @mnrusimh ,

I can not reproduce the error that you ran into.

$ python ../Epinano_main/misc/Epinano_Variants.dev.py -b TE_Barcode_1.1.filt.bam -r TE12_notag.fa
analysis took 0.94674 seconds

$ ls

TE12_notag.fa      TE_Barcode_1.1.bam      TE_Barcode_1.1.filt.bam
TE12_notag.fa.fai  TE_Barcode_1.1.bam.bai  TE_Barcode_1.1.filt.fwd.per.site.csv

$ head TE_Barcode_1.1.filt.fwd.per.site.csv
#Ref,pos,strand,base,cov,q_mean,q_median,q_std,mat,mis,ins,del
TE12_notag,2324,+,G,1,9.00000,9.00000,0.00000,1.00000,0.00000,0.00000,0.00000
TE12_notag,2325,+,A,1,23.00000,23.00000,0.00000,1.00000,0.00000,0.00000,0.00000
TE12_notag,2326,+,A,1,29.00000,29.00000,0.00000,1.00000,0.00000,0.00000,0.00000
TE12_notag,2327,+,A,1,24.00000,24.00000,0.00000,1.00000,0.00000,0.00000,0.00000
TE12_notag,2328,+,T,1,26.00000,26.00000,0.00000,1.00000,0.00000,0.00000,0.00000
TE12_notag,2329,+,T,1,18.00000,18.00000,0.00000,1.00000,0.00000,0.00000,0.00000
...
Huanle commented 2 years ago

Can I know which version of python and pysam are you having? @mnrusimh

mnrusimh commented 2 years ago

I am using python 3.6.8 and pysam 0.18.0.

$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import pysam
>>> pysam.__version__
'0.18.0'
mnrusimh commented 2 years ago

Can you provide me with your environment details? I will try replicating that. Thanks.

mnrusimh commented 2 years ago

I tried to go through the Epinano_Variants.dev.py script and noticed

int (pysam.flagstat(bam).split('\n')[4][0]) > 0

was returning False in the function

has_reads_mapped(bam)

I am not sure if it is an issue with the bam file, but you didn't experience the same issue. I wonder what else could be causing the problem.

mnrusimh commented 2 years ago

I think I found the solution for my problem. In the has_reads_mapped function, I changed the line from the flagstat output to be processed.

def has_reads_mapped (bam) :
    return int (pysam.flagstat(bam).split('\n')[1][0]) > 0

You can see that I change [4] to [1].

Here is the output I have from pysam.flagstat(bam)

129 + 0 in total (QC-passed reads + QC-failed reads)
129 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
129 + 0 mapped (100.00% : N/A)
129 + 0 primary mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

There is one other minor issue I noticed in the current script. At present, it requires the script to be run (pwd) in the same directory as the bam files. Should be an easy fix.

Thanks for your help again.