novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Don't know how to add nanopolish to the environment path #108

Closed: acarmas1 closed this issue 2 years ago

acarmas1 commented 2 years ago

I'm trying to use EpiNano to identify the m6A modification. However, when I run Epinano_Current with this code:

module load samtools/1.10
export PATH=$PATH:/projects/dsn001/camila/nanopolish/nanopolish/bin

sh /projects/dsn001/camila/programs/EpiNano/Epinano_Current.sh \
  -b alnWT_sort.bam \
  -r /project02/insect_multiomics/camila/xpore/Bee_Thorax/data/WT/fastq/basecalled.fastq \
  -f GCF_003254395.2_Amel_HAv3.1_genomic.fna \
  -t 6 -m g \
  -d /project02/insect_multiomics/Bee_seq/Bee_RNA_seq/Thorax/20200317_1504_MN31749_FAM97316_fb967124/fast5

I'm getting this error:

/projects/dsn001/camila/programs/EpiNano/Epinano_Current.sh: line 52: nanopolish: command not found
/projects/dsn001/camila/programs/EpiNano/Epinano_Current.sh: line 53: nanopolish: command not found
/projects/dsn001/camila/programs/EpiNano/Epinano_Current.sh: line 53: pigz: command not found
  File "/projects/dsn001/camila/programs/EpiNano/misc/eventalign_strandedness.py", line 30
    print (l.rstrip(),file=out1)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Epinano_Current.py", line 53
    smallfile = f"{tmp_dir}/{idx}.chunk"
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Epinano_Current.py", line 53
    smallfile = f"{tmp_dir}/{idx}.chunk"
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/concat_events.py", line 17
    print (header, file = outfh)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/concat_events.py", line 17
    print (header, file = outfh)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 59
    print (kmer+','+l, file=outfh)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 59
    print (kmer+','+l, file=outfh)
    ^
SyntaxError: invalid syntax

I understand that I'm not giving the right path to nanopolish, but I don't know how to do it. Could someone please help me with an example?

Huanle commented 2 years ago

Hi @acarmas1 ,

I think

export PATH=$PATH:/projects/dsn001/camila/nanopolish/nanopolish/bin

would do the job. Can you list file(s) within /projects/dsn001/camila/nanopolish/nanopolish/bin to confirm the executable is there?
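One way to confirm that the `export PATH=...` line took effect, before launching a long job, is to ask which of the required executables can actually be found. The sketch below is only an illustration: the function name `find_missing_tools` is hypothetical, and the tool list (nanopolish, pigz, samtools) is taken from the commands and error messages in this thread rather than from an official requirements list.

```python
import os
import shutil


def find_missing_tools(tools, extra_dirs=()):
    """Return the tools that are not found on PATH or in extra_dirs.

    extra_dirs mimics `export PATH=$PATH:<dir>` without changing the
    environment of the current process.
    """
    dirs = [d for d in [os.environ.get("PATH", ""), *extra_dirs] if d]
    search_path = os.pathsep.join(dirs)
    return [t for t in tools if shutil.which(t, path=search_path) is None]


# Tools named in this thread; adjust to your own setup.
REQUIRED_TOOLS = ["nanopolish", "pigz", "samtools"]
```

Calling `find_missing_tools(REQUIRED_TOOLS, ["/projects/dsn001/camila/nanopolish/nanopolish/bin"])` returns an empty list only if every tool is discoverable. Any name it returns is one that would trigger a "command not found" error in the shell script.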

acarmas1 commented 2 years ago

[image]

These are all the files. I already got nanopolish to run; however, I'm now getting this error:

[readdb] indexing /project02/insect_multiomics/Bee_seq/Bee_RNA_seq/Thorax/20200317_1504_MN31749_FAM97316_fb967124/fast5
[readdb] num reads: 945818, num reads with path to fast5: 945818
/projects/dsn001/camila/programs/EpiNano/Epinano_Current.sh: line 53: pigz: command not found
  File "/projects/dsn001/camila/programs/EpiNano/misc/eventalign_strandedness.py", line 30
    print (l.rstrip(),file=out1)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Epinano_Current.py", line 53
    smallfile = f"{tmp_dir}/{idx}.chunk"
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Epinano_Current.py", line 53
    smallfile = f"{tmp_dir}/{idx}.chunk"
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/concat_events.py", line 17
    print (header, file = outfh)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/concat_events.py", line 17
    print (header, file = outfh)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 59
    print (kmer+','+l, file=outfh)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 59
    print (kmer+','+l, file=outfh)
    ^
SyntaxError: invalid syntax

When I run this code:

module load samtools/1.10

bash /projects/dsn001/camila/programs/EpiNano/Epinano_Current.sh \
  -b alnWT_sort.bam \
  -r /project02/insect_multiomics/camila/xpore/Bee_Thorax/data/WT/fastq/basecalled.fastq \
  -f GCF_003254395.2_Amel_HAv3.1_genomic.fna \
  -t 6 -m g \
  -d /project02/insect_multiomics/Bee_seq/Bee_RNA_seq/Thorax/20200317_1504_MN31749_FAM97316_fb967124/fast5

Huanle commented 2 years ago

Hi @acarmas1, it seems you need to install pigz, which is available at https://zlib.net/pigz/

acarmas1 commented 2 years ago

Hi Huanle

I got pigz to run, but now I get this error:

[readdb] indexing /project02/insect_multiomics/Bee_seq/Bee_RNA_seq/Thorax/20200317_1504_MN31749_FAM97316_fb967124/fast5
[readdb] num reads: 945818, num reads with path to fast5: 945818
[post-run summary] total reads: 739787, unparseable: 0, qc fail: 6145, could not calibrate: 18216, no alignment: 2792, bad fast5: 0
  File "/projects/dsn001/camila/programs/EpiNano/misc/eventalign_strandedness.py", line 30
    print (l.rstrip(),file=out1)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Epinano_Current.py", line 53
    smallfile = f"{tmp_dir}/{idx}.chunk"
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Epinano_Current.py", line 53
    smallfile = f"{tmp_dir}/{idx}.chunk"
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/concat_events.py", line 17
    print (header, file = outfh)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/concat_events.py", line 17
    print (header, file = outfh)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 59
    print (kmer+','+l, file=outfh)
    ^
SyntaxError: invalid syntax
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 59
    print (kmer+','+l, file=outfh)
    ^
SyntaxError: invalid syntax

Huanle commented 2 years ago

Hi @acarmas1, sorry for the late reply. May I ask which version of Python you are using?

acarmas1 commented 2 years ago

Hi Huanle, don't worry. I use python3. However, your comment made me realize that I did not activate python3 before running Epinano_Current. I'll do that and let you know if it works.
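For context, the `SyntaxError: invalid syntax` messages on `print (..., file=...)` and on f-strings such as `f"{tmp_dir}/{idx}.chunk"` are what an older interpreter reports: `print` with `file=` is a syntax error under Python 2, and f-strings need Python 3.6 or later. A minimal sketch of a version check follows. The helper name `interpreter_is_new_enough` is hypothetical, and the 3.6 floor is inferred only from the f-string syntax in the tracebacks, not from the EpiNano README.

```python
import sys

# f-strings, used in the EpiNano scripts quoted above, require Python 3.6+.
MIN_VERSION = (3, 6)


def interpreter_is_new_enough(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)
```

Running `python -c "import sys; print(sys.version_info[:2])"` in the same shell session that launches `Epinano_Current.sh` shows which interpreter the bare `python` command resolves to there.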

acarmas1 commented 2 years ago

Hi Huanle, when I activated python3 everything looked fine. However, when the job finished I got this error:

[readdb] indexing /project02/insect_multiomics/Bee_seq/Bee_RNA_seq/Thorax/20200317_1504_MN31749_FAM97316_fb967124/fast5
[readdb] num reads: 945818, num reads with path to fast5: 945818
[post-run summary] total reads: 739786, unparseable: 0, qc fail: 6145, could not calibrate: 18216, no alignment: 2792, bad fast5: 0
Traceback (most recent call last):
  File "/projects/dsn001/camila/programs/EpiNano/misc/Epinano_Current.py", line 2, in <module>
    from Bio import SeqIO
ModuleNotFoundError: No module named 'Bio'
Traceback (most recent call last):
  File "/projects/dsn001/camila/programs/EpiNano/misc/Epinano_Current.py", line 2, in <module>
    from Bio import SeqIO
ModuleNotFoundError: No module named 'Bio'
Traceback (most recent call last):
  File "/projects/dsn001/camila/programs/EpiNano/misc/concat_events.py", line 35, in <module>
    main()
  File "/projects/dsn001/camila/programs/EpiNano/misc/concat_events.py", line 30, in main
    outfh = open (f'{input_dir}/Intensity.collapsed.tsv','w')
FileNotFoundError: [Errno 2] No such file or directory: 'alnWT_sort.eventalign.tsv.gz.forward_events.collapsed/Intensity.collapsed.tsv'
Traceback (most recent call last):
  File "/projects/dsn001/camila/programs/EpiNano/misc/concat_events.py", line 35, in <module>
    main()
  File "/projects/dsn001/camila/programs/EpiNano/misc/concat_events.py", line 30, in main
    outfh = open (f'{input_dir}/Intensity.collapsed.tsv','w')
FileNotFoundError: [Errno 2] No such file or directory: 'alnWT_sort.eventalign.tsv.gz.reverse_events.collapsed/Intensity.collapsed.tsv'
Traceback (most recent call last):
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 204, in <module>
    main()
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 202, in main
    slide_intensity (inp, win)
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 70, in slide_intensity
    outfh = open (out_tmp,'w')
FileNotFoundError: [Errno 2] No such file or directory: 'alnWT_sort.eventalign.tsv.gz.reverse_events.collapsed/Intensity.collapsed.tsv.5mer.tmp'
Traceback (most recent call last):
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 204, in <module>
    main()
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 202, in main
    slide_intensity (inp, win)
  File "/projects/dsn001/camila/programs/EpiNano/misc/Slide_Intensity.py", line 70, in slide_intensity
    outfh = open (out_tmp,'w')
FileNotFoundError: [Errno 2] No such file or directory: 'alnWT_sort.eventalign.tsv.gz.forward_events.collapsed/Intensity.collapsed.tsv.5mer.tmp'

I'm really sorry, I don't know what I'm doing wrong.

Huanle commented 2 years ago

Hi @acarmas1, as the error message implies, you need to install the biopython package. Can you please install all required packages listed in the README document?
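Since the job in this thread ran for a long time before failing on a missing import, it may be worth checking up front which modules the active interpreter can import. The sketch below is illustrative only: `missing_modules` is a hypothetical helper, and the only module name taken from this thread is `Bio` (the import name of biopython). The remaining names would have to be filled in from the README.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# 'Bio' comes from the traceback above; extend this list from the README.
MODULES_TO_CHECK = ["Bio"]
```

An empty result from `missing_modules(MODULES_TO_CHECK)` means the listed modules are importable by the interpreter that ran the check, which should be the same one the EpiNano scripts use.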

acarmas1 commented 2 years ago

Hi Huanle,

It's been a while. My Epinano_Current job took more than 10 days to finish. I have a question about the next steps.

I have fast5 files for RNA with the m6A modification (WT) and fast5 files for RNA that I know lacks m6A (KO). Should I run Epinano_Current for both conditions, WT and KO? I already ran it for the WT, and the process generated directories for forward and reverse events. [image] Should I use the .csv file from the forward directory or the reverse one?

Also, I want to use Epinano_DiffErr.R. Do the -k and -w inputs have to be files generated with Epinano_Variants or with Epinano_Current? If they have to come from Epinano_Variants, I have .csv files for the minus and plus strands. [image] Which one should I pick, the plus or the minus?

Thank you so much, Camila

Huanle commented 2 years ago

Hi @acarmas1, you can combine the forward and reverse strand data into one file for further analysis. Those scripts are strand-sensitive and can tell the two strands apart. Epinano_DiffErr is more suitable for small/short reference sequences, because it was originally used for detecting modifications in small RNAs. If you want to use it with long genome reference sequences, I recommend running it as a sliding-window analysis. Cheers - Huanle
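A minimal sketch of combining the forward and reverse files into one, assuming both are CSV files with an identical header row. The function name `combine_strand_csvs` and any file names passed to it are hypothetical; this is not an EpiNano script.

```python
import csv


def combine_strand_csvs(paths, out_path):
    """Concatenate CSV files that share one header; return data rows written."""
    header = None
    written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as fh:
                reader = csv.reader(fh)
                this_header = next(reader, None)
                if this_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    header = this_header
                    writer.writerow(header)  # write the header only once
                elif this_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    written += 1
    return written
```

The header check is deliberate: if the two strand files turn out to have different columns, it is safer to stop than to produce a silently misaligned table.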

acarmas1 commented 2 years ago

Thanks Huanle. Just one last question: for Epinano_DiffErr, I have a long genome reference. How can I do the sliding-window analysis?

Huanle commented 2 years ago

I do not have a script for that. But I think it's quite straightforward to segment your input (i.e., the variants files) along your reference sequences and feed the segments into Epinano_DiffErr.
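Since no script exists for this, here is one possible sketch of the segmentation step. It groups rows of a variants table into fixed-size windows per reference sequence, optionally overlapping. Everything specific here is an assumption: the function name `segment_by_window` is hypothetical, the column names `#Ref` and `pos` are guesses at the variants CSV header and should be checked against your own files, positions are assumed to be 1-based, and the window size is arbitrary.

```python
from collections import defaultdict


def segment_by_window(rows, window=1000, step=None,
                      ref_col="#Ref", pos_col="pos"):
    """Group rows (dicts) into windows keyed by (reference, window_start).

    Window k covers positions [1 + k*step, k*step + window] (1-based).
    With step == window the windows tile the reference; with
    step < window they overlap, so a row can land in several windows.
    """
    step = window if step is None else step
    if window <= 0 or step <= 0 or step > window:
        raise ValueError("need 0 < step <= window")
    segments = defaultdict(list)
    for row in rows:
        pos = int(row[pos_col])
        last = (pos - 1) // step                      # last window containing pos
        first = max(0, -(-(pos - window) // step))    # ceil((pos - window) / step)
        for k in range(first, last + 1):
            segments[(row[ref_col], 1 + k * step)].append(row)
    return segments
```

Rows can be read with `csv.DictReader`, and each segment can then be written to its own CSV (with the original header) for the WT and KO samples in the same way, so that matching windows are passed to Epinano_DiffErr. For a genome-sized table it would be better to stream rows to per-window files than to hold every segment in memory as this sketch does.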