novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Epinano_Variants.py - problem with multiprocessing? #72

Closed nemitheasura closed 3 years ago

nemitheasura commented 3 years ago

Hi @enovoa, I encountered a problem similar to the one reported in closed issues #55 and #60.

I've tried multiple versions of Python and of the packages/modules, and several ways of creating venvs and running the script. I keep hitting the same problem, even though I followed the tips given in both of those issues.

I also switched multiprocessing to multiprocess, as suggested by Michael Dorner in this thread: https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror/42383397 (I edited your script accordingly).

The error looks as follows:

Process Process-2:
Traceback (most recent call last):
  File "/usr/local/software/EpiNano/1.2/lib/python3.6/site-packages/multiprocess/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/software/EpiNano/1.2/lib/python3.6/site-packages/multiprocess/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/software/EpiNano/1.2/Epinano_Variants.py", line 44, in split_tsv_for_per_site_var_freq
    head = next(tsv)
StopIteration
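For context, a `StopIteration` raised by `next(tsv)` means the iterator yielded nothing, i.e. the stream being read contained no header line at all. The sketch below shows one way to turn that into a more descriptive error. It is only an illustration: `read_header` is a hypothetical helper and not part of EpiNano, and the input is assumed to be any iterator of text lines.

```python
import io

def read_header(tsv):
    """Hypothetical helper: return the first line of an iterator of lines.

    Raises ValueError with a readable message when the iterator is empty,
    instead of letting a bare StopIteration escape.
    """
    head = next(tsv, None)  # the default value avoids StopIteration
    if head is None:
        raise ValueError("input stream is empty: the upstream command produced no output")
    return head.rstrip("\n")

# An empty stream reproduces the situation behind the traceback.
try:
    read_header(io.StringIO(""))
except ValueError as err:
    print("error:", err)

# A stream with at least one line returns its header.
print(read_header(io.StringIO("col_a\tcol_b\n1\t2\n")))
```

An empty stream usually points at the step that was supposed to produce the data rather than at the multiprocessing code that reads it.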

Below is my package list (one of the multiple configurations I've already tested - this was done in py3.6.7, 3.6.11 and 3.8.2):

Package Version


attrs 20.3.0
biopython 1.76
cffi 1.14.3
cloudpickle 1.6.0
dask 2.5.2
fsspec 0.8.4
h5py 2.10.0
importlib-metadata 3.1.0
iniconfig 1.1.1
Jinja2 2.11.2
joblib 0.17.0
locket 0.2.0
MarkupSafe 1.1.1
numpy 1.15.4
packaging 20.4
pandas 0.24.2
partd 1.1.0
pip 18.1
pluggy 0.13.1
py 1.9.0
pycparser 2.20
pyparsing 2.4.7
pysam 0.16.0.1
pytest 6.1.2
python-dateutil 2.8.1
pytz 2020.4
PyYAML 5.3.1
scikit-learn 0.20.2
scipy 1.5.4
setuptools 40.6.2
six 1.15.0
threadpoolctl 2.1.0
toml 0.10.2
toolz 0.11.1
tzlocal 2.1
zipp 3.4.0

I would be grateful if you could help me solve this problem. Best, N.

Huanle commented 3 years ago

Hi @nemitheasura, can you confirm that you are using the latest version, and can you share your command with me? Can you also run ls -l your_reference.fasta* and show me the listed files?

Thanks a lot.

nemitheasura commented 3 years ago

Hi @Huanle,

Thanks for your response. I am using EpiNano version 1.2, cloned from GitHub.

  1. In the case of your training dataset:

Command: Epinano_Variants.py -n 6 -R ref.fa -b wt.bam -s /usr/local/software/EpiNano/1.2/misc --type t

List of files: ref.dict ref.fa ref.fa.fai

  2. In the case of my own dataset:

Command: Epinano_Variants.py -n 6 -R ref_transcriptome_PK_based.fasta -b CKO1_transcriptome_map.bam -s /usr/local/software/EpiNano/1.2/misc --type t

List of files: ref_transcriptome_PK_based.fasta ref_transcriptome_PK_based.fasta.dict ref_transcriptome_PK_based.fasta.fai ref_transcriptome_PK_based.fasta.mmidx

In this case I have an additional minimap index for other purposes.

Because running on my own data ended in an error, I switched to your test dataset. The error is the same in both cases.

Oh, and I ran EpiNano on two machines. Here are my OS details:

NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"

NAME="Linux Mint"
VERSION="20 (Ulyana)"
ID=linuxmint
ID_LIKE=ubuntu
PRETTY_NAME="Linux Mint 20"
VERSION_ID="20"

I also tried running the backup version (Epinano_Variants.bak) instead of the default one. Nothing changed; the same error occurred.

Looking forward to your response. Best, N.

Huanle commented 3 years ago

hi @nemitheasura , can you try -s /usr/local/software/EpiNano/1.2/misc/sam2tsv.jar ?
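The suggestion above replaces the path of the misc folder with the path of the jar file inside it. As a rough illustration of how such a mix-up could be caught up front, here is a minimal sketch. `check_jar_path` is a hypothetical helper, not an EpiNano function, and the temporary directory only stands in for the real misc folder.

```python
import os
import tempfile

def check_jar_path(path):
    """Hypothetical helper: ensure `path` points at a .jar file, not its folder."""
    if os.path.isdir(path):
        raise ValueError(f"{path} is a directory; pass the .jar file itself")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no such file: {path}")
    if not path.endswith(".jar"):
        raise ValueError(f"{path} does not look like a .jar file")
    return path

# Demonstration in a throwaway directory standing in for the misc folder.
with tempfile.TemporaryDirectory() as misc_dir:
    jar = os.path.join(misc_dir, "sam2tsv.jar")
    open(jar, "w").close()  # empty placeholder file, not a real jar

    try:
        check_jar_path(misc_dir)  # the folder, as in the failing command
    except ValueError as err:
        print("rejected:", err)

    print("accepted:", check_jar_path(jar))  # the jar file itself
```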

nemitheasura commented 3 years ago

Hi, Huanle, thank you very much. I appreciate your help. Bless ya! N.

P.S. Would you be so kind as to tell me what is wrong this time? After Epinano_Variants I ran Epinano_DiffErr:

Command: Epinano_DiffErr.R -k KO1.plus_strand.per.site.csv -w ctrl.plus_strand.per.site.csv -t 3 -o Test -c 5 -f sum_err -d 0.1 -p

Error:

Error in [.data.frame(input, , c("X.Ref", "pos", "position", "base", :
  undefined columns selected
Calls: cleanup -> [ -> [.data.frame
Execution halted

Data table (I used the head command):

Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del
15S_rRNA::chrM:6546-8194,1,T,+,3,25.33333,24.00000,2.62467,0.00000,0.00000,0.00000
15S_rRNA::chrM:6546-8194,2,A,+,7,22.00000,23.00000,2.92770,0.00000,0.00000,0.00000
15S_rRNA::chrM:6546-8194,3,A,+,8,21.00000,21.50000,4.09268,0.00000,0.00000,0.00000
15S_rRNA::chrM:6546-8194,4,A,+,8,21.87500,24.00000,5.08521,0.00000,0.00000,0.00000
15S_rRNA::chrM:6546-8194,5,A,+,8,23.87500,23.50000,6.45053,0.00000,0.00000,0.00000

I would be grateful for your assistance.

Huanle commented 3 years ago

Hi @nemitheasura, -f sum_err tells the script to use sum_err as the feature for analysis, but your input file has no such column. You should use a feature that is actually present in your input file (mis, ins, del or q_mean).
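A quick way to see which features a per-site file offers is to read its header before running the analysis. The following is only a sketch: `available_features` is a hypothetical helper, and treating every column other than Ref, pos, base, strand and cov as a candidate feature is an assumption made for illustration.

```python
import csv
import io

def available_features(csv_text, non_features=("Ref", "pos", "base", "strand", "cov")):
    """Hypothetical helper: list header columns that are not site identifiers.

    Assumption: anything outside `non_features` is a candidate feature.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return [col for col in header if col not in non_features]

# Header taken from the data table posted in this thread.
sample = "Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del\n"

features = available_features(sample)
print(features)
print("sum_err" in features)  # False: the column requested with -f is missing
```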

There is a script in misc that can generate sum_err for you if you are interested in that specific feature.
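For readers who only want to see what such a derived column might look like, here is a minimal sketch. It assumes that sum_err is the sum of the mis, ins and del columns, which this thread does not confirm, and `add_sum_err` is a hypothetical function, not the script shipped in misc. Prefer the official script for real analyses.

```python
import csv
import io

# Assumption (not confirmed in this thread): sum_err = mis + ins + del.
ERROR_COLUMNS = ("mis", "ins", "del")

def add_sum_err(csv_text):
    """Hypothetical helper: return the rows as dicts with an extra 'sum_err' field."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["sum_err"] = sum(float(row[col]) for col in ERROR_COLUMNS)
        rows.append(row)
    return rows

# First two rows of the data table posted in this thread.
sample = (
    "Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del\n"
    "15S_rRNA::chrM:6546-8194,1,T,+,3,25.33333,24.00000,2.62467,0.00000,0.00000,0.00000\n"
    "15S_rRNA::chrM:6546-8194,2,A,+,7,22.00000,23.00000,2.92770,0.00000,0.00000,0.00000\n"
)

for row in add_sum_err(sample):
    print(row["Ref"], row["pos"], row["sum_err"])
```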