tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

SystemError: null argument to internal routine #221

Open kwonej0617 opened 1 year ago

kwonej0617 commented 1 year ago

Hi, thank you for developing such a useful tool. I ran Nanocompore on my data but got the following error.

PLEASE NOTE: This is an experimental module to load a centrally installed
miniconda environment. We do not recommend using this to install your own conda
environments; we do recommend installing miniconda in your /pi space to do so. 

2023-06-06T16:40:36.516635-0400 WARNING - MainProcess | Running Eventalign_collapse
2023-06-06T16:40:36.517060-0400 INFO - MainProcess | Checking and initialising Eventalign_collapse
2023-06-06T16:40:36.517893-0400 INFO - MainProcess | Starting data processing
2023-06-06T23:19:00.506438-0400 INFO - Process-40 | Output reads written:837586
2023-06-06T23:19:07.841388-0400 WARNING - MainProcess | Running Eventalign_collapse
2023-06-06T23:19:07.841799-0400 INFO - MainProcess | Checking and initialising Eventalign_collapse
2023-06-06T23:19:07.842604-0400 INFO - MainProcess | Starting data processing
2023-06-07T08:47:03.781879-0400 INFO - Process-40 | Output reads written:1208587
2023-06-07T08:47:10.514231-0400 WARNING - MainProcess | Running SampComp
2023-06-07T08:47:10.514659-0400 INFO - MainProcess | Checking and initialising SampComp
2023-06-07T08:47:10.516331-0400 INFO - MainProcess | Only 1 replicate found for condition wt
2023-06-07T08:47:10.516491-0400 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2023-06-07T08:47:10.516596-0400 INFO - MainProcess | Only 1 replicate found for condition ko
2023-06-07T08:47:10.516695-0400 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2023-06-07T08:47:11.894431-0400 INFO - MainProcess | Reading eventalign index files
2023-06-07T08:47:47.931830-0400 INFO - MainProcess |    References found in index: 33210
2023-06-07T08:47:47.932314-0400 INFO - MainProcess | Filtering out references with low coverage
2023-06-07T08:47:49.839676-0400 INFO - MainProcess |    References remaining after reference coverage filtering: 8770
2023-06-07T08:47:50.025163-0400 INFO - MainProcess | Starting data processing
SystemError: null argument to internal routine
SystemError: null argument to internal routine
SystemError: null argument to internal routine
SystemError: null argument to internal routine
SystemError: null argument to internal routine
SystemError: null argument to internal routine
SystemError: null argument to internal routine
SystemError: null argument to internal routine

The following is my script.

#eventalign with samples
#singularity exec /share/pkg/containers/nanopolish/0.14.0/nanopolish-0.14.0.sif nanopolish eventalign --reads ${wt_fastq} --bam ${wt_bam} --genome ${ref} --samples --print-read-names --scale-events --threads 40 > wt_eventalign.txt
#singularity exec /share/pkg/containers/nanopolish/0.14.0/nanopolish-0.14.0.sif nanopolish eventalign --reads ${ko_fastq} --bam ${ko_bam} --genome ${ref} --samples --print-read-names --scale-events --threads 40 > ko_eventalign.txt

#eventalign_collapse
nanocompore eventalign_collapse -i wt_eventalign.txt -o wt -t 40
nanocompore eventalign_collapse -i ko_eventalign.txt -o ko -t 40

#samples comparison for detecting modifications
nanocompore sampcomp --file_list1 wt/out_eventalign_collapse.tsv --file_list2 ko/out_eventalign_collapse.tsv --label1 wt --label2 ko --fasta ${ref} --bed ${bed12} --outpath nanocompore --min_coverage 5 --min_ref_length 10 --allow_warnings --nthreads 40

It seems that wt_eventalign.txt and ko_eventalign.txt files were successfully generated.

contig  position        reference_kmer  read_name       strand  event_index     event_level_mean        event_stdv      event_length    model_kmer      model_mean      model_stdv      standardized_level      samples
ENST00000361390 1       TACCC   71576efc-add3-4fc3-823c-05628a65a835    t       1506    78.95   1.122   0.00930 TACCC   78.84   2.81    0.03    77.4855,77.7745,80.52,78.0635,79.7975,77.4855,77.919,78.0635,79.2195,78.9305,78.208,77.63,77.341,80.809,77.63,78.6415,79.2195,79.075,78.6415,79.942,80.3755,79.653,79.7975,79.075,78.3525,81.8205,78.497,80.6645
ENST00000361390 1       TACCC   71576efc-add3-4fc3-823c-05628a65a835    t       1507    78.83   1.662   0.01494 TACCC   78.84   2.81    -0.01   76.763,76.9075,79.364,79.653,76.9075,79.075,77.1965,77.7745,77.919,72.717,78.3525,76.6185,78.786,78.497,78.208,79.364,78.6415,77.4855,77.7745,78.497,78.3525,78.6415,78.0635,80.809,77.919,79.2195,78.497,79.942,77.1965,81.387,78.208,80.3755,78.9305,79.5085,81.2425,81.2425,79.653,78.6415,80.0865,79.5085,78.497,83.5545,78.786,80.231,82.254

Reference file (.fa) and .bed look like the following.

>ENST00000390424
ATGGCTTTGCAGAGCACTCTGGGGGCGGTGTGGCTAGGGCTTCTCCTCAACTCTCTCTGG
AAGGTTGCAGAAAGCAAGGACCAAGTGTTTCAGCCTTCCACAGTGGCATCTTCAGAGGGA
GCTGTGGTGGAAATCTTCTGTAATCACTCTGTGTCCAATGCTTACAACTTCTTCTGGTAC
CTTCACTTCCCGGGATGTGCACCAAGACTCCTTGTTAAAGGCTCAAAGCCTTCTCAGCAG
GGACGATACAACATGACCTATGAACGGTTCTCTTCATCGCTGCTCATCCTCCAGGTGCGG
GAGGCAGATGCTGCTGTTTACTACTGTGCTGTGGAGGA
>ENST00000390425
ATGGCCTCTGCACCCATCTCGATGCTTGCGATGCTCTTCACATTGAGTGGGCTGAGAGCT
CAGTCAGTGGCTCAGCCGGAAGATCAGGTCAACGTTGCTGAAGGGAATCCTCTGACTGTG
AAATGCACCTATTCAGTCTCTGGAAACCCTTATCTTTTTTGGTATGTTCAATACCCCAAC
CGAGGCCTCCAGTTCCTTCTGAAATACATCACAGGGGATAACCTGGTTAAAGGCAGCTAT
GGCTTTGAAGCTGAATTTAACAAGAGCCAAACCTCCTTCCACCTGAAGAAACCATCTGCC
CTTGTGAGCGACTCCGCTTTGTACTTCTGTGCTGTGAGAGACA

chr7    142544211       142544685       ENST00000611462 0       +       142544235       142544685       0       2       73,295, 0,179,
chr7    142554835       142555318       ENST00000611787 0       +       142554880       142555318       0       2       94,298, 0,185,
chr7    142560422       142560931       ENST00000620569 0       +       142560484       142560931       0       2       111,298,        0,211,
chr7    142563739       142564245       ENST00000617347 0       +       142563798       142564245       0       2       108,298,        0,208,

Could you please help me fix this problem? Thank you so much.

lmulroney commented 1 year ago

Hi @kwonej0617,

Just a quick question: in the eventalign_collapse.tsv snippet you shared, are the two data lines from the same file, or is each one the first data line of a separate file, shown under a shared header? An eventalign_collapse.tsv file should not have two lines for the same position in the same transcript. Could you also check whether the eventalign_collapse.tsv files have index files?
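A minimal sketch of the index check, assuming the index sits next to each collapsed file with `.idx` appended to the file name (the naming seen later in this thread). The function name `find_missing_indexes` and the example paths are illustrative only and are not part of Nanocompore.

```python
from pathlib import Path


def find_missing_indexes(tsv_paths, suffix=".idx"):
    """Return the collapsed TSV paths that have no usable index file beside them."""
    missing = []
    for tsv in map(Path, tsv_paths):
        # Assumption: the index is "<file name>.idx" in the same directory.
        idx = tsv.with_name(tsv.name + suffix)
        if not idx.is_file() or idx.stat().st_size == 0:
            missing.append(str(tsv))
    return missing


if __name__ == "__main__":
    paths = ["wt/out_eventalign_collapse.tsv", "ko/out_eventalign_collapse.tsv"]
    for path in find_missing_indexes(paths):
        print("No usable index for", path)
```

An empty index is reported as missing too, since a zero-byte file usually means the writer stopped before producing anything.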

Thanks, Logan

kwonej0617 commented 1 year ago

Actually, what I shared above is from wt_eventalign.txt and ko_eventalign.txt, not from the eventalign_collapse.tsv files.

Here is eventalign_collapse.tsv. As you mentioned, each line has a distinct position within a transcript.

#71576efc-add3-4fc3-823c-05628a65a835   ENST00000361390
ref_pos ref_kmer        num_events      num_signals     dwell_time      NNNNN_dwell_time        mismatch_dwell_time     status  median  mad
1       TACCC   3       85      0.028220000211149454    0.0     0.0     valid   78.497  1.0115051
2       ACCCA   3       66      0.021919999504461884    0.0     0.0     valid   64.625  1.0115013
3       CCCAT   2       40      0.013280000304803252    0.0     0.0     valid   72.78925        1.0114975
4       CCATG   4       37      0.012279999908059835    0.0     0.0     valid   80.9535 1.7340012
5       CATGG   1       44      0.01460999995470047     0.0     0.0     valid   81.31475        1.0114975
6       ATGGC   1       8       0.0026599999982863665   0.0     0.0     valid   95.62025        2.8899956
7       TGGCC   1       24      0.007969999685883522    0.0     0.0     valid   101.256 2.0229988
8       GGCCA   3       61      0.02025000029243529     0.0     0.0     valid   101.184 1.8789978
9       GCCAA   2       32      0.010620000073686242    0.0     0.0     valid   73.006  1.5894966
10      CCAAC   3       43      0.014269999926909804    0.0     0.0     valid   85.5775 1.1559982
11      CAACC   1       22      0.007300000172108412    0.0     0.0     valid   89.69575        1.300499
12      AACCT   2       19      0.006310000084340572    0.0     0.0     valid   78.786  2.312004
13      ACCTC   2       42      0.01394000044092536     0.0     0.0     valid   70.116  0.7947502
14      CCTCC   1       20      0.006639999803155661    0.0     0.0     valid   66.28675        1.0114975
15      CTCCT   1       12      0.003980000037699938    0.0     0.0     valid   69.9715 0.8670006
16      TCCTA   1       42      0.013939999975264072    0.0     0.0     valid   72.139  1.083747

This is the eventalign_collapse.tsv.idx file.

ref_id  ref_start       ref_end read_id num_events      num_signals     dwell_time      kmers   missing_kmers   NNNNN_kmers     mismatch_kmers  valid_kmers     byte_offset     byte_len
ENST00000361390 1       951     71576efc-add3-4fc3-823c-05628a65a835    1887    35626   11.827349983388558      951     47      25      0       879     0       61727
ENST00000361390 1       951     2ffef773-aba8-4661-99ce-3450bf9b281e    2782    51606   17.132349965977482      951     55      40      0       856     61728   61735
ENST00000361390 1       951     946d14af-3aaa-4207-adaa-8f59e603f756    1776    33154   11.00635999662336       951     42      27      0       882     123464  62335
ENST00000361390 1       935     6d97273d-5b12-444b-a735-38742b318747    2203    39707   13.181959986453876      935     44      23      0       868     185800  61013
ENST00000361390 1       951     a37a7b71-6ae3-4ad2-8b3b-1a7b27c3bc73    1609    27803   9.230239991564304       951     33      25      0       893     246814  62787
ENST00000361390 1       950     b3f6ee4c-8ce8-494f-96ad-3108dae3d30a    1748    32259   10.709549980936572      950     27      21      0       902     309602  63017
ENST00000361390 1       951     282c768d-35db-4a0c-ba32-b3e0d3c40aa8    1886    38191   12.678779986687005      951     34      21      0       896     372620  62635
ENST00000361390 1       951     fdde1767-74fb-48f5-8d13-6f5bfefbb0eb    2023    38461   12.768609974300489      951     44      31      0       876     435256  62274

Please let me know if you need more information. Thank you so much for your time!

lmulroney commented 1 year ago

Hi @kwonej0617,

Do you get the same "SystemError: null argument to internal routine" error if you try to run Nanocompore sampcomp outside of your script?

I did a little digging into the error "SystemError: null argument to internal routine", and it appears that it might be an internal Python error rather than one specific to Nanocompore. How did you install Python and Nanocompore on your system?

Thanks, Logan

kwonej0617 commented 1 year ago

@lmulroney Thank you for your response. I installed nanocompore using conda. I will try to run nanocompore sampcomp outside of the script, instead of submitting a job to the cluster. Also, I will try to install it again.

lmulroney commented 1 year ago

@kwonej0617,

I recommend checking your python installation in addition to your nanocompore installation.
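One way to see which interpreter and which copy of the package a job actually picks up is sketched below. It uses only the standard library. The helper name `describe_environment` is made up for this example, and the package name `nanocompore` is taken from the import paths in the tracebacks in this thread.

```python
import importlib.util
import sys


def describe_environment(package="nanocompore"):
    """Collect basic facts about the interpreter and where a package resolves from."""
    spec = importlib.util.find_spec(package)  # None if the package is not importable
    return {
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        "package_found": spec is not None,
        "package_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it once in an interactive shell and once inside the submitted job shows whether both resolve to the same conda environment.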

Logan

kwonej0617 commented 1 year ago

@lmulroney Thank you for your advice. I have installed nanocompore again and checked the Python installation. After that, I no longer got "SystemError: null argument to internal routine" while running nanocompore. However, I got another error, shown below, and the job was killed.

2023-06-22T08:57:36.115574-0400 WARNING - MainProcess | Running Eventalign_collapse
2023-06-22T08:57:36.116017-0400 INFO - MainProcess | Checking and initialising Eventalign_collapse
2023-06-22T08:57:36.117289-0400 INFO - MainProcess | Starting data processing
2023-06-22T19:28:53.406933-0400 INFO - Process-40 | Output reads written:837586
2023-06-22T19:28:59.128338-0400 WARNING - MainProcess | Running Eventalign_collapse
2023-06-22T19:28:59.128862-0400 INFO - MainProcess | Checking and initialising Eventalign_collapse
2023-06-22T19:28:59.130137-0400 INFO - MainProcess | Starting data processing
2023-06-22T22:36:14.728180-0400 ERROR - Process-40 | Error in Writer
2023-06-22T22:36:14.755494-0400 ERROR - Process-40 | Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/Eventalign_collapse.py", line 266, in __write_output
    data_fp.write(data_str)
OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/Eventalign_collapse.py", line 280, in __write_output
    data_fp.write ("#\n")
OSError: [Errno 5] Input/output error

Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/bin/nanocompore", line 8, in <module>
2023-06-22T22:36:14.756267-0400 INFO - Process-40 | Output reads written:254087
2023-06-22T22:36:14.756337-0400 ERROR - MainProcess | Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/Eventalign_collapse.py", line 266, in __write_output
    data_fp.write(data_str)
OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/Eventalign_collapse.py", line 280, in __write_output
    data_fp.write ("#\n")
OSError: [Errno 5] Input/output error

2023-06-22T22:36:14.756909-0400 ERROR - MainProcess | Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/Eventalign_collapse.py", line 266, in __write_output
    data_fp.write(data_str)
OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/Eventalign_collapse.py", line 280, in __write_output
    data_fp.write ("#\n")
OSError: [Errno 5] Input/output error

2023-06-22T22:36:14.757184-0400 ERROR - MainProcess | An error occured. Killing all processes and closing queues

    sys.exit(main())
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/__main__.py", line 175, in main
    args.func(args)
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/__main__.py", line 258, in eventalign_collapse_main
    e()
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/Eventalign_collapse.py", line 131, in __call__
    raise E
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/Eventalign_collapse.py", line 108, in __call__
    raise NanocomporeError(tb)
nanocompore.common.NanocomporeError: Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/Eventalign_collapse.py", line 266, in __write_output
    data_fp.write(data_str)
OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/Eventalign_collapse.py", line 280, in __write_output
    data_fp.write ("#\n")
OSError: [Errno 5] Input/output error

2023-06-22T22:36:21.650259-0400 WARNING - MainProcess | Running SampComp
2023-06-22T22:36:21.650716-0400 INFO - MainProcess | Checking and initialising SampComp
2023-06-22T22:36:21.653479-0400 INFO - MainProcess | Only 1 replicate found for condition wt
2023-06-22T22:36:21.653646-0400 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2023-06-22T22:36:21.653810-0400 INFO - MainProcess | Only 1 replicate found for condition ko
2023-06-22T22:36:21.654022-0400 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2023-06-22T22:36:25.091263-0400 INFO - MainProcess | Reading eventalign index files
2023-06-22T22:36:57.640431-0400 INFO - MainProcess |    References found in index: 25646
2023-06-22T22:36:57.641048-0400 INFO - MainProcess | Filtering out references with low coverage
2023-06-22T22:37:00.087471-0400 INFO - MainProcess |    References remaining after reference coverage filtering: 1769
2023-06-22T22:37:00.536628-0400 INFO - MainProcess | Starting data processing
2023-06-23T07:58:42.307276-0400 ERROR - Process-9 | Index and data files are not matching:
['']
OrderedDict([('ref_id', 'ENST00000398073'), ('ref_start', 1), ('ref_end', 806), ('read_id', '25c275fd-e3e2-44ae-a985-c7a921008786'), ('num_events', 2528), ('num_signals', 54601), ('dwell_time', 18.126699992455542), ('kmers', 806), ('missing_kmers', 35), ('NNNNN_kmers', 26), ('mismatch_kmers', 0), ('valid_kmers', 745), ('byte_offset', 11815044271), ('byte_len', 52838)])
2023-06-23T07:58:42.307653-0400 ERROR - Process-9 | Error in Worker
2023-06-23T07:58:42.342012-0400 ERROR - Process-9 | Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/SampComp.py", line 305, in __process_references
    raise NanocomporeError("Index and data files are not matching:\n{}\n{}".format(header, read))
nanocompore.common.NanocomporeError: Index and data files are not matching:
['']
OrderedDict([('ref_id', 'ENST00000398073'), ('ref_start', 1), ('ref_end', 806), ('read_id', '25c275fd-e3e2-44ae-a985-c7a921008786'), ('num_events', 2528), ('num_signals', 54601), ('dwell_time', 18.126699992455542), ('kmers', 806), ('missing_kmers', 35), ('NNNNN_kmers', 26), ('mismatch_kmers', 0), ('valid_kmers', 745), ('byte_offset', 11815044271), ('byte_len', 52838)])

2023-06-23T07:58:42.342946-0400 ERROR - MainProcess | Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/SampComp.py", line 305, in __process_references
    raise NanocomporeError("Index and data files are not matching:\n{}\n{}".format(header, read))
nanocompore.common.NanocomporeError: Index and data files are not matching:
['']
OrderedDict([('ref_id', 'ENST00000398073'), ('ref_start', 1), ('ref_end', 806), ('read_id', '25c275fd-e3e2-44ae-a985-c7a921008786'), ('num_events', 2528), ('num_signals', 54601), ('dwell_time', 18.126699992455542), ('kmers', 806), ('missing_kmers', 35), ('NNNNN_kmers', 26), ('mismatch_kmers', 0), ('valid_kmers', 745), ('byte_offset', 11815044271), ('byte_len', 52838)])

2023-06-23T07:58:42.343941-0400 ERROR - MainProcess | Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/SampComp.py", line 305, in __process_references
    raise NanocomporeError("Index and data files are not matching:\n{}\n{}".format(header, read))
nanocompore.common.NanocomporeError: Index and data files are not matching:
['']
OrderedDict([('ref_id', 'ENST00000398073'), ('ref_start', 1), ('ref_end', 806), ('read_id', '25c275fd-e3e2-44ae-a985-c7a921008786'), ('num_events', 2528), ('num_signals', 54601), ('dwell_time', 18.126699992455542), ('kmers', 806), ('missing_kmers', 35), ('NNNNN_kmers', 26), ('mismatch_kmers', 0), ('valid_kmers', 745), ('byte_offset', 11815044271), ('byte_len', 52838)])

Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/bin/nanocompore", line 8, in <module>
2023-06-23T07:58:42.344284-0400 ERROR - MainProcess | An error occured. Killing all processes and closing queues

    sys.exit(main())
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/__main__.py", line 175, in main
    args.func(args)
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/__main__.py", line 213, in sampcomp_main
    db = s()
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/SampComp.py", line 251, in __call__
    raise E
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/SampComp.py", line 221, in __call__
    raise NanocomporeError(tb)
nanocompore.common.NanocomporeError: Traceback (most recent call last):
  File "/share/pkg/conda/nanocompore/1.0.4/lib/python3.6/site-packages/nanocompore/SampComp.py", line 305, in __process_references
    raise NanocomporeError("Index and data files are not matching:\n{}\n{}".format(header, read))
nanocompore.common.NanocomporeError: Index and data files are not matching:
['']
OrderedDict([('ref_id', 'ENST00000398073'), ('ref_start', 1), ('ref_end', 806), ('read_id', '25c275fd-e3e2-44ae-a985-c7a921008786'), ('num_events', 2528), ('num_signals', 54601), ('dwell_time', 18.126699992455542), ('kmers', 806), ('missing_kmers', 35), ('NNNNN_kmers', 26), ('mismatch_kmers', 0), ('valid_kmers', 745), ('byte_offset', 11815044271), ('byte_len', 52838)])

Because the first error, OSError: [Errno 5] Input/output error, occurred in the nanocompore eventalign_collapse step, I first checked wt_eventalign.txt and ko_eventalign.txt, which were generated by the previous step, nanopolish eventalign. However, I am not sure whether there is any issue in the output.

wt_eventalign.txt

contig  position        reference_kmer  read_name       strand  event_index     event_level_mean        event_stdv      event_length    model_kmer      model_mean      model_stdv      standardized_level      samples
ENST00000361390 1       TACCC   71576efc-add3-4fc3-823c-05628a65a835    t       1506    78.95   1.122   0.00930 TACCC   78.84   2.81    0.03    77.4855,77.7745,80.52,78.0635,79.7975,77.4855,77.919,78.0635,79.2195,78.9305,78.208,77.63,77.341,80.809,77.63,78.6415,79.2195,79.075,78.6415,79.942,80.3755,79.653,79.7975,79.075,78.3525,81.8205,78.497,80.6645
ENST00000361390 1       TACCC   71576efc-add3-4fc3-823c-05628a65a835    t       1507    78.83   1.662   0.01494 TACCC   78.84   2.81    -0.01   76.763,76.9075,79.364,79.653,76.9075,79.075,77.1965,77.7745,77.919,72.717,78.3525,76.6185,78.786,78.497,78.208,79.364,78.6415,77.4855,77.7745,78.497,78.3525,78.6415,78.0635,80.809,77.919,79.2195,78.497,79.942,77.1965,81.387,78.208,80.3755,78.9305,79.5085,81.2425,81.2425,79.653,78.6415,80.0865,79.5085,78.497,83.5545,78.786,80.231,82.254

ko_eventalign.txt

contig  position        reference_kmer  read_name       strand  event_index     event_level_mean        event_stdv      event_length    model_kmer      model_mean      model_stdv      standardized_level      samples
ENST00000361390 1       TACCC   38cc1b08-a86d-4e1b-92d6-18e7506a3eee    t       310     78.13   2.096   0.00266 TACCC   78.84   2.81    -0.23   75.3311,76.4227,78.8787,79.6973,79.6973,78.06,81.3346,75.604
ENST00000361390 1       TACCC   38cc1b08-a86d-4e1b-92d6-18e7506a3eee    t       311     74.57   1.585   0.00232 TACCC   78.84   2.81    -1.37   74.376,76.5591,76.832,74.376,74.1032,72.0565,73.6938
ENST00000361390 2       ACCCA   38cc1b08-a86d-4e1b-92d6-18e7506a3eee    t       312     64.76   1.395   0.00299 ACCCA   66.30   2.07    -0.67   67.281,65.0979,63.4606,66.5988,64.2792,65.2344,63.1877,64.2792,63.4606
ENST00000361390 3       CCCAT   38cc1b08-a86d-4e1b-92d6-18e7506a3eee    t       313     71.24   2.687   0.00232 CCCAT   73.36   2.11    -0.91   71.6472,73.6938,74.9218,71.5107,71.7836,67.5539,67.5539

The following is the output from wt/out_eventalign_collapse.tsv.

#71576efc-add3-4fc3-823c-05628a65a835   ENST00000361390
ref_pos ref_kmer        num_events      num_signals     dwell_time      NNNNN_dwell_time        mismatch_dwell_time     status  median  mad
1       TACCC   3       85      0.028220000211149454    0.0     0.0     valid   78.497  1.0115051
2       ACCCA   3       66      0.021919999504461884    0.0     0.0     valid   64.625  1.0115013
3       CCCAT   2       40      0.013280000304803252    0.0     0.0     valid   72.78925        1.0114975
4       CCATG   4       37      0.012279999908059835    0.0     0.0     valid   80.9535 1.7340012
5       CATGG   1       44      0.01460999995470047     0.0     0.0     valid   81.31475        1.0114975
6       ATGGC   1       8       0.0026599999982863665   0.0     0.0     valid   95.62025        2.8899956
7       TGGCC   1       24      0.007969999685883522    0.0     0.0     valid   101.256 2.0229988
8       GGCCA   3       61      0.02025000029243529     0.0     0.0     valid   101.184 1.8789978
9       GCCAA   2       32      0.010620000073686242    0.0     0.0     valid   73.006  1.5894966
10      CCAAC   3       43      0.014269999926909804    0.0     0.0     valid   85.5775 1.1559982
11      CAACC   1       22      0.007300000172108412    0.0     0.0     valid   89.69575        1.300499
12      AACCT   2       19      0.006310000084340572    0.0     0.0     valid   78.786  2.312004
13      ACCTC   2       42      0.01394000044092536     0.0     0.0     valid   70.116  0.7947502
14      CCTCC   1       20      0.006639999803155661    0.0     0.0     valid   66.28675        1.0114975
15      CTCCT   1       12      0.003980000037699938    0.0     0.0     valid   69.9715 0.8670006
16      TCCTA   1       42      0.013939999975264072    0.0     0.0     valid   72.139  1.0837479
17      CCTAC   2       44      0.014610000187531114    0.0     0.0     valid   79.43625        1.1559982
18      CTACT   13      514     0.17063999944366515     0.0     0.0     valid   84.4215 1.4449997
19      TACTC   1       7       0.002319999970495701    0.0     0.0     valid   88.323  1.0114975
20      ACTCC   1       12      0.003980000037699938    0.0     0.0     valid   85.144  0.8670006
21      CTCCT   3       48      0.015940000070258975    0.0     0.0     valid   71.85   1.5895004
22      TCCTC   2       14      0.004650000017136335    0.0     0.0     valid   71.19975        0.6502495
23      CCTCA   4       41      0.013609999907203019    0.0     0.0     valid   74.74   0.86699677
24      CTCAT   2       18      0.005979999899864197    0.0     0.0     valid   82.3985 0.93925476
25      TCATT   2       35      0.011619999771937728    0.0     0.0     valid   84.7105 1.1560059
26      CATTG   1       12      0.003980000037699938    0.0     0.0     valid   85.21625        1.0115013
27      ATTGT   6       134     0.044490000465884805    0.016270000487565994    0.0     NNNNN   87.38375        4.1904984
30      GTACC   1       20      0.006639999803155661    0.0     0.0     valid   73.65625        0.6502533
31      TACCC   1       12      0.003980000037699938    0.0     0.0     valid   78.786  0.7947464

The following is the output from ko/out_eventalign_collapse.tsv.

#38cc1b08-a86d-4e1b-92d6-18e7506a3eee   ENST00000361390
ref_pos ref_kmer        num_events      num_signals     dwell_time      NNNNN_dwell_time        mismatch_dwell_time     status  median  mad
1       TACCC   2       15      0.004979999968782067    0.0     0.0     valid   76.4227 2.0466995
2       ACCCA   1       9       0.0029899999499320984   0.0     0.0     valid   64.2792 0.8187027
3       CCCAT   1       7       0.002319999970495701    0.0     0.0     valid   71.6472 2.0466003
4       CCATG   4       38      0.012610000092536211    0.0     0.0     valid   82.7673 1.5008965
5       CATGG   1       25      0.008299999870359898    0.0     0.0     valid   81.0618 0.955101
6       ATGGC   1       18      0.005979999899864197    0.0     0.0     valid   95.3201 1.7737503
7       TGGCC   1       27      0.008960000239312649    0.0     0.0     valid   103.438 2.3199997
8       GGCCA   1       22      0.007300000172108412    0.0     0.0     valid   98.45825        3.8204002
9       GCCAA   1       7       0.002319999970495701    0.0     0.0     valid   73.148  0.8186035
10      CCAAC   1       8       0.0026599999982863665   0.0     0.0     valid   84.7457 0.54574966
11      CAACC   2       44      0.014599999878555536    0.0     0.0     valid   89.58945        1.1597443
12      AACCT   3       39      0.012950000120326877    0.0     0.0     valid   81.6075 2.4560013
13      ACCTC   3       61      0.02025999967008829     0.0     0.0     valid   69.737  1.0914993
14      CCTCC   1       7       0.002319999970495701    0.0     0.0     valid   72.0565 1.5009003
15      CTCCT   1       8       0.0026599999982863665   0.0     0.0     valid   69.5323 0.6140022
16      TCCTA   1       13      0.00431999983265996     0.0     0.0     valid   73.4209 1.5009003
17      CCTAC   2       110     0.03651999915018678     0.0     0.0     valid   79.288  0.88690186
18      CTACT   1       52      0.017260000109672546    0.0     0.0     valid   83.3813 1.2279968
19      TACTC   2       30      0.009959999937564135    0.0     0.0     valid   86.38305        1.2280006
20      ACTCC   1       14      0.004650000017136335    0.0     0.0     valid   75.8087 1.0915489
21      CTCCT   5       55      0.018269999884068966    0.0     0.0     valid   72.0565 1.2279968
22      TCCTC   1       6       0.001990000018849969    0.0     0.0     valid   72.0565 0.6140022
23      CCTCA   3       67      0.022240000311285257    0.0     0.0     valid   74.5125 1.2280045
24      CTCAT   1       18      0.005979999899864197    0.0     0.0     valid   84.0635 0.682251
25      TCATT   1       6       0.001990000018849969    0.0     0.0     valid   88.49795        0.68224716
26      CATTG   1       10      0.0033199999015778303   0.0     0.0     valid   89.521255       0.9550934

Could you please advise me on how to solve the error? Thank you so much for your help.

lmulroney commented 1 year ago

Hi @kwonej0617,

It looks like all of the I/O errors happened during a write, not a read. The first thing I would do is make sure you didn't run out of disk space. Based on the comments you're leaving on other RNA modification detection tools, it seems like you might be processing the data a lot, and these files can get really big really fast.
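A quick free-space check can be sketched with `shutil.disk_usage` from the standard library. The helper names and the 100 GiB default are arbitrary placeholders, not recommendations from Nanocompore, and a per-user quota on a shared filesystem may not show up in this number.

```python
import shutil


def free_gib(path="."):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_room(path=".", needed_gib=100.0):
    """True if at least `needed_gib` GiB are free where `path` lives."""
    # Assumption: needed_gib is a placeholder; size it from your own eventalign files.
    return free_gib(path) >= needed_gib


if __name__ == "__main__":
    print(f"{free_gib('.'):.1f} GiB free in the current directory")
```

Pointing `path` at the directory that receives the eventalign_collapse output checks the filesystem that matters for the failed write.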

If it's not that, and you wrote the output of nanopolish eventalign to stdout during an HPC job, you might have written the job summary stats to the eventalign output. Write a parser script to make sure that every line is formatted as you expect. I'm assuming that what you posted is not the entire file, but just the first 5-30 lines.
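Such a parser could look roughly like the sketch below. It assumes the file is tab-separated and has the 14 columns shown in the eventalign snippets in this thread (produced with --samples and --print-read-names). The names `check_line` and `find_bad_lines` are invented for this example. Whether a valid row can ever have an empty samples field is not something this thread confirms, so treat any row flagged for that reason with caution.

```python
import sys

EXPECTED_HEADER = [
    "contig", "position", "reference_kmer", "read_name", "strand",
    "event_index", "event_level_mean", "event_stdv", "event_length",
    "model_kmer", "model_mean", "model_stdv", "standardized_level", "samples",
]
INT_COLS = (1, 5)                    # position, event_index
FLOAT_COLS = (6, 7, 8, 10, 11, 12)   # event and model statistics


def check_line(fields):
    """Return a reason string if a data line is malformed, else None."""
    if len(fields) != len(EXPECTED_HEADER):
        return "expected {} columns, found {}".format(len(EXPECTED_HEADER), len(fields))
    try:
        for i in INT_COLS:
            int(fields[i])
        for i in FLOAT_COLS:
            float(fields[i])
        for sample in fields[13].split(","):
            float(sample)
    except ValueError as err:
        return str(err)
    return None


def find_bad_lines(lines, max_report=20):
    """Yield (line_number, reason) for lines that do not look like eventalign rows."""
    reported = 0
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if fields == EXPECTED_HEADER:
            continue  # header line, wherever it appears
        reason = check_line(fields)
        if reason is not None:
            yield number, reason
            reported += 1
            if reported >= max_report:
                return


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for number, reason in find_bad_lines(handle):
            print("line {}: {}".format(number, reason))
```

The file is streamed line by line, so memory use stays flat even for very large eventalign outputs, and `max_report` stops the scan early once enough problems have been found.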

If it's not that, I'd recommend starting over from nanopolish index in an interactive job instead of a bash script. That way you can respond to errors more readily and explore the problems as they come up.
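Before rerunning, it may help to confirm whether a collapsed file was cut short, which the empty header `['']` in the "Index and data files are not matching" error would be consistent with. The sketch below assumes the layout shown earlier in this thread: a tab-separated `.idx` file with `read_id`, `byte_offset` and `byte_len` columns, and a data file in which each read block starts with a `#read_id` line. The function name `find_unreadable_reads` is hypothetical, and this is not how Nanocompore itself validates its files.

```python
import csv
import os


def find_unreadable_reads(tsv_path, idx_path, max_report=10):
    """Return (read_id, byte_offset) for index rows that do not point at a '#' header."""
    size = os.path.getsize(tsv_path)
    problems = []
    with open(idx_path, newline="") as idx, open(tsv_path, "rb") as data:
        for row in csv.DictReader(idx, delimiter="\t"):
            offset = int(row["byte_offset"])
            length = int(row["byte_len"])
            data.seek(offset)
            header = data.readline()  # empty bytes if the offset is past end of file
            if offset + length > size or not header.startswith(b"#"):
                problems.append((row["read_id"], offset))
                if len(problems) >= max_report:
                    break
    return problems
```

If the reported offsets all lie beyond the end of the data file, the write most likely stopped partway and eventalign_collapse needs to be rerun for that sample.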

Logan