tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

nanocompore suspended #154

Closed lingolingolin closed 3 years ago

lingolingolin commented 3 years ago

Hi There,

This is actually an old issue. I am running nanocompore and it seems to be frozen.

ps x shows the process status below.

35514 ?        Sl     0:22 /nfs/users2/ekis/hlingo/mygit/nanocompare/bin/python /nfs/users2/ekis/hlingo/mygit/nanocompare/bin/nanocompore sampcomp --label1 KO --label2 WT --file_list1 ../ko.fastq.events.collapse/out_eventalign_collapse.tsv --file_list2 ../wt.fastq.events.collapse/out_eventalign_collapse.tsv --min_coverage 10 --outpath ko_vs_wt --overwrite -t 10 --sequence_context 2 --pvalue_thr 0.2 --fasta cds.ref.fasta --comparison_methods GMM,KS,TT,MW --logit --allow_warnings --downsample_high_coverage 5000
36011 ?        Sl     0:00 /nfs/users2/ekis/hlingo/mygit/nanocompare/bin/python /nfs/users2/ekis/hlingo/mygit/nanocompare/bin/nanocompore sampcomp --label1 KO --label2 WT --file_list1 ../ko.fastq.events.collapse/out_eventalign_collapse.tsv --file_list2 ../wt.fastq.events.collapse/out_eventalign_collapse.tsv --min_coverage 10 --outpath ko_vs_wt --overwrite -t 10 --sequence_context 2 --pvalue_thr 0.2 --fasta cds.ref.fasta --comparison_methods GMM,KS,TT,MW --logit --allow_warnings --downsample_high_coverage 5000
36014 ?        Z      0:00 [nanocompore] <defunct>
36016 ?        Z      0:00 [nanocompore] <defunct>
36018 ?        Z      0:00 [nanocompore] <defunct>
36020 ?        Z      0:00 [nanocompore] <defunct>
36022 ?        Z      0:00 [nanocompore] <defunct>
36024 ?        Z      0:00 [nanocompore] <defunct>
36026 ?        Z      0:00 [nanocompore] <defunct>
36031 ?        Z      0:00 [nanocompore] <defunct>
36035 ?        Z      0:00 [nanocompore] <defunct>
63739 pts/0    S+     0:00 grep --color=auto nano
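
For reference, a `Z` in the STAT column (shown as `<defunct>`) marks a process that has exited but has not yet been reaped by its parent. In the listing above, nine worker processes are in that state while the main process is still sleeping, which fits a run that appears frozen. A minimal sketch for counting process states from pasted `ps x` output follows; the helper name `count_process_states` and the shortened sample commands are illustrative and not part of nanocompore.

```python
from collections import Counter


def count_process_states(ps_output: str) -> Counter:
    """Count processes by the first letter of the STAT column of `ps x` output."""
    states = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Expected columns: PID TTY STAT TIME COMMAND...
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip the header line and blank lines
        states[fields[2][0]] += 1  # "Sl" -> "S", "S+" -> "S", "Z" -> "Z"
    return states


# Shortened, illustrative sample in the same layout as the listing above.
sample = """\
35514 ?        Sl     0:22 python nanocompore sampcomp
36014 ?        Z      0:00 [nanocompore] <defunct>
36016 ?        Z      0:00 [nanocompore] <defunct>
63739 pts/0    S+     0:00 grep --color=auto nano
"""

states = count_process_states(sample)
print(states)
```

A non-zero `Z` count next to a sleeping parent suggests the workers died early, so the log file is the next place to look.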

So far, the messages printed to the screen are:

Initialising SampComp and checking options
Initialising Whitelist and checking options
Reading eventalign index files
        References found in index: 5817
Filtering out references with low coverage
        References remaining after reference coverage filtering: 2055
Starting data processing
  0%|          | 0/2055 [00:00<?, ? Processed References/s]

Can you help to sort it out? Thanks a lot in advance.

tleonardi commented 3 years ago

Hi @lingolingolin, could you try running it again using the version in the devel branch? You should then get a log file in the output folder. If you could paste its content here, that would help with debugging the issue.

cheers! tom

lingolingolin commented 3 years ago

Hi @Tom,

Thanks a lot for your prompt reply. But I don't see a setup.py file in the devel branch. How do I install it?

tleonardi commented 3 years ago

That's because we use poetry. If you install poetry, you can then use poetry run from the devel branch to run nanocompore directly, or poetry build to build a wheel file that you can then install with pip.

lingolingolin commented 3 years ago

Hi @tleonardi Tom,

Same as before. It did not continue processing any data.

Here is what is included in the log file:

{
  "package_name": "nanocompore",
  "package_version": "1.0.0rc3-2",
  "timestamp": "2020-11-04 11:22:52.123257",
  "eventalign_fn_dict": {
    "KO1": {
      "KO1_1": "../ko1.fastq.events.collapse/out_eventalign_collapse.tsv"
    },
    "WT1": {
      "WT1_1": "../wt1.fastq.events.collapse/out_eventalign_collapse.tsv"
    }
  },
  "fasta_fn": "cds.ref.fasta",
  "bed_fn": null,
  "outpath": "ko1_vs_wt1_5k",
  "outprefix": "out_",
  "overwrite": true,
  "comparison_methods": "GMM,KS,TT,MW",
  "logit": true,
  "allow_warnings": true,
  "sequence_context": 2,
  "sequence_context_weights": "uniform",
  "min_coverage": 10,
  "min_ref_length": 100,
  "downsample_high_coverage": 5000,
  "max_invalid_kmers_freq": 0.1,
  "select_ref_id": [],
  "exclude_ref_id": [],
  "nthreads": 10,
  "log_level": "info"
}

This time out_SampComp.db.dir is also produced.

'__ref_id_list', (0, 6)
'__metadata', (512, 286)

Also, there are two db binary files. out_SampComp.db.dat and out_SampComp.db.bak.

Let me know if you think I need to try anything else in addition to this. Thanks.

tleonardi commented 3 years ago

Hi @lingolingolin, this is still from the stable version of nanocompore. If you use the version in the devel branch, the log file will say "package_version": "1.0.0rc3-1-dev". Also, the log file will contain much more information.
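
A quick way to confirm which build produced a log is to read the JSON block at the top of the file. The sketch below assumes the log starts with that JSON object and may continue with plain log lines afterwards; `read_log_header` and `is_devel_build` are hypothetical helper names, and treating a `-dev` suffix as the marker of a devel build is an assumption based only on the version string quoted above.

```python
import json


def read_log_header(log_text: str) -> dict:
    """Parse the JSON object at the start of the log, ignoring anything after it."""
    # raw_decode stops at the end of the first JSON value instead of
    # failing on trailing data, which json.loads would do.
    header, _end = json.JSONDecoder().raw_decode(log_text.lstrip())
    return header


def is_devel_build(header: dict) -> bool:
    # Assumption: devel builds carry a "-dev" suffix, as in "1.0.0rc3-1-dev".
    return header.get("package_version", "").endswith("-dev")


# Illustrative input: a JSON header followed directly by a log line.
sample = (
    '{"package_name": "nanocompore", "package_version": "1.0.0rc3-1-dev"}'
    "2020-11-04T14:36:46 INFO - MainProcess | Initialising"
)

header = read_log_header(sample)
print(header["package_version"], is_devel_build(header))
```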

lingolingolin commented 3 years ago

Hi @tleonardi, sorry, I had not checked out the devel branch. The log from devel is now attached. [Uploading out_SampComp.log…]()

It's weird: it complains that the required fields are missing, but I found that they are all there.

awk 'NF!=8 && !/^#/' out_eventalign_collapse.tsv | wc -l
0

tleonardi commented 3 years ago

Hi @lingolingolin, I'm sorry but I don't understand. Do you get an out_eventalign_collapse.tsv file? If so, it looks like Nanocompore's execution completed successfully. Can you paste the first 10 lines of that file? Also, the log file wasn't attached properly to your previous message.

lingolingolin commented 3 years ago

Sorry again, it shows it is still in the process of uploading. Yes I have them for both samples. First few lines:

{
  "package_name": "nanocompore",
  "package_version": "1.0.0rc3-1-dev",
  "timestamp": "2020-11-04 14:36:46.969158",
  "eventalign_fn_dict": {
    "KO1": {
      "KO1_1": "../ko1.fastq.events.collapse/out_eventalign_collapse.tsv"
    },
    "WT1": {
      "WT1_1": "../wt1.fastq.events.collapse/out_eventalign_collapse.tsv"
    }
  },
  "fasta_fn": "cds.ref.fasta",
  "bed_fn": null,
  "outpath": "ko1_vs_wt1_5k",
  "outprefix": "out_",
  "overwrite": true,
  "comparison_methods": "GMM,KS,TT,MW",
  "logit": true,
  "allow_warnings": true,
  "sequence_context": 2,
  "sequence_context_weights": "uniform",
  "min_coverage": 10,
  "min_ref_length": 100,
  "downsample_high_coverage": 5000,
  "max_invalid_kmers_freq": 0.1,
  "select_ref_id": [],
  "exclude_ref_id": [],
  "nthreads": 10,
  "log_level": "info"
}
2020-11-04T14:36:46.995574+0100 INFO - MainProcess | Initialising SampComp and checking options
2020-11-04T14:36:46.996477+0100 INFO - MainProcess | Only 1 replicate found for condition KO1
2020-11-04T14:36:46.996984+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2020-11-04T14:36:46.997493+0100 INFO - MainProcess | Only 1 replicate found for condition WT1
2020-11-04T14:36:46.997976+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2020-11-04T14:36:47.002032+0100 DEBUG - MainProcess | OrderedDict([('KO1', {'KO1_1': '../ko1.fastq.events.collapse/out_eventalign_collapse.tsv'}), ('WT1', {'WT1_1': '../wt1.fastq.events.collapse/out_eventalign_collapse.tsv'})])

Last few lines:

2020-11-04T14:37:12.868905+0100 DEBUG - Process-8 | Worker thread processing new item from in_q: YAL054C
2020-11-04T14:37:12.909923+0100 ERROR - Process-7 | Error in worker. Kill output queue
2020-11-04T14:37:12.909987+0100 ERROR - Process-4 | Error in worker. Kill output queue
2020-11-04T14:37:12.910417+0100 ERROR - Process-7 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
2020-11-04T14:37:12.910443+0100 ERROR - Process-4 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
2020-11-04T14:37:12.914723+0100 ERROR - Process-6 | Error in worker. Kill output queue
2020-11-04T14:37:12.915235+0100 ERROR - Process-6 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
2020-11-04T14:37:12.926125+0100 ERROR - Process-5 | Error in worker. Kill output queue
2020-11-04T14:37:12.928039+0100 ERROR - Process-5 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
2020-11-04T14:37:12.950182+0100 ERROR - Process-9 | Error in worker. Kill output queue
2020-11-04T14:37:12.950741+0100 ERROR - Process-9 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
2020-11-04T14:37:12.963056+0100 ERROR - Process-8 | Error in worker. Kill output queue
2020-11-04T14:37:12.963575+0100 ERROR - Process-8 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
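
With several workers logging at once, grouping the ERROR lines by process makes it easier to see that every worker fails with the same message. This is a minimal sketch that assumes the `timestamp LEVEL - process | message` layout seen in the lines above; the regular expression and the function name `errors_by_process` are illustrative, not taken from nanocompore.

```python
import re
from collections import defaultdict

# Assumed layout: "<timestamp> <LEVEL> - <process> | <message>"
LOG_LINE = re.compile(r"^(\S+) (\w+) - (\S+) \| (.*)$")


def errors_by_process(log_text: str) -> dict:
    """Collect ERROR messages per process name, in order of appearance."""
    errors = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group(2) == "ERROR":
            errors[match.group(3)].append(match.group(4))
    return dict(errors)


# Three lines copied from the log excerpt above.
sample = """\
2020-11-04T14:37:12.868905+0100 DEBUG - Process-8 | Worker thread processing new item from in_q: YAL054C
2020-11-04T14:37:12.909923+0100 ERROR - Process-7 | Error in worker. Kill output queue
2020-11-04T14:37:12.910417+0100 ERROR - Process-7 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
"""

errors = errors_by_process(sample)
for process, messages in errors.items():
    print(process, len(messages))
```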

tleonardi commented 3 years ago

OK, thanks! It looks like there's something wrong with the input files. Can you post the commands (and versions) you used for nanopolish and NanopolishComp? Can you also post the first few lines of out_eventalign_collapse.tsv and out_eventalign_collapse.tsv.idx?

lingolingolin commented 3 years ago

Nanopolish commands:

nanopolish eventalign --reads ../WT1.fastq --bam WT1.bam --genome ref.fasta --scale-events -t 10 --summary=WT1.event.aln.summary.txt --print-read-names --signal-index > WT1.evn.aln.tsv

NanopolishComp command:

NanopolishComp Eventalign_collapse -t 12 -i WT1.evn.aln.tsv -o WTs.event1.collapse

out_eventalign_collapse.tsv

#588acc31-711c-434c-8b0a-01bbe036064d   YAL053W
ref_pos ref_kmer        num_events      dwell_time      NNNNN_dwell_time        mismatch_dwell_time     start_idx       end_idx
1       GATCT   3       0.01826 0.0     0.0     109753  109808
2       ATCTT   1       0.02457 0.0     0.0     109679  109753
3       TCTTC   1       0.00631 0.0     0.0     109660  109679
4       CTTCC   2       0.01428 0.0     0.0     109617  109660
5       TTCCT   1       0.00498 0.0     0.0     109602  109617
6       TCCTA   4       0.03686 0.0     0.0     109491  109602
7       CCTAA   2       0.00763 0.0     0.0     109468  109491
8       CTAAA   2       0.010960000000000001    0.0     0.0     109435  109468
9       TAAAC   1       0.00598 0.0     0.0     109417  109435
10      AAACA   1       0.00996 0.0     0.0     109387  109417
11      AACAC   1       0.00398 0.0     0.0     109375  109387
12      ACACC   3       0.01593 0.0     0.0     109327  109375
13      CACCT   1       0.00398 0.0     0.0     109315  109327
14      ACCTT   1       0.01062 0.0     0.0     109283  109315
15      CCTTC   1       0.0073  0.0     0.0     109261  109283
16      CTTCG   6       0.033859999999999994    0.0     0.0     109159  109261
17      TTCGC   2       0.018260000000000002    0.0     0.0     109104  109159
18      TCGCA   1       0.00465 0.0     0.0     109090  109104
19      CGCAA   3       0.01461 0.0     0.0     109046  109090
20      GCAAG   2       0.00597 0.0     0.0     109028  109046
21      CAAGG   1       0.0176  0.0     0.0     108975  109028
22      AAGGT   2       0.02988 0.02556 0.0     108885  108975
25      GTGCC   1       0.0073  0.0     0.0     108863  108885
26      TGCCT   4       0.01893 0.0     0.0     108806  108863
27      GCCTT   1       0.00531 0.0     0.0     108790  108806
28      CCTTT   1       0.00432 0.0     0.0     108777  108790
29      CTTTT   1       0.01428 0.0     0.0     108734  108777

and out_eventalign_collapse.tsv.idx

ref_id  ref_start       ref_end read_id kmers   dwell_time      NNNNN_kmers     mismatch_kmers  missing_kmers   byte_offset     byte_len
YAL053W 1       2348    588acc31-711c-434c-8b0a-01bbe036064d    2276    32.349429999999984      63      0       72      0       96768
YAL053W 1       2348    39cfdeba-4504-43f5-b029-0f02e64e1b90    2225    41.382659999999895      87      0       123     96769   97635
YAL053W 0       2348    b5b50209-9924-436b-9010-079e5868f553    2246    40.308319999999945      78      0       102     194405  96664
YAL053W 1       2321    aa7a155c-d3c5-46a3-9398-7264d457ae9d    2240    26.048969999999958      66      0       80      291070  93905
YAL053W 0       2348    247e95b9-13a1-4689-875f-348752380f60    2269    35.96605999999993       61      0       79      384976  97460
YAL053W 6       2348    5406092e-da70-4e67-8e7f-2dbe6ae73b6d    2284    39.16124000000003       52      0       58      482437  98977
YAL053W 20      2348    362e37d6-a6c6-4d7c-abe9-bc6698629d70    2256    33.95646999999993       60      0       72      581415  96782
YAL053W 1       2348    6ebb1a76-60db-4319-a7f0-55de8e410e5c    2231    51.74043999999983       80      0       116     678198  97957
YAL053W 722     2346    969c5fa9-9b06-48fe-bf18-e3b37fe44882    1586    23.10885000000002       31      0       38      776156  67930
YAL053W 367     2348    ba7b78f3-be88-4be6-a7dc-690fac0e361f    1923    23.00586        45      0       58      844087  81206
YAL053W 2       2348    7d16aced-9113-4d19-88d0-acd3df2c9873    2209    61.345379999999885      92      0       141     925294  97899
YAL053W 135     2268    f975d5bf-2d3d-4cc7-8c8b-a4c9e255d48d    1871    31.69787000000004       87      0       270     1023194 81295
YAL053W 272     2340    1a7f6325-318a-420f-8318-f51f4395f418    1991    32.717500000000044      71      0       77      1104490 84869
YAL053W 605     2348    e5330606-b26b-4e22-8749-d1d0b0555d82    1713    26.567679999999992      35      0       31      1189360 73406
YAL053W 125     2347    59cc87d3-71ed-4d88-b701-fac6eb7ea77f    2125    44.506640000000026      67      0       97      1262767 93184
YAL053W 723     2348    9f363448-a3d0-4a00-af10-ce42f54669ae    1553    21.441729999999982      42      0       81      1355952 66377
YAL053W 339     2348    0c9c937d-04aa-4246-9f93-7a474eae44f6    1937    38.876350000000045      60      0       80      1422330 83892
YAL053W 6       2326    ff974ccf-e7ee-43ad-baab-ee61f1469627    2173    71.17765000000003       118     0       150     1506223 97433
YAL053W 816     2348    e0069090-f8a6-4791-b7a8-d13a018e89b0    1486    25.01625000000003       39      0       54      1603657 63604
YAL053W 916     2324    cb15e055-8cff-48f3-92a3-86d49e19f0ac    1376    18.400750000000002      34      0       33      1667262 59065
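
The `byte_offset` and `byte_len` columns of the index make it possible to pull a single read's block out of the collapsed file without scanning it. The sketch below shows that lookup under the assumption that both files are tab-separated; `read_block` is a hypothetical helper, and the two-read demo files it builds are synthetic, not real NanopolishComp output.

```python
import csv
import os
import tempfile


def read_block(tsv_path: str, idx_path: str, read_id: str) -> str:
    """Return the text block of one read, located via the index file."""
    with open(idx_path, newline="") as idx:
        for row in csv.DictReader(idx, delimiter="\t"):
            if row["read_id"] == read_id:
                with open(tsv_path, "rb") as tsv:
                    tsv.seek(int(row["byte_offset"]))
                    return tsv.read(int(row["byte_len"])).decode()
    raise KeyError(read_id)


# Build a tiny synthetic data file and its index for demonstration.
tmp_dir = tempfile.mkdtemp()
tsv_path = os.path.join(tmp_dir, "demo.tsv")
idx_path = tsv_path + ".idx"
blocks = {
    "read_a": "#read_a\tYAL053W\nref_pos\tref_kmer\n1\tGATCT\n",
    "read_b": "#read_b\tYAL053W\nref_pos\tref_kmer\n2\tATCTT\n",
}
with open(tsv_path, "wb") as tsv, open(idx_path, "w", newline="") as idx:
    writer = csv.writer(idx, delimiter="\t")
    writer.writerow(["read_id", "byte_offset", "byte_len"])
    for read_id, block in blocks.items():
        data = block.encode()
        writer.writerow([read_id, tsv.tell(), len(data)])  # offset before writing
        tsv.write(data)

block_b = read_block(tsv_path, idx_path, "read_b")
print(block_b)
```

Extracting one block this way is a quick check on what columns a given read actually carries.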

lingolingolin commented 3 years ago

Hi @tleonardi , sorry to bother you. But is there any update on this?

a-slide commented 3 years ago

Hi @lingolingolin. This doesn't appear to be a NanoCompore issue but rather a NanopolishComp one. Can you verify that you are using the latest version of NanopolishComp (0.6.11)? If you still have the problem, please open an issue in NanopolishComp describing the bug in detail. Thanks

lingolingolin commented 3 years ago

Hi @a-slide, thanks. Why is it a NanopolishComp issue? NanopolishComp ran smoothly, and my analysis is now stuck at the NanoCompore stage. The NanopolishComp version is indeed the one you suggested.

NanopolishComp --version
NanopolishComp v0.6.11

a-slide commented 3 years ago

Sorry I mixed up with another open issue. I will have a look.

lingolingolin commented 3 years ago

thanks a lot @a-slide :-)

a-slide commented 3 years ago

Actually I think the median intensity value is missing from the eventalignCollapse file
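
A column-count check cannot catch this, because every row does have the eight columns named in its own header. What matters is whether the header names an intensity column at all. Here is a minimal sketch of that check; the column name `median` and the contents of `REQUIRED` are assumptions for illustration, so confirm the exact names against the Nanocompore and NanopolishComp documentation.

```python
# Assumed set of columns needed downstream; "median" is the assumed name
# of the median intensity column.
REQUIRED = {"ref_pos", "ref_kmer", "dwell_time", "median"}


def missing_columns(tsv_text: str, required=REQUIRED) -> list:
    """Return required columns absent from the first non-comment header line."""
    for line in tsv_text.splitlines():
        if line and not line.startswith("#"):
            return sorted(set(required) - set(line.split("\t")))
    return sorted(required)  # no header line found at all


# Header fields as posted earlier in this thread, joined with tabs.
header = "\t".join([
    "ref_pos", "ref_kmer", "num_events", "dwell_time",
    "NNNNN_dwell_time", "mismatch_dwell_time", "start_idx", "end_idx",
])
sample = "#588acc31-711c-434c-8b0a-01bbe036064d\tYAL053W\n" + header + "\n"

missing = missing_columns(sample)
print(missing)
```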

lingolingolin commented 3 years ago

Is that a required column? Is there anything wrong with my nanopolish eventalign command? According to the error message from nanocompore, the required columns do not include the median intensity value, and my input file includes all of the listed columns.

a-slide commented 3 years ago

I believe this is because you have not prepared the data as explained in the comprehensive Nanocompore documentation. You are supposed to use the --samples option in Nanopolish

From the documentation on how to prepare your data (https://nanocompore.rna.rocks/data_preparation/):

nanopolish index -s {sequencing_summary.txt} -d {raw_fast5_dir} {basecalled_fastq}

nanopolish eventalign --reads {basecalled_fastq} --bam {aligned_reads_bam} --genome {transcriptome_fasta} --print-read-names --scale-events --samples > {eventalign_reads_tsv}

NanopolishComp Eventalign_collapse -i {eventalign_reads_tsv} -o {eventalign_collapsed_reads_tsv}

As mentioned in the CONTRIBUTING guidelines, please read the documentation before raising an issue.
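
To tell whether an existing eventalign file was produced with --samples before re-running the whole pipeline, one can inspect its header line. The sketch below assumes that nanopolish adds a column named `samples` when that option is set and that the file is tab-separated; both the column name and the example headers are assumptions to verify against your own nanopolish output. `has_samples_column` is a hypothetical helper.

```python
def has_samples_column(header_line: str) -> bool:
    """Check whether an eventalign header line lists a `samples` column."""
    return "samples" in header_line.rstrip("\n").split("\t")


# Illustrative headers (assumed layout): one from a run with --signal-index
# only, and one from a run that also used --samples.
base_columns = [
    "contig", "position", "reference_kmer", "read_name", "strand",
    "event_index", "event_level_mean", "event_stdv", "event_length",
    "model_kmer", "model_mean", "model_stdv", "standardized_level",
]
without_samples = "\t".join(base_columns + ["start_idx", "end_idx"]) + "\n"
with_samples = "\t".join(base_columns + ["samples"]) + "\n"

print(has_samples_column(without_samples), has_samples_column(with_samples))
```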

lingolingolin commented 3 years ago

Hi @a-slide, does --samples have to be switched on? And does the median intensity value have to be included? I ask because it worked when I extracted the data for single genes and ran those.