Hi @lingolingolin,
could you try running it again using the version in the devel branch?
You should then get a log file in the output folder. If you could paste its content here, that would help us debug the issue.
cheers! tom
Hi @Tom,
Thanks a lot for your prompt reply. But I don't see a setup.py file in the devel branch. How do I install it?
That's because we use Poetry. If you install Poetry, you can then use poetry run from the devel branch to run nanocompore directly, or poetry build to build a wheel file that you can then install with pip.
Hi @tleonardi Tom,
Same as before. It did not continue processing any data.
Here is what is included in the log file:
{
"package_name": "nanocompore",
"package_version": "1.0.0rc3-2",
"timestamp": "2020-11-04 11:22:52.123257",
"eventalign_fn_dict": {
"KO1": {
"KO1_1": "../ko1.fastq.events.collapse/out_eventalign_collapse.tsv"
},
"WT1": {
"WT1_1": "../wt1.fastq.events.collapse/out_eventalign_collapse.tsv"
}
},
"fasta_fn": "cds.ref.fasta",
"bed_fn": null,
"outpath": "ko1_vs_wt1_5k",
"outprefix": "out_",
"overwrite": true,
"comparison_methods": "GMM,KS,TT,MW",
"logit": true,
"allow_warnings": true,
"sequence_context": 2,
"sequence_context_weights": "uniform",
"min_coverage": 10,
"min_ref_length": 100,
"downsample_high_coverage": 5000,
"max_invalid_kmers_freq": 0.1,
"select_ref_id": [],
"exclude_ref_id": [],
"nthreads": 10,
"log_level": "info"
}
This time an out_SampComp.db.dir file is also produced. It contains:
'__ref_id_list', (0, 6)
'__metadata', (512, 286)
There are also two binary db files: out_SampComp.db.dat and out_SampComp.db.bak.
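The .dir/.dat/.bak trio is the on-disk layout of Python's dbm.dumb module, one of the backends the standard shelve module can use: each .dir line holds a key and the (byte offset, size) of its pickled value in the .dat file. Assuming out_SampComp.db is a shelve database (an assumption; this thread does not say how nanocompore writes it), a minimal sketch to list its contents could look like this. The function name read_dumb_shelve is hypothetical.

```python
import dbm.dumb
import shelve


def read_dumb_shelve(db_prefix):
    """Return the contents of a shelve database stored with dbm.dumb.

    db_prefix is the path without extension; the .dir, .dat and .bak
    files are expected next to it.
    """
    # Open the dbm.dumb files read-only and wrap them in a Shelf so that
    # each stored value is unpickled on access.
    with shelve.Shelf(dbm.dumb.open(db_prefix, "r")) as db:
        return {key: db[key] for key in db.keys()}
```

It would be called with the path minus its extension, e.g. read_dumb_shelve("ko1_vs_wt1_5k/out_SampComp.db"). As a side note, 6 bytes is exactly the size of an empty Python list pickled with protocol 3 (the shelve default before Python 3.10), so the '__ref_id_list', (0, 6) entry is consistent with, though not proof of, no reference having been stored. Unpickling requires any classes referenced in the values to be importable and should only be done on trusted files.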
Let me know if you think I need to try anything else. Thanks.
Hi @lingolingolin, this is still from the stable version of nanocompore. If you use the version in the devel branch, the log file will say "package_version": "1.0.0rc3-1-dev". Also, the log file will contain much more information.
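Since the log starts with a JSON block of run parameters, the version can also be checked programmatically. A minimal sketch, assuming the log layout shown in this thread; the function names are hypothetical, and the "-dev" suffix rule is inferred from the single devel version string quoted above.

```python
import json


def read_log_header(log_text):
    """Parse the JSON block at the top of a nanocompore log file.

    raw_decode stops at the end of the first JSON object, so a log line
    glued directly after the closing brace does not break parsing.
    """
    header, _end = json.JSONDecoder().raw_decode(log_text.lstrip())
    return header


def looks_like_devel(header):
    # Assumption: devel builds carry a "-dev" suffix, as in "1.0.0rc3-1-dev".
    return header.get("package_version", "").endswith("-dev")
```

For example, looks_like_devel(read_log_header(open("out_SampComp.log").read())) should be True only for a run made from the devel branch.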
Hi @tleonardi, sorry, I had not checked out the devel branch. The log from the devel version is now attached.
It's weird: it complains about the required fields, but I found they are all there.
awk 'NF!=8 && !/^#/' out_eventalign_collapse.tsv | wc -l
0
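The awk one-liner above counts non-comment lines whose number of fields differs from 8. A rough Python equivalent, sketched here with a hypothetical function name, is shown below. Note that a count of 0 only shows that the file is internally consistent; it does not show that the file has the columns nanocompore expects.

```python
def count_malformed_lines(lines, expected_fields=8):
    """Count non-comment lines whose field count differs from expected_fields.

    Approximates: awk 'NF!=8 && !/^#/' file | wc -l
    (awk's default field separator is runs of blanks, hence str.split()).
    """
    return sum(
        1
        for line in lines
        if not line.startswith("#") and len(line.split()) != expected_fields
    )
```

Usage: with open("out_eventalign_collapse.tsv") as fh: print(count_malformed_lines(fh)).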
Hi @lingolingolin
I'm sorry, but I don't understand.
Do you get an out_eventalign_collapse.tsv file?
If so, it looks like Nanocompore's execution completed successfully. Can you paste the first 10 lines of that file?
Also, the log file wasn't attached properly to your previous message.
Sorry again, it still shows as uploading. Yes, I have them for both samples. First few lines of the log:
{
"package_name": "nanocompore",
"package_version": "1.0.0rc3-1-dev",
"timestamp": "2020-11-04 14:36:46.969158",
"eventalign_fn_dict": {
"KO1": {
"KO1_1": "../ko1.fastq.events.collapse/out_eventalign_collapse.tsv"
},
"WT1": {
"WT1_1": "../wt1.fastq.events.collapse/out_eventalign_collapse.tsv"
}
},
"fasta_fn": "cds.ref.fasta",
"bed_fn": null,
"outpath": "ko1_vs_wt1_5k",
"outprefix": "out_",
"overwrite": true,
"comparison_methods": "GMM,KS,TT,MW",
"logit": true,
"allow_warnings": true,
"sequence_context": 2,
"sequence_context_weights": "uniform",
"min_coverage": 10,
"min_ref_length": 100,
"downsample_high_coverage": 5000,
"max_invalid_kmers_freq": 0.1,
"select_ref_id": [],
"exclude_ref_id": [],
"nthreads": 10,
"log_level": "info"
}
2020-11-04T14:36:46.995574+0100 INFO - MainProcess | Initialising SampComp and checking options
2020-11-04T14:36:46.996477+0100 INFO - MainProcess | Only 1 replicate found for condition KO1
2020-11-04T14:36:46.996984+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2020-11-04T14:36:46.997493+0100 INFO - MainProcess | Only 1 replicate found for condition WT1
2020-11-04T14:36:46.997976+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2020-11-04T14:36:47.002032+0100 DEBUG - MainProcess | OrderedDict([('KO1', {'KO1_1': '../ko1.fastq.events.collapse/out_eventalign_collapse.tsv'}), ('WT1', {'WT1_1': '../wt1.fastq.events.collapse/out_eventalign_collapse.tsv'})])
Last few lines:
2020-11-04T14:37:12.868905+0100 DEBUG - Process-8 | Worker thread processing new item from in_q: YAL054C
2020-11-04T14:37:12.909923+0100 ERROR - Process-7 | Error in worker. Kill output queue
2020-11-04T14:37:12.909987+0100 ERROR - Process-4 | Error in worker. Kill output queue
2020-11-04T14:37:12.910417+0100 ERROR - Process-7 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
2020-11-04T14:37:12.910443+0100 ERROR - Process-4 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
2020-11-04T14:37:12.914723+0100 ERROR - Process-6 | Error in worker. Kill output queue
2020-11-04T14:37:12.915235+0100 ERROR - Process-6 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
2020-11-04T14:37:12.926125+0100 ERROR - Process-5 | Error in worker. Kill output queue
2020-11-04T14:37:12.928039+0100 ERROR - Process-5 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
2020-11-04T14:37:12.950182+0100 ERROR - Process-9 | Error in worker. Kill output queue
2020-11-04T14:37:12.950741+0100 ERROR - Process-9 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
2020-11-04T14:37:12.963056+0100 ERROR - Process-8 | Error in worker. Kill output queue
2020-11-04T14:37:12.963575+0100 ERROR - Process-8 | Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']
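When every worker fails with the same error, it can help to collapse the log into its distinct messages. A minimal sketch, assuming the line format shown above (timestamp, level, "-", process name, "|", message); the regex and function name are hypothetical and not part of nanocompore.

```python
import re
from collections import Counter

# Matches lines such as:
# 2020-11-04T14:37:12.909923+0100 ERROR - Process-7 | Error in worker. Kill output queue
# re.search (rather than re.match) also copes with a line glued after the JSON header's "}".
LOG_LINE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2}T\S+)\s+(?P<level>[A-Z]+)\s+-\s+"
    r"(?P<process>\S+)\s+\|\s+(?P<message>.*)"
)


def summarise_errors(lines):
    """Count distinct ERROR messages and list the processes that logged them."""
    counts = Counter()
    processes = set()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("level") == "ERROR":
            counts[m.group("message").strip()] += 1
            processes.add(m.group("process"))
    return counts, sorted(processes)
```

Usage: with open("out_SampComp.log") as fh: counts, procs = summarise_errors(fh). Here this would show the same two messages repeated by every worker process.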
OK, thanks! It looks like there's something wrong with the input files. Can you post the commands (and versions) you used for Nanopolish and NanopolishComp? Can you also post the first few lines of out_eventalign_collapse.tsv and out_eventalign_collapse.tsv.idx?
Nanopolish command:
nanopolish eventalign --reads ../WT1.fastq --bam WT1.bam --genome ref.fasta --scale-events -t 10 --summary=WT1.event.aln.summary.txt --print-read-names --signal-index > WT1.evn.aln.tsv
NanopolishComp command:
NanopolishComp Eventalign_collapse -t 12 -i WT1.evn.aln.tsv -o WTs.event1.collapse
out_eventalign_collapse.tsv:
#588acc31-711c-434c-8b0a-01bbe036064d YAL053W
ref_pos ref_kmer num_events dwell_time NNNNN_dwell_time mismatch_dwell_time start_idx end_idx
1 GATCT 3 0.01826 0.0 0.0 109753 109808
2 ATCTT 1 0.02457 0.0 0.0 109679 109753
3 TCTTC 1 0.00631 0.0 0.0 109660 109679
4 CTTCC 2 0.01428 0.0 0.0 109617 109660
5 TTCCT 1 0.00498 0.0 0.0 109602 109617
6 TCCTA 4 0.03686 0.0 0.0 109491 109602
7 CCTAA 2 0.00763 0.0 0.0 109468 109491
8 CTAAA 2 0.010960000000000001 0.0 0.0 109435 109468
9 TAAAC 1 0.00598 0.0 0.0 109417 109435
10 AAACA 1 0.00996 0.0 0.0 109387 109417
11 AACAC 1 0.00398 0.0 0.0 109375 109387
12 ACACC 3 0.01593 0.0 0.0 109327 109375
13 CACCT 1 0.00398 0.0 0.0 109315 109327
14 ACCTT 1 0.01062 0.0 0.0 109283 109315
15 CCTTC 1 0.0073 0.0 0.0 109261 109283
16 CTTCG 6 0.033859999999999994 0.0 0.0 109159 109261
17 TTCGC 2 0.018260000000000002 0.0 0.0 109104 109159
18 TCGCA 1 0.00465 0.0 0.0 109090 109104
19 CGCAA 3 0.01461 0.0 0.0 109046 109090
20 GCAAG 2 0.00597 0.0 0.0 109028 109046
21 CAAGG 1 0.0176 0.0 0.0 108975 109028
22 AAGGT 2 0.02988 0.02556 0.0 108885 108975
25 GTGCC 1 0.0073 0.0 0.0 108863 108885
26 TGCCT 4 0.01893 0.0 0.0 108806 108863
27 GCCTT 1 0.00531 0.0 0.0 108790 108806
28 CCTTT 1 0.00432 0.0 0.0 108777 108790
29 CTTTT 1 0.01428 0.0 0.0 108734 108777
And out_eventalign_collapse.tsv.idx:
ref_id ref_start ref_end read_id kmers dwell_time NNNNN_kmers mismatch_kmers missing_kmers byte_offset byte_len
YAL053W 1 2348 588acc31-711c-434c-8b0a-01bbe036064d 2276 32.349429999999984 63 0 72 0 96768
YAL053W 1 2348 39cfdeba-4504-43f5-b029-0f02e64e1b90 2225 41.382659999999895 87 0 123 96769 97635
YAL053W 0 2348 b5b50209-9924-436b-9010-079e5868f553 2246 40.308319999999945 78 0 102 194405 96664
YAL053W 1 2321 aa7a155c-d3c5-46a3-9398-7264d457ae9d 2240 26.048969999999958 66 0 80 291070 93905
YAL053W 0 2348 247e95b9-13a1-4689-875f-348752380f60 2269 35.96605999999993 61 0 79 384976 97460
YAL053W 6 2348 5406092e-da70-4e67-8e7f-2dbe6ae73b6d 2284 39.16124000000003 52 0 58 482437 98977
YAL053W 20 2348 362e37d6-a6c6-4d7c-abe9-bc6698629d70 2256 33.95646999999993 60 0 72 581415 96782
YAL053W 1 2348 6ebb1a76-60db-4319-a7f0-55de8e410e5c 2231 51.74043999999983 80 0 116 678198 97957
YAL053W 722 2346 969c5fa9-9b06-48fe-bf18-e3b37fe44882 1586 23.10885000000002 31 0 38 776156 67930
YAL053W 367 2348 ba7b78f3-be88-4be6-a7dc-690fac0e361f 1923 23.00586 45 0 58 844087 81206
YAL053W 2 2348 7d16aced-9113-4d19-88d0-acd3df2c9873 2209 61.345379999999885 92 0 141 925294 97899
YAL053W 135 2268 f975d5bf-2d3d-4cc7-8c8b-a4c9e255d48d 1871 31.69787000000004 87 0 270 1023194 81295
YAL053W 272 2340 1a7f6325-318a-420f-8318-f51f4395f418 1991 32.717500000000044 71 0 77 1104490 84869
YAL053W 605 2348 e5330606-b26b-4e22-8749-d1d0b0555d82 1713 26.567679999999992 35 0 31 1189360 73406
YAL053W 125 2347 59cc87d3-71ed-4d88-b701-fac6eb7ea77f 2125 44.506640000000026 67 0 97 1262767 93184
YAL053W 723 2348 9f363448-a3d0-4a00-af10-ce42f54669ae 1553 21.441729999999982 42 0 81 1355952 66377
YAL053W 339 2348 0c9c937d-04aa-4246-9f93-7a474eae44f6 1937 38.876350000000045 60 0 80 1422330 83892
YAL053W 6 2326 ff974ccf-e7ee-43ad-baab-ee61f1469627 2173 71.17765000000003 118 0 150 1506223 97433
YAL053W 816 2348 e0069090-f8a6-4791-b7a8-d13a018e89b0 1486 25.01625000000003 39 0 54 1603657 63604
YAL053W 916 2324 cb15e055-8cff-48f3-92a3-86d49e19f0ac 1376 18.400750000000002 34 0 33 1667262 59065
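The byte_offset and byte_len columns appear to locate each read's block inside out_eventalign_collapse.tsv: in the rows above, each offset equals the previous offset plus the previous length plus one (e.g. 0 + 96768 + 1 = 96769), which suggests one newline between blocks that is not counted in byte_len. Under that assumption, a sketch to pull a single read out for inspection; both function names are hypothetical.

```python
import csv


def read_block(tsv_path, byte_offset, byte_len):
    """Return the text of one read's block from out_eventalign_collapse.tsv."""
    with open(tsv_path, "rb") as fh:
        fh.seek(byte_offset)
        return fh.read(byte_len).decode("utf-8")


def iter_index(idx_path):
    """Yield one dict per row of the tab-separated .idx file."""
    with open(idx_path, newline="") as fh:
        yield from csv.DictReader(fh, delimiter="\t")
```

For each row of the index, read_block(tsv, int(row["byte_offset"]), int(row["byte_len"])) should return a block starting with the "#read_id ref_id" line followed by the column header, which makes it easy to see exactly which columns a read carries.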
Hi @tleonardi, sorry to bother you, but is there any update on this?
Hi @lingolingolin. This doesn't appear to be a Nanocompore issue but rather a NanopolishComp one. Can you verify that you are using the latest version of NanopolishComp (0.6.11)? If you still have the problem, please open an issue in the NanopolishComp repository describing the bug in detail. Thanks
Hi @a-slide,
Thanks.
Why is it a NanopolishComp issue? NanopolishComp ran smoothly, and my analysis has now stopped at the Nanocompore stage.
And I am indeed using the NanopolishComp version you suggested:
NanopolishComp --version
NanopolishComp v0.6.11
Sorry, I mixed this up with another open issue. I will have a look.
thanks a lot @a-slide :-)
Actually, I think the median intensity value is missing from the Eventalign_collapse file.
Is that a required column?
Is there anything wrong with my nanopolish eventalign command?
According to the error message from nanocompore, the required columns do not include the median intensity value, and my input file includes all of those required columns.
I believe this is because you have not prepared the data as explained in the comprehensive Nanocompore documentation. You are supposed to use the --samples option of nanopolish eventalign.
From the documentation on how to prepare your data (https://nanocompore.rna.rocks/data_preparation/):
nanopolish index -s {sequencing_summary.txt} -d {raw_fast5_dir} {basecalled_fastq}
nanopolish eventalign --reads {basecalled_fastq} --bam {aligned_reads_bam} --genome {transcriptome_fasta} --print-read-names --scale-events --samples > {eventalign_reads_tsv}
NanopolishComp Eventalign_collapse -i {eventalign_reads_tsv} -o {eventalign_collapsed_reads_tsv}
As mentioned in the CONTRIBUTING guidelines, please read the documentation before raising an issue.
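The header posted earlier contains every column printed in the error message but no intensity column, which fits the explanation above. A minimal sketch that makes such a check explicit; the column name median is an assumption taken from the comment about the median intensity value, not from nanocompore's source, and the function name is hypothetical.

```python
def missing_columns(header_line, required):
    """Return the required column names absent from a tab-separated header."""
    present = set(header_line.rstrip("\n").split("\t"))
    return [name for name in required if name not in present]


# Columns listed in the nanocompore error message in this thread.
LISTED = ["ref_pos", "ref_kmer", "num_events", "dwell_time",
          "NNNNN_dwell_time", "mismatch_dwell_time", "start_idx", "end_idx"]
# Hypothetical: an intensity column, assumed here to be named "median",
# that the collapsed file is expected to carry when eventalign used --samples.
REQUIRED = LISTED + ["median"]
```

Run on the header line of out_eventalign_collapse.tsv shown above, missing_columns(header, LISTED) finds nothing, while missing_columns(header, REQUIRED) reports only the assumed intensity column.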
Hi @a-slide,
Does --samples have to be switched on?
And does the median intensity value have to be included?
I ask because when I extracted the data associated with single genes, it worked.
Hi there,
This is an old issue, but I am running nanocompore and it seems to be frozen.
ps x
shows the process status below. So far, the messages printed to the screen are:
Can you help me sort it out? Thanks a lot in advance.