morispi / CONSENT

Scalable long read self-correction and assembly polishing with multiple sequence alignment
https://doi.org/10.1038/s41598-020-80757-5
GNU Affero General Public License v3.0

CONSENT-correct problem #9

Closed (WeipengMO closed this issue 5 years ago)

WeipengMO commented 5 years ago

Hi,

I ran into a problem when running CONSENT-correct.

My error log is shown below:

Self-aligning the long reads (minimap2)
[M::mm_idx_gen::12.180*1.76] collected minimizers
[M::mm_idx_gen::13.815*3.11] sorted minimizers
[M::main::13.815*3.11] loaded/built the index for 262860 target sequence(s)
[M::mm_mapopt_update::14.927*2.95] mid_occ = 599
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 262860
[M::mm_idx_stat::15.697*2.86] distinct minimizers: 58840350 (70.40% are singletons); average occurrences: 2.953; average spacing: 2.877
[M::worker_pipeline::1241.866*2.38] mapped 262855 sequences
[M::worker_pipeline::2840.264*1.57] mapped 257111 sequences
thread 'main' panicked at 'Trouble during read of input: Error(Deserialize { pos: Some(Position { byte: 135947718566, line: 872754811, record: 872754810 }), err: DeserializeError { field: None, kind: Message("invalid length 6, expected a tuple of size 12") } })', src/libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Cheers, Windz

morispi commented 5 years ago

Hello,

This looks like an issue with Minimap2 to me: it apparently couldn't finish the overlapping job. Have you tried running Minimap2 alone on your data?

Cheers, Pierre
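
A note on the panic message above: PAF records have 12 mandatory tab-separated columns, so "invalid length 6, expected a tuple of size 12" is consistent with a record that was cut short, for instance a final line left incomplete when the overlapper stops early. That reading is an assumption; the thread does not confirm it. A minimal shell sketch for locating such records follows. paf_short_records is a hypothetical helper name, not part of CONSENT, fpa, or Minimap2.

```shell
# Hypothetical helper: list PAF records that have fewer than the 12
# mandatory tab-separated columns.
# Prints "<line number><TAB><field count>" for each short record.
paf_short_records() {
    awk -F'\t' 'NF < 12 { printf "%d\t%d\n", NR, NF }' "$1"
}

# Example call (the path is a placeholder for the PAF file given to fpa):
#   paf_short_records output.paf | head
```

On a file of several hundred GB a full scan takes a while; piping only the last few lines of the file through the same awk filter is a cheaper first check if a truncated tail is the suspicion.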

WeipengMO commented 5 years ago

Hi,

When I ran minimap2 alone, using the command line taken from the CONSENT-correct code:

minimap2 --dual=yes -PD --no-long-join -w5 -g1000 -m30 -n1 -t16 m54061_190201_031423.fasta m54061_190201_031423.fasta > output.paf

the minimap2 job was killed:

[M::mm_idx_gen::101.784*1.93] collected minimizers
[M::mm_idx_gen::113.037*2.97] sorted minimizers
[M::main::113.037*2.97] loaded/built the index for 2135528 target sequence(s)
[M::mm_mapopt_update::118.378*2.88] mid_occ = 1710
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 2135528
[M::mm_idx_stat::120.232*2.85] distinct minimizers: 200769403 (40.63% are singletons); average occurrences: 6.925; average spacing: 2.877
/var/spool/slurm/d/job596422/slurm_script: line 2: 19814 Killed                  ~/windz/software/CONSENT/minimap2/minimap2 --dual=yes -PD --no-long-join -w5 -g1000 -m30 -n1 -t16 m54061_190201_031423.fasta m54061_190201_031423.fasta > output.paf

Cheers, Windz
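
The "Killed" line in the SLURM log above is the shell reporting that minimap2 received SIGKILL. The thread does not say what sent the signal; a memory limit on the job is one possibility, but that is an assumption. When a command dies from a signal, the shell reports an exit status of 128 plus the signal number, so a small wrapper can make this visible in a job script. This is a sketch only, and run_and_report is a hypothetical name:

```shell
# Hypothetical wrapper: run a command and report on stderr how it ended.
# An exit status above 128 conventionally means "terminated by signal
# (status - 128)", e.g. 137 = 128 + 9 (SIGKILL).
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        echo "command terminated by signal $((status - 128))" >&2
    elif [ "$status" -ne 0 ]; then
        echo "command exited with status $status" >&2
    fi
    return "$status"
}

# Example call, reusing the command from this comment:
#   run_and_report minimap2 --dual=yes -PD --no-long-join -w5 -g1000 -m30 -n1 -t16 \
#       m54061_190201_031423.fasta m54061_190201_031423.fasta > output.paf
```

Because the wrapper writes its messages to stderr, redirecting stdout to the PAF file still captures only the overlaps.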

morispi commented 5 years ago

Hey,

You should probably open an issue on Minimap2 then: https://github.com/lh3/minimap2. Have you tried running Minimap2 on this same dataset without tuning the parameters?

Cheers, Pierre

WeipengMO commented 5 years ago

Hi,

I ran Minimap2 on the same dataset without the tuned parameters, like this:

minimap2 -x ava-pb -t20 m54061_190201_031423.fasta m54061_190201_031423.fasta > output.paf

and it worked properly.

I think there may be a problem with the parameters.

Thanks a lot

Cheers, Windz

morispi commented 5 years ago

Hi,

You should then be able to run CONSENT directly from the mapping file you obtained with the default parameters. Feel free to drop me another message here if that doesn't work and you want further help, such as the precise command line to use.

Still pretty weird that the parameters cause Minimap2 to crash. I'll consider opening an issue on Minimap2 a bit later, if you don't do it before me, to get insight as to what's wrong.

Cheers, Pierre

WeipengMO commented 5 years ago

Hi,

When I ran these two programs separately:

minimap2 -x ava-pb -t20 m54061_190201_031423.fasta m54061_190201_031423.fasta > Alignments.paf
fpa Alignments.paf tmpdir/fpa.paf index -f tmpdir/PAFIndex.idx -t query

instead of using a pipe (|):

minimap2 -x ava-pb -t20 m54061_190201_031423.fasta m54061_190201_031423.fasta | fpa - tmpdir/Alignments.paf index -f tmpdir/PAFIndex.idx -t query

It worked properly!

My guess is that the output is too large for the piped version and causes a memory overflow. Here are some details about the output:

total 415G
-rw-rw-r-- 1 pg3152 pg3152 208G Mar  9 15:30 Alignments.paf
-rw-rw-r-- 1 pg3152 pg3152 208G Mar  9 19:12 fpa.paf
-rw-rw-r-- 1 pg3152 pg3152 268M Mar  9 19:12 PAFIndex.idx

After that, I ran CONSENT directly from the mapping file, and it worked well!

Thanks a lot!

Cheers, Windz
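
The two-step workaround described in this thread (write the overlaps to a file, then run fpa on that file) can be made slightly more defensive by only handing a PAF file to the next step once the command that produced it has exited successfully. A minimal sketch, assuming a POSIX shell; write_then_commit is a hypothetical helper name and not part of CONSENT:

```shell
# Hypothetical helper: run a command with stdout captured in "$dest.partial",
# and rename it to "$dest" only if the command exits with status 0.
# On failure the partial file is kept for inspection and the status is returned.
write_then_commit() {
    dest=$1
    shift
    if "$@" > "$dest.partial"; then
        mv "$dest.partial" "$dest"
    else
        status=$?
        echo "command failed with status $status; kept $dest.partial" >&2
        return "$status"
    fi
}

# Example, reusing the commands from this thread:
#   write_then_commit Alignments.paf \
#       minimap2 -x ava-pb -t20 m54061_190201_031423.fasta m54061_190201_031423.fasta \
#     && fpa Alignments.paf tmpdir/fpa.paf index -f tmpdir/PAFIndex.idx -t query
```

With this layout, an interrupted overlapping step leaves Alignments.paf.partial behind instead of a file that looks complete, so the indexing step is never started on truncated input.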