mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

Error during consensus #40

Closed jonhultqvist closed 6 years ago

jonhultqvist commented 6 years ago

Hi,

I'm trying to assemble a eukaryotic genome of about 200-300 Mbp; the genome size was estimated from a miniasm assembly. Besides the eukaryotic genome, the data set also contains a considerable number of associated prokaryotic genomes (both endosymbiotic and extracellular). The total dataset is around 16 Gbp of ONT reads. An appreciable amount of the data is relatively short, so I decided to run with "--min-overlap 3000".

Launch-script

flye --nano-raw \
/scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq \
--genome-size 200m --out-dir Busselton2_Flye_200m_3000 --threads 20 --min-overlap 3000 --iterations 2 --resume

Log-file, start

[2018-02-22 08:41:05] root: DEBUG: Genome size: 209715200
[2018-02-22 08:41:05] root: DEBUG: Chosen k-mer size: 17
[2018-02-22 08:41:05] root: INFO: Running Flye 2.3-release
[2018-02-22 08:41:05] root: DEBUG: Cmd: /scratch2/software/python-2.7-env/bin/flye --nano-raw /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq --genome-size 200m --out-dir Busselton2_Flye_200m_3000 --threads 20 --min-overlap 3000 --iterations 2
[2018-02-22 08:41:05] root: INFO: Assembling reads
[2018-02-22 08:41:05] root: DEBUG: -----Begin assembly log------
[2018-02-22 08:41:05] root: DEBUG: Running: flye-assemble -k 17 -l /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/flye.log -t 20 -v 3000 /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/0-assembly/draft_assembly.fasta 209715200 /scratch2/software/python-2.7-env/local/lib/python2.7/site-packages/flye/resource/asm_raw_reads.cfg
[2018-02-22 08:41:05] DEBUG: Build date: Jan  8 2018 12:26:55
[2018-02-22 08:41:05] DEBUG: Parameters:
[2018-02-22 08:41:05] DEBUG:    maximum_jump=1500
[2018-02-22 08:41:05] DEBUG:    maximum_overhang=1500
[2018-02-22 08:41:05] DEBUG:    hard_min_coverage_rate=10
[2018-02-22 08:41:05] DEBUG:    repeat_coverage_rate=10
[2018-02-22 08:41:05] DEBUG:    close_jump_rate=100
[2018-02-22 08:41:05] DEBUG:    far_jump_rate=2
[2018-02-22 08:41:05] DEBUG:    overlap_divergence_rate=5
[2018-02-22 08:41:05] DEBUG:    penalty_window=100
[2018-02-22 08:41:05] DEBUG:    max_coverage_drop_rate=5
[2018-02-22 08:41:05] DEBUG:    chimera_window=100
[2018-02-22 08:41:05] DEBUG:    min_reads_in_contig=4
[2018-02-22 08:41:05] DEBUG:    max_inner_reads=10
[2018-02-22 08:41:05] DEBUG:    max_inner_fraction=0.25
[2018-02-22 08:41:05] DEBUG:    max_separation=500
[2018-02-22 08:41:05] DEBUG:    tip_length_threshold=20000
[2018-02-22 08:41:05] DEBUG:    unique_edge_length=50000
[2018-02-22 08:41:05] DEBUG:    min_repeat_res_support=0.5
[2018-02-22 08:41:05] DEBUG:    out_paths_ratio=5
[2018-02-22 08:41:05] DEBUG:    graph_cov_drop_rate=10
[2018-02-22 08:41:05] DEBUG:    coverage_estimate_window=100
[2018-02-22 08:41:05] DEBUG:    low_cutoff_warning=1
[2018-02-22 08:41:05] DEBUG:    assemble_kmer_sample=1
[2018-02-22 08:41:05] DEBUG:    assemble_gap=500
[2018-02-22 08:41:05] DEBUG:    repeat_graph_kmer_sample=5
[2018-02-22 08:41:05] DEBUG:    repeat_graph_gap=100
[2018-02-22 08:41:05] DEBUG:    repeat_graph_max_kmer=500
[2018-02-22 08:41:05] DEBUG:    read_align_kmer_sample=1
[2018-02-22 08:41:05] DEBUG:    read_align_gap=500
[2018-02-22 08:41:05] DEBUG:    read_align_max_kmer=500
[2018-02-22 08:41:05] INFO: Reading sequences
[2018-02-22 10:17:47] DEBUG: Mean read length: 3639
[2018-02-22 10:17:47] DEBUG: Estimated coverage: 69
[2018-02-22 10:17:47] INFO: Generating solid k-mer index
[2018-02-22 10:17:47] DEBUG: Hard threshold set to 7
[2018-02-22 10:17:47] DEBUG: Started kmer counting
[2018-02-22 10:28:35] INFO: Counting kmers (1/2):
[2018-02-22 10:32:57] INFO: Counting kmers (2/2):
[2018-02-22 10:44:30] DEBUG: Filtered 363871 repetitive kmers
[2018-02-22 10:44:30] DEBUG: Estimated minimum kmer coverage: 10, 206931346 unique kmers selected
[2018-02-22 10:44:30] INFO: Filling index table
[2018-02-22 10:44:38] DEBUG: Solid kmers: 206931346
[2018-02-22 10:44:38] DEBUG: Kmer index size: 6149339332
[2018-02-22 11:02:03] DEBUG: Total chunks 1467 wasted space: 71130
[2018-02-22 11:11:41] INFO: Extending reads
[2018-02-22 11:17:22] DEBUG: Mean read coverage: 53
[2018-02-22 11:23:31] DEBUG: Assembled contig 1

Log-file end

[2018-02-23 01:46:19] DEBUG: Inner: 737088 covered: 1174293 total: 8006938
[2018-02-23 01:47:02] DEBUG: Discarded contig with 7 reads and 2 inner overlaps
[2018-02-23 01:49:04] INFO: Assembled 1496 draft contigs
[2018-02-23 01:49:11] INFO: Generating contig sequences
[2018-02-23 02:12:56] DEBUG: Writing FASTA
-----------End assembly log------------
[2018-02-23 02:13:40] root: INFO: Running Minimap2
[2018-02-23 02:13:40] root: DEBUG: Running: flye-minimap2 /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/0-assembly/draft_assembly.fasta /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq -a -Q -w5 -m100 -g10000 --max-chain-skip 25 -t 20 -k15
[2018-02-23 03:17:44] root: DEBUG: Sorting alignment file
[2018-02-23 04:01:37] root: INFO: Computing consensus
[2018-02-25 12:13:37] root: DEBUG: Genome size: 209715200
[2018-02-25 12:13:37] root: DEBUG: Chosen k-mer size: 17
[2018-02-25 12:13:37] root: INFO: Running Flye 2.3-release
[2018-02-25 12:13:37] root: DEBUG: Cmd: /scratch2/software/python-2.7-env/bin/flye --nano-raw /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq --genome-size 200m --out-dir Busselton2_Flye_200m_3000 --threads 20 --min-overlap 3000 --iterations 2 --resume
[2018-02-25 12:13:37] root: INFO: Resuming previous run
[2018-02-25 12:13:37] root: INFO: Running Minimap2
[2018-02-25 12:13:37] root: DEBUG: Running: flye-minimap2 /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/0-assembly/draft_assembly.fasta /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq -a -Q -w5 -m100 -g10000 --max-chain-skip 25 -t 20 -k15
[2018-02-25 14:28:55] root: DEBUG: Sorting alignment file
[2018-02-25 16:13:47] root: INFO: Computing consensus

The assembly was running well and produced a draft sequence. Then we had a cluster crash during the consensus step (not specifically related to Flye, I think). I restarted using "--resume". During the consensus run I have received a large number of errors like the two instances shown below. Flye still appears to be running.

[2018-02-25 12:13:37] INFO: Running Flye 2.3-release
[2018-02-25 12:13:37] INFO: Resuming previous run
[2018-02-25 12:13:37] INFO: Running Minimap2
[2018-02-25 16:13:47] INFO: Computing consensus
Process Process-1020:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/scratch2/software/python-2.7-env/local/lib/python2.7/site-packages/flye/consensus.py", line 45, in _thread_worker
    error_queue.put(e)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
    self._connect()
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 428, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
EOFError
Process Process-1023:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/scratch2/software/python-2.7-env/local/lib/python2.7/site-packages/flye/consensus.py", line 45, in _thread_worker
    error_queue.put(e)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
    self._connect()
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 428, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
EOFError

When investigating the processes on the node, there appear to be a large number of flye processes that are not using any resources. I launched with 20 threads.

Tasks: 1211 total,   2 running, 588 sleeping,   0 stopped, 621 zombie
%Cpu(s): 37.4 us,  0.3 sy,  0.0 ni, 62.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  79251590+total, 29567382+used, 49684204+free,    75548 buffers
KiB Swap:  7842748 total,        0 used,  7842748 free. 50032572 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
22028 jon       20   0   25896   4048   2560 R   1.3  0.0   0:01.36 top
 3571 jon       20   0   20836   5868   2752 S   0.0  0.0   0:00.07 bash
 3580 jon       20   0  374792 341216   6100 S   0.0  0.0   3:44.47 flye
 8248 jon       20   0 32.335g 774880   4028 S   0.0  0.1   0:07.17 flye
15421 jon       20   0  374568 338464   3552 S   0.0  0.0   0:00.00 flye
15424 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15427 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15430 jon       20   0       0      0      0 Z   0.0  0.0   0:00.03 flye
15433 jon       20   0       0      0      0 Z   0.0  0.0   0:00.02 flye
15436 jon       20   0       0      0      0 Z   0.0  0.0   0:00.02 flye
15439 jon       20   0       0      0      0 Z   0.0  0.0   0:00.03 flye
15442 jon       20   0       0      0      0 Z   0.0  0.0   0:00.02 flye
15445 jon       20   0       0      0      0 Z   0.0  0.0   0:00.03 flye
15448 jon       20   0       0      0      0 Z   0.0  0.0   0:00.04 flye
15451 jon       20   0       0      0      0 Z   0.0  0.0   0:00.03 flye
15454 jon       20   0       0      0      0 Z   0.0  0.0   0:00.03 flye
15457 jon       20   0       0      0      0 Z   0.0  0.0   0:00.00 flye
15460 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15463 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15466 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15469 jon       20   0       0      0      0 Z   0.0  0.0   0:00.04 flye
15472 jon       20   0       0      0      0 Z   0.0  0.0   0:00.00 flye
15475 jon       20   0       0      0      0 Z   0.0  0.0   0:00.04 flye
15478 jon       20   0       0      0      0 Z   0.0  0.0   0:00.04 flye
15481 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15484 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15487 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15490 jon       20   0       0      0      0 Z   0.0  0.0   0:00.00 flye
15493 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye

Any ideas on what these errors might mean or if they are benign?

Cheers Jon

mikolmogorov commented 6 years ago

Hi,

It looks like there might have been some issues with inter-process communication. In version 2.3.1 we fixed a multiprocessing bug in the consensus module, which might be affecting your case. Could you update to 2.3.1 / 2.3.2 and see if the problem persists?

Best, Mikhail

jonhultqvist commented 6 years ago

I killed the job and restarted it again. This time Flye threw the errors but still kept going through consensus and finished without any further errors in the later assembly stages. This was without the update to Flye 2.3.2.

I will update to the latest release anyway and if the problem resurfaces I will report back.

Thank you!