medvedevgroup / TwoPaCo

A fast constructor of the compressed de Bruijn graph from many genomes

Is it stuck when there is really low CPU usage? #26

Open rickbeeloo opened 3 years ago

rickbeeloo commented 3 years ago

We just ran twopaco on thousands of bacterial genomes, and now it's at:

Round 0, 0:4398046511104
Pass    Filling Filtering

However, when we look at top, we see that while it's loaded in memory, there is only 0.3% CPU usage:

514.3g 512.2g 4368 D 0.3 33.9 209:34.18 twopaco

Is this normal, or does this mean something is going wrong?

iminkin commented 3 years ago

Hey @rickbeeloo, did it finish? Do you have enough free space on the disk?

rickbeeloo commented 3 years ago

@iminkin Not yet, but I think that makes sense with thousands of genomes. One odd thing, however, is that the output shows the list of input paths:

/path/to/file/1.fna
/path/to/file/2.fna
/path/to/file/3.fna
/path/to/file/4.fna
...
--------------------------------------------------------------------------------

but sometimes there is the following below it:

Round 0, 0:4398046511104
Pass    Filling Filtering

However, when I check again later, this disappears and I see only the paths. Nevertheless, we now see something happening:

512.0g 211.2g 4324 R 99.7 14.0 13:27.84 twopaco

It's hard to tell how far along it is and whether it is repeating itself.
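
For comparing the two top lines quoted in this thread, here is a minimal sketch in Python. It assumes the lines are the tail of top's default column layout (VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); the function name `parse_top_tail` is hypothetical. On Linux, state `D` means uninterruptible sleep, which usually indicates the process is waiting on I/O (disk or swap) rather than computing, while `R` means it is running.

```python
# Meanings of the process state column (S) as documented for Linux top/ps.
STATE_MEANING = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually waiting on I/O)",
    "T": "stopped",
    "Z": "zombie",
}


def parse_top_tail(line):
    """Parse the last eight fields of a top process line.

    Assumes the fields are VIRT RES SHR S %CPU %MEM TIME+ COMMAND,
    which is top's default order; a customized top layout would differ.
    """
    virt, res, shr, state, cpu, mem, time_plus, command = line.split()
    return {
        "virt": virt,
        "res": res,
        "shr": shr,
        "state": state,
        "state_meaning": STATE_MEANING.get(state, "unknown"),
        "cpu_percent": float(cpu),
        "mem_percent": float(mem),
        "time": time_plus,
        "command": command,
    }


if __name__ == "__main__":
    for line in (
        "514.3g 512.2g 4368 D 0.3 33.9 209:34.18 twopaco",
        "512.0g 211.2g 4324 R 99.7 14.0 13:27.84 twopaco",
    ):
        info = parse_top_tail(line)
        print(info["command"], info["state"], info["state_meaning"],
              info["cpu_percent"])
```

Read this way, the first line shows a process that is mostly blocked, and the second shows one that is actively using a full core. This only describes what the process is doing at the moment of sampling; it does not by itself say how far along the run is.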

iminkin commented 3 years ago

How much memory does your machine have and what was the command line you used to run twopaco? I would suggest using the largest filter size possible given your machine's RAM (see documentation here: https://github.com/medvedevgroup/TwoPaCo#twopaco-usage).
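
As a rough aid for choosing a filter size, the arithmetic can be sketched in Python. This assumes the filter size parameter is an exponent f that gives a Bloom filter of 2^f bits; please confirm that against the documentation linked above. The helper names are hypothetical. Note that 4398046511104 from the `Round 0` line equals 2^42, and 2^42 bits is 512 GiB, which would be consistent with the roughly 512g resident size reported by top if that number is the filter size in bits.

```python
def filter_bytes(f):
    """Bytes occupied by a bit array of 2**f bits (assumed filter layout)."""
    return (1 << f) // 8


def largest_filter_exponent(ram_bytes):
    """Largest f such that a 2**f-bit filter fits in ram_bytes.

    Leaves no headroom for the rest of the program, so in practice
    pass in only the memory you are willing to give to the filter.
    """
    if ram_bytes < 1:
        raise ValueError("ram_bytes must be positive")
    return (ram_bytes * 8).bit_length() - 1


if __name__ == "__main__":
    GIB = 1 << 30
    # The range printed in the log line is an exact power of two.
    print((1 << 42) == 4398046511104)
    print(filter_bytes(42) // GIB, "GiB for f = 42")
    print("f for 1536 GiB budget:", largest_filter_exponent(1536 * GIB))
```

Under these assumptions, a machine with about 1.5 TiB of RAM could hold a filter one step larger (f = 43, 1 TiB), but whether that is advisable depends on how much memory the rest of the run needs.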