ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Jobs repeatedly killed #84

Closed JFWolters closed 5 years ago

JFWolters commented 5 years ago

Hi,

I'm trying to run NOVOplasty 3.1 to assemble and circularize some yeast mitochondrial assemblies using contigs I've identified as mitochondrial from other assemblies as seeds. In many cases these seeds already represent the complete or nearly complete genome.

Most cases run with no issue but specific assemblies keep resulting in the job being randomly killed for no apparent reason. Even in the extended log I don't see a clear reason why (logs and seed attached, let me know if more would be helpful).

It may be an issue with my system but I was curious if the log holds any insights that are not apparent to me. Candida_tammaniensis.zip

ndierckx commented 5 years ago

Hi, could you also send me the normal log file? If it gets killed, it's usually a memory problem...

And could you also try with the latest version (3.1)? You need the new config file for that one.

JFWolters commented 5 years ago

Hello,

This was run using version 3.1.

I thought it might be a memory problem, but I'm not getting the error messages I see others reporting when memory is the issue. Other runs that should have had larger memory requirements, run in parallel, had no issues.

Every run the assembly appears to be progressing smoothly, reaches an assembly size close to what other assemblers have reached as the likely genome size, and then hangs, seemingly making no progress, for a very long time until a "Killed" message appears.

I have attached the files with the STDOUT in run_log.txt. For some reason the standard log file wasn't being generated. The only output missing is the "Killed" message.

Any assistance would be greatly appreciated.

Candida_tammaniensis.zip

ndierckx commented 5 years ago

Hi,

Must be a bug I missed. Is there a way to send me the data so I can run it myself?

JFWolters commented 5 years ago

Thank you for the super fast reply.

The accession for the sequencing reads I am using for this assembly in SRA is SRR6476025. The assembly used as the seed is in the prior attached data.

I have been using the original data prior to SRA upload, so I will rerun with the SRA data just to confirm it's all behaving the same.

JFWolters commented 5 years ago

Hello,

It looks like this is all a mistake on my part. I was using a script to generate the config files and it appears I was overestimating the read length by 1 (likely due to accidentally counting the newline character). After correcting the read length parameter, NOVOplasty runs without error. Apologies for the confusion.

Edit: I am unsure why this only caused the assembly to fail in 3 cases out of 161. Regardless, I am rerunning all assemblies with the correct read length.
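
For anyone generating config files with a script, here is a minimal sketch of how the read length could be measured without counting the line ending. It is not part of NOVOPlasty; the function names are hypothetical, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```python
import gzip
import io
from collections import Counter


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_lengths(handle, max_reads=10000):
    """Tally read lengths from a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i // 4 >= max_reads:
            break
        if i % 4 == 1:  # second line of each record holds the sequence
            # Strip the line ending first; len(line) alone would be off by one.
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def most_common_read_length(handle, max_reads=10000):
    """Return the most frequent read length among the sampled reads."""
    counts = read_lengths(handle, max_reads)
    if not counts:
        raise ValueError("no reads found")
    return counts.most_common(1)[0][0]


# Small in-memory example instead of a real file.
demo = io.StringIO("@r1\nACGTA\n+\nIIIII\n@r2\nACGTA\n+\nIIIII\n")
demo_length = most_common_read_length(demo)
```

With a real file, something like `most_common_read_length(open_fastq("reads_1.fastq.gz"))` would give the value to write into the config (the file name here is only a placeholder).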

ndierckx commented 5 years ago

Hi,

It's not your mistake: being 1 bp off shouldn't influence the assembly that much, and a wrong read length can't freeze the assembly. I guess that by changing something small you avoided the bug, but it's still there, so I will run with your settings to find it. There is a batch option if you are interested (explanation in the wiki).

ndierckx commented 5 years ago

I fixed the problem in version 3.2. There was some code that slurped a lot of memory when there were a lot of ambiguous nucleotides in the assembly. It should be fixed now; at least with the dataset you sent, the problem is solved. Thanks for reporting.
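
For readers on an older version who want to check whether a seed or assembly might hit this case, a rough sketch is to measure how many bases are not plain A, C, G or T. This is only an illustrative check, not NOVOPlasty's own code; the function name is hypothetical, and it treats every non-ACGT character on a sequence line as ambiguous.

```python
def ambiguous_fraction(fasta_text):
    """Fraction of bases in a FASTA string that are not A, C, G or T."""
    unambiguous = set("ACGT")
    total = 0
    ambiguous = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        for base in line.upper():
            total += 1
            if base not in unambiguous:
                ambiguous += 1  # N and other IUPAC codes such as R, Y, K
    return ambiguous / total if total else 0.0


example_fraction = ambiguous_fraction(">seed\nACGTNNRY\n")
```

What counts as "a lot" of ambiguous nucleotides is not stated in this thread, so the result is only a hint about which assemblies to look at first.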