kwikteam / klustakwik2

Fast software for high-dimensional cluster analysis using the masked EM algorithm for Gaussian mixtures
BSD 3-Clause "New" or "Revised" License

temp.clu and graceful close #53

Closed nsteinme closed 9 years ago

nsteinme commented 9 years ago

Two issues related to breaking before strictly finishing.

There needs to be a provision to store a temporary output of the current state of the clustering i.e. the temp.clu. When integrated into phy, perhaps this could just write the clustering to the kwik file on each iteration.
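A rough sketch of what writing a temporary .clu on each iteration could look like, in plain Python rather than anything taken from klustakwik2 or phy. The function name `write_clu_atomic` is hypothetical, and the file layout (cluster count on the first line, then one label per spike) is an assumption about the legacy .clu format. The file is written under a temporary name and then renamed, so a process killed mid-write does not leave a truncated temp.clu behind.

```python
import os
import tempfile


def write_clu_atomic(path, labels):
    """Write cluster labels to `path` without ever exposing a half-written file.

    Hypothetical helper, not part of klustakwik2.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the scratch file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            # Assumed layout: number of clusters first, then one label per spike.
            f.write(f"{len(set(labels))}\n")
            f.writelines(f"{label}\n" for label in labels)
        # os.replace swaps the new file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the scratch file if anything failed before the rename.
        os.unlink(tmp_path)
        raise
```

Calling this at the end of every iteration (or every few iterations) would keep the file on disk no more than a few iterations out of date.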

And/or, some way to stop execution while saving out the results.

nippoo commented 9 years ago

On cleanup after an interrupted exit (SIGINT from Ctrl+C, SIGTERM, etc.), it should save the .temp.clu: either return it or save it to a file automatically.
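One way this could work, sketched in generic Python under the assumption that the clustering runs as a loop of discrete iterations. Ctrl+C delivers SIGINT, which a process can handle; SIGKILL cannot be caught at all, so only periodic saving protects against it. The names `StopRequested`, `install_handlers` and `run_until_stopped` are hypothetical, and `step` and `save` stand in for whatever the real iteration and .clu-writing routines are.

```python
import signal


class StopRequested:
    """Flag set by a signal handler and polled between iterations."""

    def __init__(self):
        self.flag = False

    def __call__(self, signum, frame):
        # Only record the request; the main loop does the actual cleanup.
        self.flag = True


def install_handlers(stop):
    """Route Ctrl+C (SIGINT) and plain `kill` (SIGTERM) to the stop flag."""
    signal.signal(signal.SIGINT, stop)
    signal.signal(signal.SIGTERM, stop)


def run_until_stopped(step, save, max_iterations, stop):
    """Run step() until done or asked to stop, then save exactly once."""
    done = 0
    for _ in range(max_iterations):
        if stop.flag:
            break
        step()
        done += 1
    save()  # reached both on normal completion and after an interrupt
    return done
```

The handler only sets a flag, so the current iteration always finishes and the saved clustering is never a half-updated state.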

nippoo commented 9 years ago

Here's a related suggestion:

This would make it automatically much more robust to the process being killed and restarted.

nsteinme commented 9 years ago

Yes, that seems sensible to me.

nsteinme commented 9 years ago

Maybe also make a quick backup copy of an existing clu if you are starting from it and plan to overwrite.
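Something along these lines would do it, as a minimal sketch using only the standard library. The helper name `backup_existing` and the timestamped `.bak` naming scheme are illustrative choices, not anything klustakwik2 does.

```python
import os
import shutil
import time


def backup_existing(path):
    """Copy `path` to a timestamped sibling file before it gets overwritten.

    Returns the backup path, or None if there was nothing to back up.
    """
    if not os.path.exists(path):
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = f"{path}.{stamp}.bak"
    # copy2 keeps the modification time, which helps when comparing runs later.
    shutil.copy2(path, backup_path)
    return backup_path
```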

rossant commented 9 years ago

+1 to that

nsteinme commented 9 years ago

But is the concept of a clu file even going to stay around? When integrated with phy/kwik files, it should pick up from whatever is the "main" clustering in the file unless told to start from "original" and move the old main to a backup, yes? And save to kwik every iteration?

thesamovar commented 9 years ago

In the kk2_legacy script you have the option save_clu_every=n which saves the .clu file every n minutes, and you also have start_from_clu=filename. This is also in the ipython notebook (although maybe it could do with being made clearer).
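For illustration, saving every n minutes can be sketched as a small timer that is checked once per iteration. This is not the klustakwik2 implementation; only the option name save_clu_every comes from the script, while `PeriodicSaver`, `maybe_save` and the `clock` parameter are hypothetical.

```python
import time


class PeriodicSaver:
    """Call `save` at most once per `every_minutes` of elapsed time."""

    def __init__(self, save, every_minutes, clock=time.monotonic):
        self.save = save
        self.interval = every_minutes * 60.0  # seconds
        self.clock = clock  # injectable so the timing can be tested
        self.last = clock()

    def maybe_save(self):
        """Save if the interval has elapsed; return True when a save happened."""
        now = self.clock()
        if now - self.last >= self.interval:
            self.save()
            self.last = now
            return True
        return False
```

A clustering loop would call `maybe_save()` at the end of each iteration, so the cost between saves is a single clock read.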

So I would say close this issue and open a similar one in phy?

nsteinme commented 9 years ago

Aha, I missed that because it isn't listed in the "klustakwik.initial_parameters" report within a klg file. But there it is on the main github page. Sorry about that!

nippoo commented 9 years ago

Nick had two requests though:

1) save on every iteration / every few iterations (that's the .temp.clu, which we need to figure out in phy)
2) save cleanly upon close.

I think this is still relatively important: ideally you'd want to autosave to protect against hardware failure or whatever, but being able to just stop the process when you're happy with the clustering state would be good. 1) is implemented already in kk2, but is 2)? I guess you just need to save the .clu in the destructor (on __del__())?
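On the destructor idea: Python does not guarantee that `__del__()` runs for objects still alive when the interpreter exits, so a `try`/`finally` around the clustering run is probably a safer place for the final save. A minimal sketch, where `save_on_exit` is a hypothetical helper and `save` stands in for whatever writes the .clu:

```python
from contextlib import contextmanager


@contextmanager
def save_on_exit(save):
    """Run the wrapped block, then call save() however the block ends."""
    try:
        yield
    finally:
        # Runs on normal completion, on exceptions, and on KeyboardInterrupt.
        save()
```

Wrapping the main loop in `with save_on_exit(write_clu):` would then cover Ctrl+C as well as ordinary errors, though not SIGKILL or a power failure.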

thesamovar commented 9 years ago

I don't think it matters much the difference between 1 and 2. If you are saving every 5 minutes you lose at most 5 minutes worth of computation time.

nsteinme commented 9 years ago

I agree with Dan, the current implementation with saving every so many minutes is quite sufficient; sorry I forgot it and started this whole thing!

On Thu, Jun 4, 2015 at 4:34 PM, Dan Goodman wrote:

I don't think it matters much the difference between 1 and 2. If you are saving every 5 minutes you lose at most 5 minutes worth of computation time.
