.klustakwik2 folder problem when more than one kwik file in same folder

shabnamkadir commented 8 years ago

21:22:41 [I] launch:214 Spike detection done!
21:22:42 [I] launch:239 Starting clustering on shank 0/1.
21:22:42 [I] launch:242 Clustering group 0 (354282 spikes).
21:22:45 [W] launch:112 Unable to resume KK from /scratch/scratch/smgxsk1/kilonips/20150601/.klustakwik2/spike_clusters.txt, there are 804147 values instead of 354282
21:22:45 [I] launch:122 Starting KK...

Only one .klustakwik2 folder is produced per directory, and it is not placed inside a subfolder specific to the running job. If klusta is run on two files in a single directory, the previous clustering is therefore lost.

nippoo commented 8 years ago

This is a feature not a bug, but you're right that it probably needs better documenting somewhere...

shabnamkadir commented 8 years ago

What was the reasoning behind implementing it this way?! I can't see the benefits. I believe it has just caused havoc on the servers and on Legion, e.g. if two jobs try to write to .klustakwik2/spike_clusters.txt at once.

I now have to run everything again!

nippoo commented 8 years ago

I'm not entirely sure, I'm afraid...

shabnamkadir commented 8 years ago

I mean, if you have several .dat files in a single directory and then try to cluster all of them at once, you end up in an utterly disastrous situation. Every job will try to access and write to .klustakwik2/spike_clusters.txt, and they will all get confused!

shabnamkadir commented 8 years ago

The way around it for now is for the user to put every .dat file in its own directory.
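A minimal sketch of that workaround, in case it helps anyone with many recordings in one folder. This is not part of klusta: the helper name `split_into_experiment_dirs` is hypothetical, and it assumes that companion files (for example a parameter file) share the .dat file's stem and should move with it.

```python
import shutil
from pathlib import Path


def split_into_experiment_dirs(data_dir):
    """Move each .dat file, plus files sharing its stem, into its own subfolder.

    Hypothetical helper, not a klusta function. Returns a dict mapping each
    .dat file name to the directory it was moved into.
    """
    data_dir = Path(data_dir)
    moved = {}
    # Materialise the listing first, because files are moved while looping.
    for dat in sorted(data_dir.glob("*.dat")):
        target = data_dir / dat.stem
        target.mkdir(exist_ok=True)
        # Assumption: companions are named <stem>.<ext>, e.g. experiment1.prm.
        for f in list(data_dir.iterdir()):
            if f.is_file() and f.stem == dat.stem:
                shutil.move(str(f), str(target / f.name))
        moved[dat.name] = target
    return moved
```

Calling `split_into_experiment_dirs("/path/to/recordings")` would leave `experiment1.dat` in `/path/to/recordings/experiment1/`, so each run gets its own `.klustakwik2` folder. If a parameter file refers to the raw data by path, check it after the move.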

kdharris101 commented 8 years ago

I think we made this decision when thinking about the new file format (i.e. moving away from HDF5).

We considered two options:

1. Each experiment is identified by a file root, e.g. experiment1.dat – and it then generates files such as experiment1.spike_times.npy, experiment1.spike_clusters.npy, etc. This is the same philosophy as the Csicsvari format.

2. Each experiment lives in its own directory, and all files have the same name.
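To make the difference between the two options concrete, here is a rough sketch of how output paths would be derived under each one. The function names are made up for illustration, and the fixed file names under the second option (e.g. `spike_times.npy`) are an assumption based on the description above rather than a statement of what any tool writes.

```python
from pathlib import Path


def file_root_layout(dat_path, array_name):
    """First option: outputs sit next to the .dat file, prefixed by its root."""
    dat_path = Path(dat_path)
    return dat_path.with_name(f"{dat_path.stem}.{array_name}.npy")


def per_directory_layout(experiment_dir, array_name):
    """Second option: one directory per experiment, fixed file names inside it."""
    return Path(experiment_dir) / f"{array_name}.npy"


# First option: two experiments can share a folder without clashing.
a = file_root_layout("data/experiment1.dat", "spike_clusters")
b = file_root_layout("data/experiment2.dat", "spike_clusters")

# Second option: the directory is the only thing telling experiments apart,
# so two experiments placed in the same folder resolve to the same file.
c = per_directory_layout("data", "spike_clusters")
```

Under the first scheme `a` and `b` differ; under the second, anything run in `data/` resolves to the single path `c`, which is why that scheme only works with one experiment per directory.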

I remember a lot of discussion back and forth, and although I originally favored (1) out of familiarity, we settled on (2). I remember there being some quite convincing arguments, but forget what they were now!

We could try to dig them up in old emails.

ATB k


shabnamkadir commented 8 years ago

Thanks. I'd be curious to see the reasoning. I can't think of a single argument in favour of (2) right now! Every output file is still labelled by the experiment name, so this is in effect implementing (1). It's a shame that the .klustakwik files are inconsistent in this way. I think we have a bug, because at the moment we have a halfway house between (1) and (2).

I think there should be a big warning somewhere prominent in the docs. The unsuspecting user may have all their .dat files in one directory and not think to keep them separate. Also, I doubt most users will even check folders beginning with a '.', since they are often hidden.
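Until the docs carry such a warning, a user could run a quick check before launching jobs. The sketch below is only an illustration (the function name `dirs_with_multiple_dat_files` is hypothetical and nothing like it ships with klusta): it walks a directory tree and reports every folder holding more than one .dat file.

```python
from collections import defaultdict
from pathlib import Path


def dirs_with_multiple_dat_files(root):
    """Return {directory: [dat file names]} for folders with more than one .dat file."""
    by_dir = defaultdict(list)
    # rglob also matches .dat files directly inside root.
    for dat in Path(root).rglob("*.dat"):
        if dat.is_file():
            by_dir[dat.parent].append(dat.name)
    return {d: sorted(names) for d, names in by_dir.items() if len(names) > 1}
```

Any directory that appears in the result would share a single `.klustakwik2` folder between its recordings, so those files are the ones to separate before clustering.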

rossant commented 8 years ago

@shabnamkadir you are right that this is not ideal. We started from (1) and moved halfway to (2). The new format used by the template matching algorithms is pure (2). Since we'll be moving toward (2) eventually, I suggest we get our users used to having one experiment per folder. This could be made clearer in the docs. PR welcome...

kwikteam / klusta

.klustakwik2 folder problem when more than one kwik file in same folder #23