mhhennig / HS2

Software for high density electrophysiology
GNU General Public License v3.0

Parameter equivalence between HSDetection and Lightning #80

Open b-grimaud opened 1 month ago

b-grimaud commented 1 month ago

I noticed a few things while comparing the newer detection method with the older one:

[Screenshot: 2024-07-25 19-31-37]

Running on a Xeon Silver 4210R with 40 cores, Ubuntu 24.04. Chunk size was left on auto, which set it to 4884.

Sorry for the mixed bag of questions!

mhhennig commented 1 month ago

Thanks for raising this!

> Running both with the same parameters on the same data, pre-processed with SI and with built-in rescale, lowpass and CMR disabled, yields a lot more spikes (presumably false positives) with Lightning. This is easily fixed by increasing the threshold, but is it expected?

The new code deviates from the original in several ways. Main factors:

So the results will be different. You can compare the two versions here: https://gist.github.com/mhhennig/4c391b2e2c8b338d573fc951b1950b5d
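For anyone reproducing the comparison, here is a rough sketch of the SpikeInterface pre-processing chain described above (bandpass, common median reference, explicit scaling), with the detector's built-in steps then left off. The reader and its values are placeholders, and the commented detection call uses assumed parameter names; see the gist above for the exact interface.

    import spikeinterface.full as si

    # Placeholder reader and values -- substitute your own recording.
    rec = si.read_binary("recording.raw", sampling_frequency=20000,
                         num_channels=1024, dtype="int16")

    # Pre-process in SpikeInterface instead of using the detector's built-ins.
    rec = si.bandpass_filter(rec, freq_min=300, freq_max=6000)
    rec = si.common_reference(rec, reference="global", operator="median")
    rec = si.scale(rec, gain=20.0)  # explicit scaling, cf. the old wrapper quoted further down

    # Detection then runs with rescale/lowpass/CMR disabled and a higher threshold,
    # e.g. via the SpikeInterface sorter wrapper (parameter names here are assumptions):
    # sorting = si.run_sorter("herdingspikes", rec, rescale=False, lowpass=False,
    #                         common_reference="none", detect_threshold=30)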

> Measured amplitudes of spikes are about an order of magnitude larger with Lightning, could this just be because of the different pre-processing or are they calculated differently?

This is purely a scaling issue (quantisation).

> Rescaling seems to have a pretty dramatic effect on performance, is there a reason why the default is set to -1280?

There was an additional factor of 64 in the original code. This number is the old default (20*64).
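Spelling out where that number comes from (just the magnitude; the sign of the default is taken as given here):

    # Old SI wrapper scale (see the snippet quoted further down) times the factor
    # of 64 that was baked into the original C++ code:
    old_scale = 20
    internal_factor = 64
    print(old_scale * internal_factor)  # 1280 -> the magnitude of the new default, -1280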

> It also looks like the detection is still running on a single core on my machine, couldn't find any n_jobs argument for that module.

The parts of the code that run in parallel are scaling, lowpass and common referencing. The main detection code is vectorised, so it benefits from parallel execution within a single core. Distributing this across cores has diminishing returns as there's just too much data transfer from/to memory. So you should see the occasional use of multiple cores, but not throughout. I'll check this though, maybe the compiler does not deal with it properly in your case.

This gives you an idea of the expected performance benefits: [benchmark plot attached]

Hope this makes sense, and I'll get back about the multi-threading soon.

b-grimaud commented 1 month ago

Thanks a lot for the details!

> There was an additional factor of 64 in the original code. This number is the old default (20*64).

From that and the comparison you provided, I assume this was initially handled by this section?

    recf = si.normalize_by_quantile(
        recording=rec, scale=20., median=0.0, q1=0.025, q2=0.975
    )

Regarding the preprocessing, from what I could gather from Detection.cpp, the built-in lowpass is basically a rolling average of two consecutive frames?
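If that reading of Detection.cpp is right, the built-in lowpass would amount to something like this NumPy sketch (my interpretation, not the actual implementation):

    import numpy as np

    def two_frame_lowpass(traces):
        # Average each frame with the previous one; frames on axis 0, channels on axis 1.
        # Sketch of the "rolling average of two consecutive frames" reading -- not HS2 code.
        out = traces.astype(np.float32)
        out[1:] = 0.5 * (traces[1:] + traces[:-1])
        return out

    # A step edge gets smoothed over one frame:
    x = np.array([[0], [0], [10], [10]], dtype=np.float32)
    print(two_frame_lowpass(x).ravel())  # [ 0.  0.  5. 10.]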

> So you should see the occasional use of multiple cores, but not throughout.

Makes sense!

> Distributing this across cores has diminishing returns as there's just too much data transfer from/to memory.

Does the same logic apply to chunk size? Or should I just specify it as however much fits in memory?

Regarding some of the other parameters, if I understand correctly:

Thanks again!