Closed · magland closed this issue 2 years ago
Regarding (1), I restored the functionality where extract-snippets-h5 is precomputed up front and the GUI is blocked until it finishes. This requires backend >= 0.2.20. I deployed the front end to this staging URL so as not to break the existing deploy: https://sortingview-magland-spikeforest.vercel.app
But as soon as you have the 0.2.20 backend up, I will move this to prod.
@jsoules, FYI: to deploy to staging we run `vercel`; to deploy to prod we run `vercel --prod`; and a local dev deploy is `vercel dev`.
Regarding (2), this is the code you'll want to use prior to opening the GUI (untested):

```python
from sortingview.tasks.preload_extract_snippets import task_preload_extract_snippets

recording = ...  # the labbox-ephys recording extractor
sorting = ...    # the labbox-ephys sorting extractor
workspace = ...  # the sortingview workspace
snippet_len = workspace.snippet_len  # the snippet length associated with the workspace

# This should precompute the extracted snippets so they won't block the sortingview GUI
snippets_h5_uri = task_preload_extract_snippets(
    recording_object=recording.object(),
    sorting_object=sorting.object(),
    snippet_len=snippet_len
)
```
@magland I think precomputing makes a great deal of sense, and we could do that precomputation when we create the workspace, which would be fine. My only concern is disk space / cleanup, as we're close to running out of space at the moment. So, is it right to think that the snippets are also stored in the recording file (currently NWB; hopefully your higher-efficiency version once we get back to trying it)?
Also, where would snippets_h5_uri be used?
@magland We have updated sortingview and are running backend version 0.2.20 now.
@lfrank

> Also, where would snippets_h5_uri be used?
Doesn't need to be used anywhere. That's just for info. The important thing is that the task was precomputed.
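For intuition, kachery-style URIs are content addresses: the file is named by the SHA-1 of its bytes, so the URI alone identifies the precomputed snippets file wherever it is stored. A hedged illustration of the URI form only (this is plain `hashlib`, not the `kachery_client` API):

```python
import hashlib

# Content-address a blob the way kachery names files: sha1://<hex digest of bytes>
data = b"contents of snippets.h5"
snippets_h5_uri = "sha1://" + hashlib.sha1(data).hexdigest()
print(snippets_h5_uri)  # e.g. sha1://<40 hex characters>
```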
Regarding disk space, I have some opinions and plans about it - perhaps we can discuss in a call.
@magland @lfrank Just regenerated a workspace and tried opening it with the new sortingview backend. It precomputed the snippets before activating the GUI. This was pretty fast; took < 3 min for a 4-channel, 90 min recording. The visualization widgets were reasonably fast as well - timeseries view was probably the slowest. Check out the workspace here.
Actually, there still seem to be some instabilities: the backend crashed with the following message when I tried to open a lot of widgets at once:
```
Traceback (most recent call last):
  File "/home/kacheryuser/miniconda3/envs/kachery-env/bin/sortingview-start-backend", line 6, in <module>
    sortingview.start_backend_cli()
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/sortingview/backend/start_backend_cli.py", line 8, in start_backend_cli
    start_backend(channel=channel)
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/sortingview/backend/start_backend.py", line 8, in start_backend
    kc.run_task_backend(
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/kachery_client/task_backend/run_task_backend.py", line 32, in run_task_backend
    B.process_events()
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/kachery_client/task_backend/TaskBackend.py", line 47, in process_events
    self._task_job_manager.process_events()
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/kachery_client/task_backend/TaskJobManager.py", line 50, in process_events
    requested_task.update_status(status=job.status, error_message=error_message, result=result)
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/kachery_client/task_backend/RequestedTask.py", line 35, in update_status
    _update_task_status(channel=self.registered_task_function.channel, task_id=self.task_id, task_function_id=self._registered_task_function.task_function_id, task_hash=self.task_hash, task_function_type=self.task_function_type, status=status, result=result, error_message=error_message)
  File "/home/kacheryuser/miniconda3/envs/kachery-env/lib/python3.8/site-packages/kachery_client/task_backend/_update_task_status.py", line 39, in _update_task_status
    raise Exception(f'Unable to update task status')
Exception: Unable to update task status
Cleaning up parallel job handler
Cleaning up parallel job handler
Cleaning up parallel job handler
Cleaning up parallel job handler
Cleaning up parallel job handler
Cleaning up parallel job handler
Cleaning up parallel job handler
```
I believe that crashing issue reported by Kyu has now been addressed.
But I'm keeping this issue open because there is still more work to do with regard to precomputing the snippets h5.
There are a few ways in which the crucial extract-snippets processing step needs improvement:

1. Extract-snippets is a potentially time-consuming preprocessing step for many of the sortingview tasks (waveforms, clusters, metrics, spike amplitudes), and many of those tasks are launched in parallel. What we don't want is to launch the same extract-snippets step multiple times in parallel. In a previous version of labbox-ephys, a singleton extract-snippets step was run first on opening sortingview (with the GUI grayed out), but that was removed as part of a simplifying/refactoring effort. I think we'll need to reimplement it, perhaps in a smarter way: some processing does not require snippets at all, and it would be a shame to block the GUI for those operations when extract-snippets is time-consuming.
2. Ideally, we should precompute the extract-snippets step prior to opening the GUI, as part of the recording/sorting import stage.
3. Some tasks require all events (e.g., spike amplitudes), while others should use a random subsampling of events (e.g., average waveforms). For the former we could get away with very short snippets to save compute time and disk space. I think we'll want to extract at least two snippets .h5 files: one with shorter snippets and all events, and a second with longer snippets and a subsampling of events. Perhaps a third as well, with longer snippets and all events. Of course we'll want to figure out reasonable parameters for the subsampling and the short/long snippet lengths.
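The all-events-vs-subsampled distinction above can be sketched in a few lines. This is a generic illustration with hypothetical parameter values (the actual sortingview defaults are not specified in this thread); snippet lengths are expressed as (samples before, samples after) the spike:

```python
import random

def choose_events(spike_times, max_events, seed=0):
    """Randomly subsample events when there are too many (e.g. for average
    waveforms); tasks like spike amplitudes would keep all events instead."""
    if len(spike_times) <= max_events:
        return sorted(spike_times)
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    return sorted(rng.sample(spike_times, max_events))

# Hypothetical parameters for the two proposed snippets files
SHORT_SNIPPET_LEN = (10, 10)          # short snippets, all events (amplitudes)
LONG_SNIPPET_LEN = (50, 80)           # long snippets, subsampled (waveforms)
MAX_EVENTS_FOR_WAVEFORMS = 1000

all_events = list(range(5000))
subsampled = choose_events(all_events, MAX_EVENTS_FOR_WAVEFORMS)
assert len(subsampled) == MAX_EVENTS_FOR_WAVEFORMS
```

The trade-off is exactly as described: short snippets keep the all-events file small, while subsampling bounds the cost of the long-snippets file regardless of firing rate.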