janelia-flyem / dvid

Distributed, Versioned, Image-oriented Dataservice
http://dvid.io

Endpoint to reload annotations for only a subset of labels #359

Open stuarteberg opened 2 years ago

stuarteberg commented 2 years ago

[Not a high priority, just writing this down.]

We often start working on a segmentation (e.g. agglomerating it) before synapses are ready for ingestion. Once synapses are ready, we ingest them into the root UUID (via DVID_ADMIN_TOKEN). The normal procedure for ingesting synapses is as follows:

  1. Create the annotation instance (typically named synapses)
  2. Load all synapse annotations (via POST .../blocks)
  3. THEN POST .../sync to the segmentation instance.
    • (If you do this out of order, i.e. before step 2, the overhead of keeping the synapse denormalizations in sync during annotation ingestion kills ingestion performance.)
  4. POST .../reload for the synapses instance.

Step 4 needs to be repeated for every UUID in the DAG (or at least every UUID for which we want synapse denormalizations to be available).
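
To make the ordering explicit, here is a minimal sketch that only builds the list of requests for steps 2 to 4, without sending anything. The function name and the idea of passing in full instance URL prefixes are assumptions for illustration. Step 1 is left out because the creation endpoint isn't spelled out above, and request bodies (the annotation data, the sync target) are omitted too.

```python
from typing import Iterable, List, Tuple

Request = Tuple[str, str]  # (HTTP method, URL)


def synapse_ingestion_requests(
    root_instance_url: str,
    other_instance_urls: Iterable[str] = (),
) -> List[Request]:
    """Return the requests for steps 2-4 in the order described above.

    root_instance_url: caller-supplied URL prefix of the synapses instance
        at the root UUID (hypothetical parameter).
    other_instance_urls: the same instance's URL prefix at every other UUID
        that needs denormalizations (hypothetical parameter).
    """
    root = root_instance_url.rstrip("/")
    plan: List[Request] = [
        ("POST", root + "/blocks"),  # step 2: load all synapse annotations
        ("POST", root + "/sync"),    # step 3: sync only AFTER the load
        ("POST", root + "/reload"),  # step 4: reload at the root UUID
    ]
    # Step 4 is repeated for every other UUID in the DAG.
    for url in other_instance_urls:
        plan.append(("POST", url.rstrip("/") + "/reload"))
    return plan
```

Keeping the plan as plain data makes it easy to check that /sync never precedes /blocks before handing the list to whatever HTTP client is in use.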

The reload process takes a long time (e.g. 5 hours for our current CNS half-brain). Denormalizations for every synapse-containing segment need to be generated. But in a typical DAG, most denormalizations shouldn't be different from one UUID to the next. Aside from the root UUID, it would be more efficient to simply scan the mutations log to see which labels have changed since the last UUID, and only reload those denormalizations. If the POST .../reload command accepted a list of body IDs to update, the process could be made much faster.
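
As a rough sketch of the "scan the mutations log" idea: collect every label touched by a mutation between one UUID and the next, and pass only those to the (proposed, not yet existing) partial reload. The record layout below, a dict with a "labels" list, is a simplified assumption and not DVID's actual mutation log format.

```python
from typing import Any, Iterable, List, Mapping


def changed_labels(mutations: Iterable[Mapping[str, Any]]) -> List[int]:
    """Return the sorted, de-duplicated labels touched by the given mutations.

    Each record is assumed (hypothetically) to look like
    {"action": "merge", "labels": [10, 11]}, where "labels" lists every
    body ID whose denormalizations may have changed.
    """
    touched = set()
    for record in mutations:
        # Records without a "labels" key contribute nothing.
        touched.update(record.get("labels", []))
    return sorted(touched)
```

The resulting list is what a POST .../reload that accepted body IDs would receive. For a UUID with only a handful of merges or cleaves, it would be tiny compared to the full set of synapse-containing segments.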

Side note: It would also be great if an endpoint existed to poll the reload status of a given UUID. (Is it currently reloading?)
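
If such a status endpoint existed, a client could wait on it with a simple polling loop like the sketch below. The `is_reloading` callable stands in for the hypothetical status query. Nothing here corresponds to an existing DVID call, and the default interval and timeout are arbitrary.

```python
import time
from typing import Callable


def wait_for_reload(
    is_reloading: Callable[[], bool],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the reload finishes; return False if the timeout expires.

    is_reloading: hypothetical wrapper around the requested status endpoint,
        returning True while the reload for a given UUID is still running.
    """
    deadline = clock() + timeout_seconds
    while is_reloading():
        if clock() >= deadline:
            return False  # still reloading when we gave up
        sleep(poll_seconds)
    return True  # the status query reported that the reload is done
```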