janelia-flyem / dvid

Distributed, Versioned, Image-oriented Dataservice
http://dvid.io

Log Annotation Element Sync changes in Kafka. #308

Closed umayaml closed 5 years ago

umayaml commented 5 years ago

The annotation data type needs to log in Kafka which elements (synapses) have changed label sync because of a mutation on the segmentation. The ID of the mutation that caused the change also needs to be reported, so that we can use the labelmap Kafka log to determine the synapse label sync changes per mutation. If we can list the coordinates of the elements affected by the mutation (mutation_id), I feel the coordinates are sufficient, because all the synapses are already loaded into Neuprint. If the synapse coordinate list exceeds the kafka_log character limit, we can split the affected elements across multiple Kafka messages. If a message gets split, we should probably assign each part an ordering number.
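
As a rough illustration of the splitting idea, here is a minimal Python sketch. The field names (`MutationID`, `Part`, `TotalParts`, `Coords`) and the `max_coord_bytes` limit are assumptions made up for this example, not DVID's actual Kafka message format, and the limit is applied only to the JSON-encoded coordinate list.

```python
import json

def split_sync_messages(mutation_id, coords, max_coord_bytes=1000):
    """Split affected element coordinates into ordered parts.

    All field names and the size limit are hypothetical.
    """
    chunks, current = [], []
    for coord in coords:
        candidate = current + [coord]
        # Start a new part once the encoded coordinate list would exceed the limit.
        if current and len(json.dumps(candidate)) > max_coord_bytes:
            chunks.append(current)
            current = [coord]
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Number the parts from 1 so a consumer can reassemble them in order.
    return [
        {"MutationID": mutation_id, "Part": i, "TotalParts": len(chunks), "Coords": chunk}
        for i, chunk in enumerate(chunks, start=1)
    ]
```

A consumer could collect all parts sharing a `MutationID`, sort them by `Part`, and treat the list as complete once it holds `TotalParts` entries.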

In order to update the status and name of Neurons in Neuprint, the keyvalue datatype needs to write the value being kvpost to the Kafka logs. That way, we don't need to contact DVID to get the changes made to status and name for the Neurons. I think this should probably be an option when creating the keyvalue, since it only really applies to keyvalues that store text (for example, segmentation_annotations). An alternative solution is to create a new type of keyvalue that only allows text and limits its size. This new keyvalue type would log in Kafka the existing text, the text posted, and the text deleted from the keyvalue.
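
To make the text-only keyvalue idea concrete, here is a minimal Python sketch of what such a log entry might carry. The field names (`Action`, `Key`, `Previous`, `Value`) and the `MAX_TEXT_BYTES` limit are hypothetical choices for this example, not an existing DVID format.

```python
import json

MAX_TEXT_BYTES = 4096  # assumed size limit for a text-only keyvalue

def keyvalue_log_entry(key, old_text, new_text):
    """Build a JSON log entry for a text post or delete.

    old_text is None if the key did not exist; new_text is None for a delete.
    Field names are hypothetical.
    """
    for text in (old_text, new_text):
        if text is not None and len(text.encode("utf-8")) > MAX_TEXT_BYTES:
            raise ValueError("text exceeds the keyvalue size limit")
    action = "delete" if new_text is None else "post"
    return json.dumps(
        {"Action": action, "Key": key, "Previous": old_text, "Value": new_text}
    )
```

Carrying both the previous and the new text would let a consumer update status and name without a follow-up request for the stored value.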

We also update the voxel size for each Neuron that is mutated. One way to relay this information in the Kafka logs is to post the updated Neuron sizes for all bodies involved in the mutation when you log that the mutation has been completed (e.g. split-complete, merge-complete, cleave-complete).

The goal is to report the synapse changes in Kafka so that when syncing with Neuprint, we don’t need to contact DVID to get the data we need to update Neuprint. We would like to avoid having to use a blob store to get this information if possible.

umayaml commented 5 years ago

Hi Bill, I think this spec is good to go. I would prioritize getting synapse sync changes due to mutations into DVID first. We probably spend most of our processing time getting the current synapses on labels after mutations through the annotation/label endpoint, which we do one label at a time since there is no option to get the synapses for a set of labels in one request. I think if we get synapse sync changes (which labels they are now on) in the log, we can greatly reduce the number of requests we have to make to DVID on a daily basis. The second priority would be improved logging of changes to keyvalues whenever text is posted, and the third would be logging the voxel sizes of the labels when mutations occur. Steve would like at least the synapse sync changes implemented before the new neuprint data model is finished.
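
On the consumer side, the saving could look roughly like the Python sketch below: instead of one request per label, the sync job replays logged changes into its own coordinate-to-label map. The message shape (`MutationID`, `Changes`, `Coord`, `Label`) is an assumption for illustration only, not the format DVID produces.

```python
def apply_sync_changes(synapse_labels, messages):
    """Replay logged sync changes into a local coord -> label map.

    Updates synapse_labels in place and returns the changed coordinates
    grouped by mutation ID. The message fields are hypothetical.
    """
    changed = {}
    for msg in messages:
        for change in msg["Changes"]:
            coord = tuple(change["Coord"])  # tuples are hashable, lists are not
            synapse_labels[coord] = change["Label"]
            changed.setdefault(msg["MutationID"], []).append(coord)
    return changed
```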

DocSavage commented 5 years ago

Please create a separate issue for the voxel sizes of labels when mutations occur. I think that would be messages generated off a labelmap or labelarray datatype? That would not be labelsz, right?

For documentation on the JSON produced to the Kafka topics, please see the new wiki page on Kafka Messages.