janelia-flyem / dvid

Distributed, Versioned, Image-oriented Dataservice
http://dvid.io

/mappings endpoint does not filter out non-existent supervoxels. #275

Closed. stuarteberg closed this issue 6 years ago.

stuarteberg commented 6 years ago

As expected, the /mapping endpoint returns 0 for supervoxels that no longer exist (due to splits):

In [110]: r = requests.get('http://emdata3:8900/api/node/2053/segmentation/mapping', json=[5813030294])

In [111]: r.json()
Out[111]: [0]

So far, so good. The /mappings (plural) endpoint lets me download the entire in-memory map for all supervoxels in the instance. It returns a large CSV file, and I would expect that file either not to include such non-existent supervoxels or, if they are included, to map them to 0.

However, the CSV file includes the non-existent supervoxel shown above, and it lists a non-zero mapping for it:

$ curl -s http://emdata3:8900/api/node/2053/segmentation/mappings | fgrep 5813030294
5813030294 1467849770

The split history of this supervoxel is shown below. Multiple supervoxels which no longer exist are present in the /mappings results. They are properly mapped to 0 by the /mapping endpoint, though.

1467849770 (<unknown>)
 +-- 5813030289 (8e72)
 +-- 5813030290 (8e72)   #  <--------- present in .../mappings
     +-- 5813030291 (8e72)
     |   +-- 5813043325 (275d)
     |   +-- 5813043326 (275d)
     |       +-- 5813043383 (275d)
     |       +-- 5813043384 (275d)
     |           +-- 5813043477 (275d)
     |           +-- 5813043478 (275d)
     |               +-- 5813043501 (275d)
     |               +-- 5813043502 (275d)
     +-- 5813030292 (8e72)    #  <--------- present in .../mappings
         +-- 5813030293 (8e72)
         +-- 5813030294 (8e72)     #  <--------- present in .../mappings
             +-- 5813030295 (8e72)
             +-- 5813030296 (8e72)
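
For reference, here is a minimal Python sketch of the comparison above. It assumes the same server, uuid, and instance as the examples, and that the /mappings response contains only space-separated supervoxel/body pairs like the fgrep output:

# Hypothetical reproduction of the discrepancy, using the server/uuid from above.
import requests

server = 'http://emdata3:8900'
sv = 5813030294

# /mapping (singular) returns 0 for supervoxels that no longer exist (due to splits).
r = requests.get(f'{server}/api/node/2053/segmentation/mapping', json=[sv])
r.raise_for_status()
print(r.json())          # -> [0]

# /mappings (plural) dumps the whole in-memory map as space-separated pairs.
r = requests.get(f'{server}/api/node/2053/segmentation/mappings')
r.raise_for_status()
pairs = dict(tuple(map(int, line.split())) for line in r.text.splitlines())
print(pairs.get(sv))     # -> 1467849770, even though the supervoxel no longer exists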
DocSavage commented 6 years ago

The /mapping implementation loads and checks the label indices to verify the label actually exists. That would be very costly for /mappings, since every supervoxel is returned, so it would presumably require a full scan of all label indices. I should be able to remove known splits from /mappings; the error above is probably due to split histories not being applied when the mappings are loaded. But I think there are still some cases where we need label index checks, right?

DocSavage commented 6 years ago

After looking at the old issue resolution, I think we are OK once I fix /mappings to incorporate splits. The /mapping endpoint must query label indices because the client could provide non-existent supervoxels. The /mappings (plural) endpoint, however, only returns supervoxels within the in-memory map, so it should never include non-existent supervoxels unless there was corruption.

DocSavage commented 6 years ago

curl -s http://emdata3:8900/api/node/2053/segmentation/mappings | fgrep 5813030294 still returns a non-zero mapping.

DocSavage commented 6 years ago

Hmmm. Looking at the mutations actually being loaded, the only reference to supervoxel 5813030294 is a merge of it into 1467849770 in version id 11. http http://emdata3:8900/api/node/2053/segmentation/supervoxel-splits | grep 5813030294 shows no match, so that supervoxel is not in the supervoxel-splits record. Where did you get the supervoxel split record cited above?

DocSavage commented 6 years ago

I wonder if this is an issue with incomplete logging of supervoxel splits in early versions of labelmap. If so, the /mappings result would be fixed by appending those split records to the log files for each version.

stuarteberg commented 6 years ago

Where did you get the supervoxel split record cited above?

Kafka logs.

I wonder if this is an issue with incomplete logging of supervoxel splits in early versions of labelmap.

Looks that way. That supervoxel is not returned by the /split-supervoxels endpoint. As we've discussed previously, we should someday retroactively apply split provenance info to our production server's mutation log. But since I'm able to get that from Kafka for now, this is not a major priority for me. Now that I'm aware of this issue with /mappings, I can work around it easily enough (rough sketch below).
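
For anyone else who runs into this, a rough sketch of the kind of workaround I mean, assuming the /mapping behavior shown at the top of this thread: verify the supervoxels you actually care about via /mapping, which returns 0 for retired supervoxels, rather than trusting the /mappings dump alone.

# Hypothetical workaround sketch: use /mapping (which returns 0 for supervoxels
# that no longer exist) as the authority for any supervoxels of interest,
# instead of trusting the bulk /mappings dump.
import requests

def verified_mapping(supervoxels, server='http://emdata3:8900',
                     uuid='2053', instance='segmentation'):
    """Return {supervoxel: body}, with retired supervoxels mapped to 0."""
    url = f'{server}/api/node/{uuid}/{instance}/mapping'
    r = requests.get(url, json=list(supervoxels))
    r.raise_for_status()
    return dict(zip(supervoxels, r.json()))

# The supervoxel from this issue comes back as 0 here, even though the
# /mappings dump still lists it with body 1467849770.
print(verified_mapping([5813030294]))   # -> {5813030294: 0}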

Since this issue is now fixed for "clean" dvid servers, I guess it can be closed.