janelia-flyem / dvid

Distributed, Versioned, Image-oriented Dataservice
http://dvid.io

Cases of supervoxel splits which were not logged to kafka #288

Open stuarteberg opened 5 years ago

stuarteberg commented 5 years ago

Here is an example of a supervoxel (1000417481) which was split a long time ago, back in node 017ad, but is not mentioned in the kafka log. Do we know exactly when kafka logging was first enabled?

At the very least, this indicates that when our production server's internal mutation log is finally updated to include all split events, we cannot rely solely on the kafka log.

Analysis

(The following assumes you've installed neuclease via conda install -c flyem-forge neuclease)

First, does the kafka log mention this supervoxel?

In [1]: from neuclease.dvid import *

In [2]: from neuclease import configure_default_logging

In [3]: configure_default_logging()

In [4]: splits_df = fetch_supervoxel_splits('emdata3:8900', '9e0d', 'segmentation', source='kafka', format='pandas')
INFO [2018-09-10 16:54:49,440] Reading kafka messages from ['kafka.int.janelia.org:9092', 'kafka2.int.janelia.org:9092', 'kafka3.int.janelia.org:9092'] for emdata3:8900 / 9e0d / segmentation
INFO [2018-09-10 16:55:40,647] Reading 912094 kafka messages took 51.20722818374634 seconds

In [5]: splits_df.head()
Out[5]:
                               uuid  mutid         old      remain       split              type
0  017ade09fc214382acba582a97db9164      1  1329532718  5812984423  5812984422  split-supervoxel
1  017ade09fc214382acba582a97db9164      2  1341870875  5812984425  5812984424  split-supervoxel
2  017ade09fc214382acba582a97db9164      3  1345855821  5812984427  5812984426  split-supervoxel
3  017ade09fc214382acba582a97db9164      4  1351234270  5812984429  5812984428  split-supervoxel
4  017ade09fc214382acba582a97db9164      5  1354640265  5812984431  5812984430  split-supervoxel

In [6]: 1000417481 in splits_df['old'].values
Out[6]: False

OK, it's not in the kafka log. In which UUID did it disappear?

In [37]: # nx is networkx; dag is the repo's version DAG, built in cells not shown here
    ...: for uuid in nx.topological_sort(dag):
    ...:     size = fetch_sizes('emdata3:8900', uuid, 'segmentation', [1000417481], supervoxels=True).values[0]
    ...:     print(uuid, size)
    ...:
a776af0b132f44c3a428fe7607ba0da0 6268720
ac90185a83a44809a2c8f0251c121827 6268720
417bd9a68ed94b9381c4f35aa8a0d7f6 6268720
4ea3bd5698204089b82408c178fe1d55 6268720
b4ef00a68ef84e72b3b821a815500b3c 6268720
017ade09fc214382acba582a97db9164 0
6134ca01a0cf444baf82d5bc1efb49e8 0
039784741703407ea25c9acdc6d0db8c 0
f93b3f65842f45f6880a35a869df1837 0
8e727cad0d4f4f31a64d525c3ecea0a3 0
9ec0b3f28eca490294d60990a4841986 0
367efccd08e44dea9397fb5fef306eaa 0
05ef86adde6341658bc6d53f4f5c03db 0
55ce82b0567b4987960652a169f9b7ff 0
25dcdb44cd934376999d93f1aa4d4b5f 0
275df4022f674852a15bf88514747ead 0
6ad4f7aba05f437f8d8ed91d78184cdd 0
4d26485049c64cd580d5db1b46a1f74e 0
5421620d45b3413c9936741d9a8d4845 0
52f9470433c84b8fa2ea9b69458ad793 0
662edcb44e69481ea529d89904b5ef9b 0
2053c1a64f254961874d91407e7301e3 0
f5451c4cdba840e08d86f6b3a0cb378d 0
a5f0b441a4024b90962cfed5047439af 0
7e526ec1e5cd4cbf80a8ff2f01e61c68 0
86b88eb42f2e45a180088b9738dae6a0 0
d5852c27b5c04687bb1be414f6dc2336 0
321009656dde467a807957462d227bf1 0
299e6b5c2f0b4db69f405691338395a5 0
f2149cd06e0f4f59b6876daceb2c55f8 0
7254f5a8aacf4e6f804dcbddfdac4f7f 0
cc4c9efb77e848779e6f0c0011ba7a41 0
9e0d2e899d624d47a53602f3ae986633 0
07160ccb9ee849ad8465c3b617bb90e5 0

OK, it was apparently split in node 017a.
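
The scan above can be reduced to a small helper that reports the first node in which the supervoxel's size drops to zero. This is only a sketch: `first_vanished` is a hypothetical name (not part of neuclease), the input is a few of the (uuid, size) pairs already printed above, and it assumes the pairs arrive with ancestors before descendants, as a topological sort provides. In a branched DAG a topological order is not a single line of ancestry, so the result would still need to be checked against the node's actual parents.

```python
from typing import Iterable, Optional, Tuple


def first_vanished(sizes: Iterable[Tuple[str, int]]) -> Optional[str]:
    """Return the first uuid whose size is 0 after a nonzero size was seen.

    Hypothetical helper, not part of neuclease. Expects (uuid, size) pairs
    in ancestor-before-descendant order. Returns None if the supervoxel
    never disappears (or never existed) in the given sequence.
    """
    seen_nonzero = False
    for uuid, size in sizes:
        if size > 0:
            seen_nonzero = True
        elif seen_nonzero:
            return uuid
    return None


# A few of the (uuid, size) pairs printed in the session above.
observed = [
    ("a776af0b132f44c3a428fe7607ba0da0", 6268720),
    ("b4ef00a68ef84e72b3b821a815500b3c", 6268720),
    ("017ade09fc214382acba582a97db9164", 0),
    ("6134ca01a0cf444baf82d5bc1efb49e8", 0),
]

print(first_vanished(observed))
```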

According to Bill, if you sift through the http logs for emdata3:8900, you can see that it was indeed split in that node, apparently successfully. So why isn't it in the kafka log?

Other examples

The above situation also applies to the following list of supervoxels (143 of them, including the example above):

[1000417481, 1000771652, 1001799124, 1002157283, 1004704982, 1010864660, 1011607138, 1012215884, 1015794532, 1031448152, 1032134610, 1033481351, 1034729226, 1035467390, 1040320235, 1041615233, 1041714586, 1043842324, 1044321264, 1045543423, 1064374109, 1066044707, 1066934333, 1073991970, 1074997905, 1078192640, 1080248601, 1080865083, 1094191611, 1094511026, 1094523912, 1094588645, 1097139770, 1107333104, 1108317366, 1110583110, 1124544143, 1124591845, 1125770676, 1126607999, 1126862492, 1127497026, 1128931504, 1129668438, 1130074317, 1133286143, 1134667640, 1141972137, 1143457103, 1155648327, 1155898476, 1157832457, 1166380618, 1168422108, 1169350633, 1169704364, 1173854086, 1174465847, 1174496225, 1192934286, 1195442424, 1196046824, 1196914400, 1201296764, 1201551064, 1202855856, 1203196299, 1203463471, 1204499798, 1205060825, 1205190945, 1217031534, 1217372881, 1217674791, 1218046230, 1223248550, 1227884417, 1228109007, 1228437070, 1232309681, 1233155908, 1233492629, 1235180325, 1235517136, 1235784729, 1236617883, 1249439568, 1257689534, 1260534069, 1265874582, 1268520535, 1268848711, 1279744734, 1280478489, 1281955027, 1284730962, 1299887872, 1310766647, 1311168375, 1313038028, 1315701772, 1315818283, 1321330303, 1321671303, 1322094535, 1326618865, 1326894969, 1327093528, 1327201447, 1327611618, 1329325891, 5812984087, 5812984127, 5812984129, 5812984139, 5812984141, 5812984163, 5812984167, 5812984195, 5812984201, 5812984207, 5812984221, 5812984257, 5812984259, 5812984275, 5812984277, 5812984289, 5812984293, 5812984305, 5812984307, 5812984321, 5812984327, 5812984329, 5812984355, 5812984357, 5812984359, 5812984369, 5812984399, 5812984401, 5812984407, 5812984417, 5813044280, 5813050878]

Upon investigation, all of them were also split in node 017ade09fc214382acba582a97db9164.
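
The membership test from `In [6]` extends naturally to a whole list of IDs. Here is a minimal sketch in plain Python, so it runs without a DVID server. `missing_from_log` is a hypothetical helper, and `logged_old` holds only the five `old` values visible in `splits_df.head()` above, not the full kafka log.

```python
from typing import Iterable, List


def missing_from_log(candidates: Iterable[int], logged_old: Iterable[int]) -> List[int]:
    """Return the candidate supervoxel IDs that never appear as 'old' in the log.

    Hypothetical helper for illustration; preserves the order of `candidates`.
    """
    logged = set(logged_old)
    return [sv for sv in candidates if sv not in logged]


# 'old' column from the first five rows of the kafka split log shown earlier.
logged_old = [1329532718, 1341870875, 1345855821, 1351234270, 1354640265]

# Two IDs from the list above, plus one that IS in the log, for contrast.
candidates = [1000417481, 1000771652, 1329532718]

print(missing_from_log(candidates, logged_old))
```

With the real DataFrame, the same check could presumably be written with pandas' `Series.isin` against `splits_df['old']`.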

stuarteberg commented 4 years ago

> This is really issue #316 and could be a queue issue.

I'm confused. This issue is about splits that were recorded in DVID, but not recorded in kafka. But issue #316 is the opposite -- when kafka records a split that was not recorded in DVID. Are those really the same issue?

DocSavage commented 4 years ago

You're right. Read the other too quickly. This issue is the queue issue and the other issue is probably something else like more-than-once delivery.
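
The two directions distinguished in this exchange (splits present in DVID but absent from kafka, versus splits present in kafka but absent from DVID) can be separated with a pair of set differences. This is a rough sketch only: `compare_logs` is a hypothetical helper, and the IDs below are made-up placeholders, not real supervoxels.

```python
from typing import Set, Tuple


def compare_logs(dvid_splits: Set[int], kafka_splits: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (only_in_dvid, only_in_kafka) for two sets of split supervoxel IDs.

    Hypothetical helper for illustration.
    only_in_dvid  -> the situation described in this issue (#288)
    only_in_kafka -> the opposite situation (the other issue, #316)
    """
    only_in_dvid = dvid_splits - kafka_splits
    only_in_kafka = kafka_splits - dvid_splits
    return only_in_dvid, only_in_kafka


# Placeholder IDs, invented purely for this example.
dvid_splits = {101, 102, 103}
kafka_splits = {102, 103, 104}

only_in_dvid, only_in_kafka = compare_logs(dvid_splits, kafka_splits)
print(sorted(only_in_dvid))
print(sorted(only_in_kafka))
```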