opensanctions / yente

API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API spec.
https://www.opensanctions.org/docs/yente/
MIT License
66 stars 26 forks source link

elasticsearch.helpers.BulkIndexError: 37 document(s) failed to index. #276

Closed jbothma closed 1 year ago

jbothma commented 1 year ago

Seen in logs after following docker-compose instrutions at https://www.opensanctions.org/docs/yente/deploy/

yente-tut-app-1  | 2023-06-27T10:16:35.283539Z [error    ] Indexing error: BulkIndexError('37 document(s) failed to index.') [yente.search.indexer] dataset=default entities_url=https://data.opensanctions.org/datasets/20230627/default/entities.ftm.json index=yente-entities-default-00320230627063008
yente-tut-app-1  | Traceback (most recent call last):
yente-tut-app-1  |   File "/app/yente/search/indexer.py", line 115, in index_entities
yente-tut-app-1  |     await async_bulk(es, docs, yield_ok=False, stats_only=True, chunk_size=1000)
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 337, in async_bulk
yente-tut-app-1  |     async for ok, item in async_streaming_bulk(
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 252, in async_streaming_bulk
yente-tut-app-1  |     async for data, (ok, info) in azip(  # type: ignore
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 156, in azip
yente-tut-app-1  |     yield tuple([await x.__anext__() for x in aiters])
yente-tut-app-1  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 156, in <listcomp>
yente-tut-app-1  |     yield tuple([await x.__anext__() for x in aiters])
yente-tut-app-1  |                  ^^^^^^^^^^^^^^^^^^^
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 127, in _process_bulk_chunk
yente-tut-app-1  |     for item in gen:
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/helpers/actions.py", line 274, in _process_bulk_chunk_success
yente-tut-app-1  |     raise BulkIndexError(f"{len(errors)} document(s) failed to index.", errors)
yente-tut-app-1  | elasticsearch.helpers.BulkIndexError: 37 document(s) failed to index.

matching still works fine

two indexes were loading at the same time, then I think the first completed and the second failed?

more logs from just before

yente-tut-app-1  | 2023-06-27T10:16:28.123512Z [info     ] Index: 335000 entities...      [yente.search.indexer] index=yente-entities-default-00320230627063008
yente-tut-app-1  | 2023-06-27T10:16:28.315509Z [info     ] Index: 2157000 entities...     [yente.search.indexer] index=yente-entities-default-00320230626183008
yente-tut-app-1  | 2023-06-27T10:16:29.612188Z [info     ] Index: 2158000 entities...     [yente.search.indexer] index=yente-entities-default-00320230626183008
yente-tut-app-1  | 2023-06-27T10:16:30.304867Z [info     ] Index: 336000 entities...      [yente.search.indexer] index=yente-entities-default-00320230627063008
yente-tut-app-1  | 2023-06-27T10:16:31.307114Z [info     ] Index: 2159000 entities...     [yente.search.indexer] index=yente-entities-default-00320230626183008
yente-tut-app-1  | 2023-06-27T10:16:32.216707Z [info     ] Index: 337000 entities...      [yente.search.indexer] index=yente-entities-default-00320230627063008
yente-tut-app-1  | 2023-06-27T10:16:32.329842Z [info     ] Index is now aliased to: yente-entities [yente.search.indexer] index=yente-entities-default-00320230626183008
yente-tut-app-1  | 2023-06-27T10:16:32.339328Z [info     ] Delete other index             [yente.search.indexer] index=yente-entities-default-00320230627063008
index            | {"@timestamp":"2023-06-27T10:16:32.341Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/S79Og3JaRmqF7Eh0Rzu-AQ] deleting index", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataDeleteIndexService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
yente-tut-app-1  | 2023-06-27T10:16:32.476391Z [info     ] Index update complete.         [yente.search.indexer] changed=True
yente-tut-app-1  | 2023-06-27T10:16:32.476725Z [info     ] Closing elasticsearch client   [yente.search.base] 
yente-tut-app-1  | 2023-06-27T10:16:34.133746Z [info     ] Index: 338000 entities...      [yente.search.indexer] index=yente-entities-default-00320230627063008
index            | {"@timestamp":"2023-06-27T10:16:34.184Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataCreateIndexService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.528Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] create_mapping", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.579Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.626Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.688Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.730Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.778Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.818Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.863Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.903Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:35.069Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:35.140Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:35.184Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
yente-tut-app-1  | 2023-06-27T10:16:35.283539Z [error    ] Indexing error: BulkIndexError('37 document(s) failed to index.') [yente.search.indexer] dataset=default entities_url=https://data.opensanctions.org/datasets/20230627/default/entities.ftm.json index=yente-entities-default-00320230627063008
yente-tut-app-1  | Traceback (most recent call last):
yente-tut-app-1  |   File "/app/yente/search/indexer.py", line 115, in index_entities
yente-tut-app-1  |     await async_bulk(es, docs, yield_ok=False, stats_only=True, chunk_size=1000)
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 337, in async_bulk
yente-tut-app-1  |     async for ok, item in async_streaming_bulk(
jbothma commented 1 year ago

will keep it running and see if the next update indexes or also has an issue

jbothma commented 1 year ago

it later tried again and successfully indexed the new dataset.

pudo commented 1 year ago

Nice catch on this :)