hubmapconsortium / search-api

HuBMAP search service and associated pieces to create an index
https://search.api.hubmapconsortium.org
MIT License

Investigate reindex 409 errors on PROD #834

Closed: yuanzhou closed this issue 1 month ago

yuanzhou commented 1 month ago

Example:

ERROR:hubmap_translator:Unable to directly update elements of document with related_entity_target_elements=['immediate_descendants', 'descendants', 'immediate_ancestors', 'ancestors', 'source_samples', 'origin_samples', 'datasets'], related_entity_id=3280ceaaee2c24e262d46a5f83edfe98. Got status_code=409 at es_url=https://search-hubmap-prod-e6jfsykycntlyd7m5khviqfbpi.us-east-1.es.amazonaws.com, endoint '_update/3280ceaaee2c24e262d46a5f83edfe98' with qdsl_update_payload_string={"script": {  "lang": "painless",  "source": "for (prop in ['immediate_descendants', 'descendants', 'immediate_ancestors', 'ancestors', 'source_samples', 'origin_samples', 'datasets']) {if (ctx._source.containsKey(prop))  {for (int i = 0; i < ctx._source[prop].length; ++i)   {if (ctx._source[prop][i]['uuid'] == params.modified_entity_uuid)    {ctx._source[prop][i] = params.revised_related_entity} } } }",  "params": {   "modified_entity_uuid": "564167adbbb2fdd64c24e7ea409c23f1",   "revised_related_entity": {"contains_human_genetic_sequences": false, "created_by_user_displayname": "HuBMAP Process", "created_by_user_email": "hubmap@hubmapconsortium.org", "created_timestamp": 1720814265702, "creation_action": "Create Dataset Activity", "data_access_level": "consortium", "dataset_type": "Histology", "description": "H&E slides corresponding to CODEX datasets : ./B004_SB-reg002", "entity_type": "Dataset", "files": [], "group_name": "Stanford TMC", "group_uuid": "def5fd76-ed43-11e8-b56a-0e8017bdda58", "hubmap_id": "HBM458.SXBD.528", "last_modified_timestamp": 1720814409827, "provider_info": "H&E for CODEX : ./B004_SB-reg002", "status": "Submitted", "title": "Histology data from the small intestine of a 78-year-old black or african american male", "uuid": "564167adbbb2fdd64c24e7ea409c23f1"}  } } }.

The request payload, formatted for readability, is:

{
  "script": {
    "lang": "painless",
    "source": "for (prop in ['immediate_descendants', 'descendants', 'immediate_ancestors', 'ancestors', 'source_samples', 'origin_samples', 'datasets']) {if (ctx._source.containsKey(prop))  {for (int i = 0; i < ctx._source[prop].length; ++i)   {if (ctx._source[prop][i]['uuid'] == params.modified_entity_uuid)    {ctx._source[prop][i] = params.revised_related_entity} } } }",
    "params": {
      "modified_entity_uuid": "564167adbbb2fdd64c24e7ea409c23f1",
      "revised_related_entity": {
        "contains_human_genetic_sequences": false,
        "created_by_user_displayname": "HuBMAP Process",
        "created_by_user_email": "hubmap@hubmapconsortium.org",
        "created_timestamp": 1720814265702,
        "creation_action": "Create Dataset Activity",
        "data_access_level": "consortium",
        "dataset_type": "Histology",
        "description": "H&E slides corresponding to CODEX datasets : ./B004_SB-reg002",
        "entity_type": "Dataset",
        "files": [],
        "group_name": "Stanford TMC",
        "group_uuid": "def5fd76-ed43-11e8-b56a-0e8017bdda58",
        "hubmap_id": "HBM458.SXBD.528",
        "last_modified_timestamp": 1720814409827,
        "provider_info": "H&E for CODEX : ./B004_SB-reg002",
        "status": "Submitted",
        "title": "Histology data from the small intestine of a 78-year-old black or african american male",
        "uuid": "564167adbbb2fdd64c24e7ea409c23f1"
      }
    }
  }
}
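The Painless `source` above loops over the seven relationship fields. For each field present in the target document, it replaces any embedded entity whose `uuid` equals `params.modified_entity_uuid` with `params.revised_related_entity`. The minimal Python sketch below applies the same logic to a plain dict and may help when reasoning about what each `_update` call rewrites. `apply_related_entity_update` is a hypothetical name, not a search-api function.

```python
from typing import Any, Dict, List

# The relationship fields listed in the Painless script's source.
RELATED_ENTITY_FIELDS: List[str] = [
    "immediate_descendants", "descendants", "immediate_ancestors",
    "ancestors", "source_samples", "origin_samples", "datasets",
]


def apply_related_entity_update(source: Dict[str, Any],
                                modified_entity_uuid: str,
                                revised_related_entity: Dict[str, Any]) -> int:
    """Hypothetical local mirror of the Painless script: in place, replace every
    embedded entity whose uuid matches. Returns how many entries were replaced."""
    replaced = 0
    for prop in RELATED_ENTITY_FIELDS:
        if prop not in source:  # ctx._source.containsKey(prop)
            continue
        entities = source[prop]
        for i in range(len(entities)):
            # Painless map access yields null for a missing key; .get() does the same.
            if entities[i].get("uuid") == modified_entity_uuid:
                entities[i] = revised_related_entity
                replaced += 1
    return replaced
```

The script does not set `ctx.op`, so each `_update` that runs it indexes a new version of the target document with a new `_seq_no`. When several reindex threads patch the same shared ancestor, such as a donor, their requests therefore race on that one document.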
yuanzhou commented 1 month ago

This issue continued on 7/16/2024, which may be related or helpful for debugging.

yuanzhou commented 1 month ago

Additional errors:

DEBUG:opensearch_helper_functions:Target url: https://search-hubmap-prod-e6jfsykycntlyd7m5khviqfbpi.us-east-1.es.amazonaws.com/hm_prod_consortium_entities/_update/dd943ec6c1ae5c6956889ba483969833
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.9/logging/__init__.py", line 1086, in emit
    stream.write(msg + self.terminator)
OSError: [Errno 90] Message too long
Call stack:
  File "/usr/lib64/python3.9/threading.py", line 930, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/src/app/src/./hubmap_translator.py", line 597, in translate
    self._directly_modify_related_entities( es_url=es_url
  File "/usr/src/app/src/./hubmap_translator.py", line 401, in _directly_modify_related_entities
    opensearch_response = execute_opensearch_query(query_against=f"_update/{related_entity_id}"
  File "/usr/src/app/src/search-adaptor/src/opensearch_helper_functions.py", line 138, in execute_opensearch_query
    logger.debug(json_data)
Unable to print the message and arguments - possible formatting error.
Use the traceback above to help find the error.
DEBUG:urllib3.connectionpool:https://search-hubmap-prod-e6jfsykycntlyd7m5khviqfbpi.us-east-1.es.amazonaws.com:443 "POST /hm_prod_consortium_entities/_delete_by_query?q=uuid:38ae6abc06d8f34c59413c958ad89731 HTTP/1.1" 409 599
ERROR:libs.es_writer:Failed to delete doc of uuid: 38ae6abc06d8f34c59413c958ad89731 from index: hm_prod_consortium_entities
ERROR:libs.es_writer:Error Message: {"took":28875,"timed_out":false,"total":1,"deleted":0,"batches":1,"version_conflicts":1,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"hm_prod_consortium_entities","type":"_doc","id":"38ae6abc06d8f34c59413c958ad89731","cause":{"type":"version_conflict_engine_exception","reason":"[38ae6abc06d8f34c59413c958ad89731]: version conflict, required seqNo [33766], primary term [1]. but no document was found","index_uuid":"qBHQcRrrR_-75suqjXixvQ","shard":"3","index":"hm_prod_consortium_entities"},"status":409}]}
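The `_delete_by_query` failure above matches how that API is documented to behave. It takes a snapshot of the matching documents and deletes them using internal versioning. If a document changes or disappears between the snapshot and the delete, that delete is reported as a version conflict (here, "no document was found"). By default, a version conflict aborts the request.

The sketch below builds two hypothetical alternatives using only the standard library:

- The same query with `conflicts=proceed`, which counts conflicts in `version_conflicts` instead of aborting.
- A plain delete by `_id`, which has no search phase.

The helper names are illustrative. The by-`_id` variant assumes the document `_id` equals its `uuid`, as it does in this log.

```python
from urllib.parse import quote, urlencode


def delete_by_query_url(es_url: str, index: str, uuid: str,
                        proceed_on_conflicts: bool = True) -> str:
    """Hypothetical helper: the _delete_by_query call from the log, optionally
    with conflicts=proceed so version conflicts are counted, not fatal."""
    params = {"q": f"uuid:{uuid}"}
    if proceed_on_conflicts:
        params["conflicts"] = "proceed"
    return f"{es_url.rstrip('/')}/{quote(index)}/_delete_by_query?{urlencode(params)}"


def delete_by_id_url(es_url: str, index: str, doc_id: str) -> str:
    """Hypothetical helper: URL for DELETE <index>/_doc/<id> (no snapshot/search
    phase; an already-deleted document yields 404 rather than a 409)."""
    return f"{es_url.rstrip('/')}/{quote(index)}/_doc/{quote(doc_id)}"
```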
yuanzhou commented 1 month ago

The 82 datasets grouped by donor (7 different donors):

╒══════════════════════════════════╤══════════════════════════════════╕
│Dataset                           │Donor                             │
╞══════════════════════════════════╪══════════════════════════════════╡
│"1387eabd856a6e1e15a61f2be1579d79"│"e71689fb01e59f5f57cc3ec250ba9609"│
├──────────────────────────────────┼──────────────────────────────────┤
│"dade713c7c80e504d6f04e008eccbb3d"│"e71689fb01e59f5f57cc3ec250ba9609"│
├──────────────────────────────────┼──────────────────────────────────┤
│"4121438406ffd936bb32e4a39c5b9f08"│"e71689fb01e59f5f57cc3ec250ba9609"│
├──────────────────────────────────┼──────────────────────────────────┤
│"28e97cc664d10c77826c3e6558cdc92a"│"e71689fb01e59f5f57cc3ec250ba9609"│
├──────────────────────────────────┼──────────────────────────────────┤
│"7419c7f02fb8a6ae8a331ef999f30349"│"e71689fb01e59f5f57cc3ec250ba9609"│
├──────────────────────────────────┼──────────────────────────────────┤
│"01d7163b90f5970d8b45ee7848034969"│"e71689fb01e59f5f57cc3ec250ba9609"│
├──────────────────────────────────┼──────────────────────────────────┤
│"16590c8741e52c027e39831cf369f4db"│"e71689fb01e59f5f57cc3ec250ba9609"│
├──────────────────────────────────┼──────────────────────────────────┤
│"b45d7317bc9877559fb9b8463250d01b"│"e71689fb01e59f5f57cc3ec250ba9609"│
├──────────────────────────────────┼──────────────────────────────────┤
│"8868942bd2aac2b5259acc704081085f"│"e71689fb01e59f5f57cc3ec250ba9609"│
├──────────────────────────────────┼──────────────────────────────────┤
│"00f9fd05621f0375bd55271fa8216152"│"e71689fb01e59f5f57cc3ec250ba9609"│
├──────────────────────────────────┼──────────────────────────────────┤
│"a88d8eb0b1135c28626b705094d3fa48"│"e3e625f5f072d99a5d9e31927787f23d"│
├──────────────────────────────────┼──────────────────────────────────┤
│"2473f80b3067febcaab60417d7e613aa"│"e3e625f5f072d99a5d9e31927787f23d"│
├──────────────────────────────────┼──────────────────────────────────┤
│"b8a02ec8f08a803afdb9c194326e2c2c"│"e3e625f5f072d99a5d9e31927787f23d"│
├──────────────────────────────────┼──────────────────────────────────┤
│"60baa60096c103ef8a5c07473dc6fab0"│"e3e625f5f072d99a5d9e31927787f23d"│
├──────────────────────────────────┼──────────────────────────────────┤
│"9ae75582fd6a0893f1f877d7e922929f"│"e3e625f5f072d99a5d9e31927787f23d"│
├──────────────────────────────────┼──────────────────────────────────┤
│"756a08666ef3593bdf427543b50c49ae"│"e3e625f5f072d99a5d9e31927787f23d"│
├──────────────────────────────────┼──────────────────────────────────┤
│"e14187fbc4f361f8041ac3444a8b99a4"│"e3e625f5f072d99a5d9e31927787f23d"│
├──────────────────────────────────┼──────────────────────────────────┤
│"1f329d5cddd9d58bdb40ec77f82a4c5e"│"e3e625f5f072d99a5d9e31927787f23d"│
├──────────────────────────────────┼──────────────────────────────────┤
│"a919ae8f0fef451d5717881237bb7acc"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"b1619ac3533bd1cf1c3c13835c756025"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"e2152d9d5cf0df79bb9259c494dfe488"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"322828d2189e7d1398023d0bbc323bbc"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"85b2008816fc118d24b4898a2f2f385e"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"2f3a308ac1abbbab7d2a39e78538b8bc"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"f7858cb6a2a132e410240b40374756cf"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"0e5db3e82d59f1a5f649721ceb1c2192"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"76a60a2d228f261233489a1e21809128"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"d37bd08627bdfd1b6dfb045742dd8dbf"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"70a335f1d85ce1430d30cab2dbc626db"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"756c1fa2b650326425158f3c7c4892de"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"660fc142f706d09279ac56f0a60199d7"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"7e9ce728a4ba0618a963a7f100c56781"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"468a4b19ecf29e84cb82450eff25cf91"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"a3f581212c9931481fc4dcb35410b522"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"d2bd55eacd8d8b39bdd1240d5dbc5241"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"08c89dfa627c81645262fd51c5b43af3"│"b5b4de30c90e4cb3def65407292942e9"│
├──────────────────────────────────┼──────────────────────────────────┤
│"f16b63fd8cc227270586174cdc87f20d"│"8c3a4460e5da93daa87f5f834cafcb43"│
├──────────────────────────────────┼──────────────────────────────────┤
│"228e8f186d43c5b6977cfbd31404ce33"│"8c3a4460e5da93daa87f5f834cafcb43"│
├──────────────────────────────────┼──────────────────────────────────┤
│"931d9aed81d8ad95ac5d96d99ff9510f"│"8c3a4460e5da93daa87f5f834cafcb43"│
├──────────────────────────────────┼──────────────────────────────────┤
│"3d83380207aaa099ef77f6177532cad0"│"8c3a4460e5da93daa87f5f834cafcb43"│
├──────────────────────────────────┼──────────────────────────────────┤
│"38ae6abc06d8f34c59413c958ad89731"│"8c3a4460e5da93daa87f5f834cafcb43"│
├──────────────────────────────────┼──────────────────────────────────┤
│"04d66aa9eb6c563c02b6d60d523b2465"│"8c3a4460e5da93daa87f5f834cafcb43"│
├──────────────────────────────────┼──────────────────────────────────┤
│"f83f05849079beb3cc961c16a7817258"│"8c3a4460e5da93daa87f5f834cafcb43"│
├──────────────────────────────────┼──────────────────────────────────┤
│"4ae9efc30fa79395ea56ba6be36f3353"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"96795442edff12a379aab0a7bcd7a0fe"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"01736bbe45f672af16894b24af8b5abd"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"2c4221684aba6e589f577e648ea97bde"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"2a156b9af5471735cc3b784a3bb965cd"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"b9d96541f25297e0b156713707bc93bf"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"10e778214f69647108a5b1617400979c"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"3e3017e112a73aa6559f679e2c6ce274"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"ea7d904c24285b3311e491f1d614b747"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"c3957fd44a5607ad9803820579a403b8"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"e1891c6304891165e8cf2b4480543a05"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"5d74feeffb7ab3fe26c3ec665b71dc5a"│"6d285926ae00557deb66d30b1bcf6649"│
├──────────────────────────────────┼──────────────────────────────────┤
│"58d62b7b28155f800e8679f2c7f0bd25"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"e1b28147004330be8582bc4cef037b0d"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"8123be77888d3931b93a240169080b0c"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"ba4753cba9dcb01d54185d737df9022c"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"dbbcf32e99677b640e0fcd3a47e919e2"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"4a43b21f2d6d6d4b421a739476ffb56e"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"997e75072d715671e971dcab91790a2f"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"a0929a6d4e53af2e91d1e2fa3352236d"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"7d9e95d85c701b19b3ffde74bc61e492"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"4ef59e87103fc7135cf294f80f850268"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"8a3fadf202fafabbfec74c9ffe35fa79"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"d1feeda8ff6b5ea2da2cec96ca40df37"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"1a64c86de8de686ff6fe953456b19a88"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"be2c65e443ede900a692180d16a41ece"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"33342bb4cb8c3f8696ad09708b14e8a9"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"18de59338f157ea98300a2ca0065154b"│"3ad74c9d10ccf828672b2ba990f915a8"│
├──────────────────────────────────┼──────────────────────────────────┤
│"3bdcc8ebbc132330a60e1c1c10b78229"│"239e9468a163caab397d642847d5f893"│
├──────────────────────────────────┼──────────────────────────────────┤
│"3aa8d206df281f6a374422293918547b"│"239e9468a163caab397d642847d5f893"│
├──────────────────────────────────┼──────────────────────────────────┤
│"3a6e2ff812b3e1466331e43e69d32b6a"│"239e9468a163caab397d642847d5f893"│
├──────────────────────────────────┼──────────────────────────────────┤
│"4b22508297d98097d9140c021bdbe2d0"│"239e9468a163caab397d642847d5f893"│
├──────────────────────────────────┼──────────────────────────────────┤
│"004d4f157df4ba07356cd805131dfc04"│"239e9468a163caab397d642847d5f893"│
├──────────────────────────────────┼──────────────────────────────────┤
│"eb0f68ed206093b63ed1b91643732204"│"239e9468a163caab397d642847d5f893"│
├──────────────────────────────────┼──────────────────────────────────┤
│"612e89f1439818ce514634dd3345be06"│"239e9468a163caab397d642847d5f893"│
├──────────────────────────────────┼──────────────────────────────────┤
│"356c83cd07b9e9305ebaac4e4016e389"│"239e9468a163caab397d642847d5f893"│
├──────────────────────────────────┼──────────────────────────────────┤
│"33667bc7203c31519c68640353b31b0f"│"239e9468a163caab397d642847d5f893"│
├──────────────────────────────────┼──────────────────────────────────┤
│"193971c089ea0e0be069e09e343016ee"│"239e9468a163caab397d642847d5f893"│
├──────────────────────────────────┼──────────────────────────────────┤
│"1a41cf20fdb81bf5badbf275b4c0052b"│"239e9468a163caab397d642847d5f893"│
└──────────────────────────────────┴──────────────────────────────────┘
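Under the direct-update approach, reindexing a dataset also patches documents it shares with other datasets. The 7/19 conflict is reported on donor document e71689fb01e59f5f57cc3ec250ba9609. The small sketch below groups (dataset, donor) rows like those in the table and flags donors whose document more than one dataset reindex would touch. `group_by_donor` and `shared_donors` are hypothetical helpers, and only four rows are copied here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_donor(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (dataset_uuid, donor_uuid) pairs by donor, preserving row order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for dataset_uuid, donor_uuid in rows:
        groups[donor_uuid].append(dataset_uuid)
    return dict(groups)


def shared_donors(groups: Dict[str, List[str]]) -> Dict[str, int]:
    """Donors with more than one dataset, i.e. candidates for write contention."""
    return {donor: len(ds) for donor, ds in groups.items() if len(ds) > 1}


# Four rows copied from the table above (not the full 82).
rows = [
    ("1387eabd856a6e1e15a61f2be1579d79", "e71689fb01e59f5f57cc3ec250ba9609"),
    ("dade713c7c80e504d6f04e008eccbb3d", "e71689fb01e59f5f57cc3ec250ba9609"),
    ("a88d8eb0b1135c28626b705094d3fa48", "e3e625f5f072d99a5d9e31927787f23d"),
    ("38ae6abc06d8f34c59413c958ad89731", "8c3a4460e5da93daa87f5f834cafcb43"),
]
groups = group_by_donor(rows)
print(shared_donors(groups))  # {'e71689fb01e59f5f57cc3ec250ba9609': 2}
```

Over all 82 rows, every donor is shared. In table order, the counts are 10, 8, 18, 7, 12, 16, and 11 datasets, so donor b5b4de30c90e4cb3def65407292942e9 alone sits above 18 of them.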
yuanzhou commented 1 month ago

7/19/2024: Even with retry_on_conflict implemented, I was still able to reproduce this 409 issue when reindexing multiple datasets under the same donor, e71689fb01e59f5f57cc3ec250ba9609. However, the error rate is now noticeably lower.

ERROR:hubmap_translator:OpenSearch message for 409 code: '
{
    "error": {
        "root_cause": [
            {
                "type": "version_conflict_engine_exception",
                "reason": "[e71689fb01e59f5f57cc3ec250ba9609]: version conflict, required seqNo [71538], primary term [1]. current document has seqNo [71539] and primary term [1]",
                "index_uuid": "qBHQcRrrR_-75suqjXixvQ",
                "shard": "3",
                "index": "hm_prod_consortium_entities"
            }
        ],
        "type": "version_conflict_engine_exception",
        "reason": "[e71689fb01e59f5f57cc3ec250ba9609]: version conflict, required seqNo [71538], primary term [1]. current document has seqNo [71539] and primary term [1]",
        "index_uuid": "qBHQcRrrR_-75suqjXixvQ",
        "shard": "3",
        "index": "hm_prod_consortium_entities"
    },
    "status": 409
}
'.
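This 409 body shows an optimistic-concurrency race. The update required the donor document to be at `seqNo` 71538, but another write had already moved it to 71539. `retry_on_conflict` is a query parameter of the `_update` API that makes the cluster re-run its get/modify/index cycle up to that many times before returning the conflict. That fits the lower but nonzero error rate: under enough contention on one document, every retry can lose. The hedged sketch below builds such a URL and extracts both sequence numbers from the `reason` string for logging. Both helper names are hypothetical, `es.example.com` is a placeholder host, and the value 5 is a placeholder, since the issue does not state the value that was used.

```python
import re
from typing import Optional, Tuple
from urllib.parse import quote, urlencode


def update_url(es_url: str, index: str, doc_id: str, retry_on_conflict: int,
               refresh: Optional[str] = None) -> str:
    """Hypothetical helper: build an _update URL with retry_on_conflict and an
    optional refresh policy ('true', 'false' or 'wait_for')."""
    if retry_on_conflict < 0:
        raise ValueError("retry_on_conflict must be >= 0")
    params = {"retry_on_conflict": str(retry_on_conflict)}
    if refresh is not None:
        if refresh not in ("true", "false", "wait_for"):
            raise ValueError(f"unsupported refresh value: {refresh!r}")
        params["refresh"] = refresh
    return f"{es_url.rstrip('/')}/{quote(index)}/_update/{quote(doc_id)}?{urlencode(params)}"


_SEQ_NO_RE = re.compile(r"required seqNo \[(\d+)\].*?current document has seqNo \[(\d+)\]")


def parse_seq_nos(reason: str) -> Optional[Tuple[int, int]]:
    """Return (required_seq_no, current_seq_no) from a conflict reason, or None
    when the reason has no current seqNo (e.g. 'no document was found')."""
    m = _SEQ_NO_RE.search(reason)
    return (int(m.group(1)), int(m.group(2))) if m else None


print(update_url("https://es.example.com", "hm_prod_consortium_entities",
                 "e71689fb01e59f5f57cc3ec250ba9609", retry_on_conflict=5))
# https://es.example.com/hm_prod_consortium_entities/_update/e71689fb01e59f5f57cc3ec250ba9609?retry_on_conflict=5
```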
yuanzhou commented 1 month ago

7/20/2024: After more testing with retry and refresh, 409 errors can still occur when many descendants that share common samples and donors are updated directly against ES. As a result, I disabled the recent direct-update work and went back to the original procedure via https://github.com/hubmapconsortium/search-api/pull/839. The PR also picks up a fix for the "Message too long" logging error, made in search-adaptor via https://github.com/dbmi-pitt/search-adaptor/commit/ab6ead92f7b2bb6ab42ea6a5f1b2436b5f277d00
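The earlier `OSError: [Errno 90] Message too long` came from a logging handler. It was raised while writing the full payload passed to `logger.debug(json_data)` in `execute_opensearch_query`. The linked search-adaptor commit is the actual fix. Without claiming to restate what that commit does, one generic guard is a `logging.Filter` on the handler that caps message length. `TruncateLongMessages` and its 10,000-character default are assumptions for illustration.

```python
import logging


class TruncateLongMessages(logging.Filter):
    """Hypothetical filter: shorten any record whose rendered message exceeds
    max_chars, so a huge payload cannot make the handler's write fail."""

    def __init__(self, max_chars: int = 10_000) -> None:
        super().__init__()
        self.max_chars = max_chars

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # render msg % args once
        if len(message) > self.max_chars:
            omitted = len(message) - self.max_chars
            record.msg = f"{message[: self.max_chars]}... [truncated {omitted} chars]"
            record.args = ()  # message is already fully rendered
        return True  # never drop the record, only shorten it
```

Attach it with `handler.addFilter(TruncateLongMessages())` rather than to a logger. Logger-level filters are not applied to records propagated from child loggers, while handler filters see every record the handler emits. The filter rewrites the record in place, so any other handler also receives the shortened text.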

FYI @kburke: unfortunately, this has been a trial-and-error process. On the plus side, we now have a much better understanding of the reindex procedure and of the limitations of Elasticsearch's optimistic concurrency control.
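For reference, optimistic concurrency control in Elasticsearch/OpenSearch works by sending `if_seq_no` and `if_primary_term` with a write. The cluster rejects the write with a 409 if the document has changed since it was read. Below is a minimal, self-contained simulation of a client-side read-modify-write loop under that scheme. `FakeIndex` is an in-memory stand-in for the cluster, not an OpenSearch client. `update_with_occ` is a hypothetical helper and not what search-api does, since PR #839 went back to the original procedure.

```python
import copy
import random
import time
from typing import Any, Callable, Dict, Optional, Tuple


class VersionConflict(Exception):
    """Stand-in for an HTTP 409 version_conflict_engine_exception."""


class FakeIndex:
    """In-memory stand-in for one index (NOT an OpenSearch client): every write
    gets a new, higher seq_no, and a write carrying if_seq_no/if_primary_term
    is rejected when they no longer match (or the document does not exist)."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}
        self.seq_nos: Dict[str, int] = {}
        self.primary_term = 1
        self._next_seq_no = 0

    def get(self, doc_id: str) -> Tuple[Dict[str, Any], int, int]:
        return copy.deepcopy(self.docs[doc_id]), self.seq_nos[doc_id], self.primary_term

    def index(self, doc_id: str, source: Dict[str, Any],
              if_seq_no: Optional[int] = None,
              if_primary_term: Optional[int] = None) -> int:
        if if_seq_no is not None and (self.seq_nos.get(doc_id) != if_seq_no
                                      or if_primary_term != self.primary_term):
            raise VersionConflict(f"[{doc_id}]: required seqNo [{if_seq_no}]")
        self.docs[doc_id] = copy.deepcopy(source)
        self.seq_nos[doc_id] = self._next_seq_no
        self._next_seq_no += 1
        return self.seq_nos[doc_id]


def update_with_occ(idx: FakeIndex, doc_id: str,
                    mutate: Callable[[Dict[str, Any]], Dict[str, Any]],
                    max_attempts: int = 5, base_delay: float = 0.05) -> int:
    """Read-modify-write guarded by the seq_no/primary_term that was read. On a
    conflict, re-read the latest version and retry with jittered exponential
    backoff; re-raise the conflict after max_attempts failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        source, seq_no, term = idx.get(doc_id)
        try:
            return idx.index(doc_id, mutate(source),
                             if_seq_no=seq_no, if_primary_term=term)
        except VersionConflict:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * random.random())
    raise RuntimeError("unreachable")
```

A call looks like `update_with_occ(idx, "donor-uuid", lambda src: {**src, "status": "Published"})`. Unlike a blind overwrite, a concurrent change is re-read rather than lost. Because attempts are bounded, though, heavy contention on a single donor document can still exhaust them and surface the final 409. That is consistent with what the 7/20 testing found with retries.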