Closed by boogheta 2 years ago
New occurrence met yesterday:
2022-07-06 18:23:28,075 - depiler [306600] - ERROR - <class 'elasticsearch.helpers.errors.BulkIndexError'>: ('1 document(s) failed to index.', [{'update': {'_index': 'gazouilloire-deputes_tweets_2022_07', '_type': '_doc', '_id': '1544674616802672641', 'status': 409, 'error': {'type': 'version_conflict_engine_exception', 'reason': '[1544674616802672641]: version conflict, required seqNo [2783314], primary term [1]. current document has seqNo [2783550] and primary term [1]', 'index_uuid': 'YVqqbyNMQZGnaBiCKY_ZVw', 'shard': '0', 'index': 'gazouilloire-deputes_tweets_2022_07'}, 'data': {'script': {'source': 'ctx._source.match_query |= params.match_query; ctx._source.retweet_count = params.retweet_count; ctx._source.reply_count = params.reply_count; ctx._source.favorite_count = params.favorite_count; if (!ctx._source.collected_via.contains(params.collected_via)){ctx._source.collected_via.add(params.collected_via)}', 'lang': 'painless', 'params': {'collected_via': 'retweet', 'match_query': True, 'retweet_count': 20, 'reply_count': 4, 'like_count': 33}}, 'upsert': {'local_time': '2022-07-06T15:28:21', 'timestamp_utc': 1657114101, 'text': '#AssembleeNationale : \nAprès France Connect, France Travail, Elisabeth #Borne invente France Discours. 
\nFace aux crises démocratiques, sociales et écologiques des réponses aussi creuses qu’un numéro vert.', 'url': 'https://twitter.com/FraPiquemal/status/1544674616802672641', 'quoted_id': None, 'quoted_user': None, 'quoted_user_id': None, 'quoted_timestamp_utc': None, 'retweeted_id': None, 'retweeted_user': None, 'retweeted_user_id': None, 'retweeted_timestamp_utc': None, 'media_files': [], 'media_types': [], 'media_urls': [], 'links': [], 'links_to_resolve': False, 'domains': [], 'hashtags': ['assembleenationale', 'borne'], 'mentioned_ids': [], 'mentioned_names': [], 'collection_time': '2022-07-06T18:23:26.154209', 'match_query': True, 'collected_via': ['retweet'], 'coordinates': None, 'to_tweetid': None, 'to_username': None, 'to_userid': None, 'lang': 'fr', 'retweet_count': 20, 'like_count': 33, 'reply_count': 4, 'user_screen_name': 'FraPiquemal', 'user_name': 'François Piquemal', 'user_friends': 1052, 'user_followers': 5560, 'user_location': 'Toulouse, France', 'user_verified': False, 'user_description': "Député #circo3104 #Toulouse @NUPES_2022_ /@ParlementNUPES /Prof d'Hist-Géo au Mirail/ Conseiller Municipal / Co-président @GroupeAMC /10 ans à @federationdal", 'user_created_at': '2013-07-18T18:06:08', 'user_id': '1603776488', 'user_tweets': 6511, 'user_likes': 13518, 'user_lists': 98, 'user_image': 'https://pbs.twimg.com/profile_images/1529570905566892032/Og0YbKAh_normal.jpg', 'user_url': 'http://francoispiquemal.fr/', 'user_timestamp_utc': 1374163568, 'source_url': 'http://twitter.com/download/iphone', 'source_name': 'Twitter for iPhone'}}}}])
This probably comes from my running two collections concurrently against the same ES index (cf. https://stackoverflow.com/questions/68834219/how-to-solve-version-conflict-engine-exception-in-elasticsearch-exception). Simply retrying after a second should solve the problem; I'll submit a proposed fix shortly.
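To make the retry idea concrete, here is a minimal sketch, not the actual fix. The names `bulk_with_retry`, `is_version_conflict` and `send_bulk` are hypothetical; `send_bulk` stands for whatever callable wraps the bulk indexing call. It only assumes that the raised exception exposes an `errors` list shaped like the one in the log above, as `BulkIndexError` does. It resends the whole batch, which should be harmless here because the update script is idempotent, and `actions` must be a list rather than a generator so it can be replayed.

```python
import time


def is_version_conflict(exc):
    """Return True if every failed item in a BulkIndexError-like
    exception is a 409 version conflict."""
    errors = getattr(exc, "errors", None)
    if not errors:
        return False
    for item in errors:
        # Each item looks like {'update': {'status': 409, 'error': {...}}}
        for details in item.values():
            if details.get("status") != 409:
                return False
    return True


def bulk_with_retry(send_bulk, actions, max_retries=3, delay=1.0, sleep=time.sleep):
    """Call send_bulk(actions), retrying after `delay` seconds when the
    only failures are version conflicts. Other errors are re-raised."""
    attempt = 0
    while True:
        try:
            return send_bulk(actions)
        except Exception as exc:
            if attempt >= max_retries or not is_version_conflict(exc):
                raise
            attempt += 1
            sleep(delay)
```

Elasticsearch also accepts a `retry_on_conflict` parameter on update actions, which retries on the server side and may be enough without any client-side loop.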
We encountered this error in last night's run (I removed the upsert tweet payload from the log):
2022-06-20 05:38:17,243 - depiler [5784] - ERROR - <class 'elasticsearch.helpers.errors.BulkIndexError'>: ('1 document(s) failed to index.', [{'update': {'_index': 'multiindex_filter_links_tweets_2022_06', '_type': '_doc', '_id': '1537145721442410497', 'status': 409, 'error': {'type': 'version_conflict_engine_exception', 'reason': '[1537145721442410497]: version conflict, required seqNo [171136066], primary term [1]. current document has seqNo [171160925] and primary term [1]', 'index_uuid': '6_8gwNMrQN65gOCuRHEYwg', 'shard': '0', 'index': 'multiindex_filter_links_tweets_2022_06'}, 'data': {'script': {'source': 'ctx._source.match_query |= params.match_query; ctx._source.retweet_count = params.retweet_count; ctx._source.favorite_count = params.favorite_count; if (!ctx._source.collected_via.contains(params.collected_via)){ctx._source.collected_via.add(params.collected_via)}', 'lang': 'painless', 'params': {'collected_via': 'quote', 'match_query': False, 'retweet_count': 14615, 'reply_count': None, 'like_count': 91158}}}}])
After this crash, the log reported all processes as stopped, yet the processes were still running and their RAM usage kept growing (as if data collection was still filling the queue while nothing was depiling it). Maybe there are some Elasticsearch edge-case crashes that should be caught better in such cases.
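One possible way to avoid that state, sketched under assumptions rather than taken from the actual code: have the depiler signal a shared stop flag whenever it exits, including on an unexpected exception, so the collecting side stops filling the queue. The names `depile`, `collect`, `index_batch`, `fetch` and `stop_event` are hypothetical, and threads stand in for the real processes to keep the example self-contained.

```python
import queue
import threading


def depile(pile, index_batch, stop_event):
    """Consume batches from the queue until told to stop.
    Whatever happens, signal the other workers on exit."""
    try:
        while not stop_event.is_set():
            try:
                batch = pile.get(timeout=0.1)
            except queue.Empty:
                continue
            index_batch(batch)
    finally:
        # Reached on normal exit and on any uncaught indexing error,
        # so producers never keep filling a queue nobody reads.
        stop_event.set()


def collect(pile, fetch, stop_event):
    """Keep queueing fetched batches while the depiler is alive."""
    while not stop_event.is_set():
        pile.put(fetch())
```

With real processes, `multiprocessing.Event` and `multiprocessing.Queue` offer the same interface for this pattern.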