Open christinklez opened 1 year ago
@amywieliczka @barbarahui -- this failed at map_endpoint_task:
Traceback (most recent call last):
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
self.dialect.do_execute(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL connection has been closed unexpectedly
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 73, in wrapper
return func(*args, **kwargs)
File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 2354, in xcom_push
XCom.set(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 73, in wrapper
return func(*args, **kwargs)
File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/models/xcom.py", line 264, in set
session.flush()
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 3449, in flush
self._flush(objects)
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 3588, in _flush
with util.safe_reraise():
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
compat.raise_(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 3549, in _flush
flush_context.execute()
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
rec.execute(self)
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
util.preloaded.orm_persistence.save_obj(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
_emit_insert_statements(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 1097, in _emit_insert_statements
c = connection._execute_20(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
return meth(self, args_10style, kwargs_10style, execution_options)
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
return connection._execute_clauseelement(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
util.raise_(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
self.dialect.do_execute(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL connection has been closed unexpectedly
[SQL: INSERT INTO xcom (dag_run_id, task_id, map_index, key, dag_id, run_id, value, timestamp) VALUES (%(dag_run_id)s, %(task_id)s, %(map_index)s, %(key)s, %(dag_id)s, %(run_id)s, %(value)s, %(timestamp)s)]
[parameters: {'dag_run_id': 95, 'task_id': 'map_endpoint_task', 'map_index': -1, 'key': 'return_value', 'dag_id': 'validate_by_mapper_type', 'run_id': 'manual__2023-11-16T23:23:47+00:00', 'value': <psycopg2.extensions.Binary object at 0x7f7e935f6a30>, 'timestamp': datetime.datetime(2023, 11, 17, 3, 47, 58, 410865, tzinfo=Timezone('UTC'))}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
[2023-11-17, 03:48:06 UTC] {{taskinstance.py:1345}} INFO - Marking task as FAILED. dag_id=validate_by_mapper_type, task_id=map_endpoint_task, execution_date=20231116T232347, start_date=20231117T034059, end_date=20231117T034806
[2023-11-17, 03:48:06 UTC] {{standard_task_runner.py:104}} ERROR - Failed to execute job 1984 for task map_endpoint_task ((psycopg2.OperationalError) SSL connection has been closed unexpectedly
[SQL: INSERT INTO xcom (dag_run_id, task_id, map_index, key, dag_id, run_id, value, timestamp) VALUES (%(dag_run_id)s, %(task_id)s, %(map_index)s, %(key)s, %(dag_id)s, %(run_id)s, %(value)s, %(timestamp)s)]
[parameters: {'dag_run_id': 95, 'task_id': 'map_endpoint_task', 'map_index': -1, 'key': 'return_value', 'dag_id': 'validate_by_mapper_type', 'run_id': 'manual__2023-11-16T23:23:47+00:00', 'value': <psycopg2.extensions.Binary object at 0x7f7e935f6a30>, 'timestamp': datetime.datetime(2023, 11, 17, 3, 47, 58, 410865, tzinfo=Timezone('UTC'))}]
(Background on this error at: https://sqlalche.me/e/14/e3q8); 24515)
[2023-11-17, 03:48:06 UTC] {{local_task_job_runner.py:225}} INFO - Task exited with return code 1
[2023-11-17, 03:48:06 UTC] {{taskinstance.py:2653}} INFO - 0 downstream tasks scheduled from follow-on schedule check
@amywieliczka @bibliotechy The error above seems to mean that the Scheduler ran out of resources:
This happened for collection 26094, which has 685 pages of vernacular metadata. AWS suggests increasing the number of schedulers. We currently have 2 schedulers, so I could try upping that to 3...
@christinklez could you try running this again from scratch and see what happens? @amywieliczka and I discussed this and think it may have been a one-off issue. (The mapping task in this particular DAG doesn't fan out, so resourcing shouldn't be an issue.)
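For reference, the statement that failed in the traceback is the insert of the task's return value into the xcom table. If the error comes back, one thing that might be worth ruling out is the size of that return value. Below is a minimal sketch for estimating it, assuming the value is JSON-serializable; `xcom_payload_size` is a hypothetical helper and the page paths are made up for illustration (only the count of 685 pages comes from this thread). It is not a diagnosis of the failure above.

```python
import json


def xcom_payload_size(value) -> int:
    """Approximate number of bytes a JSON-serialized return value occupies."""
    return len(json.dumps(value).encode("utf-8"))


# Hypothetical return value: one made-up path per page of a 685-page collection.
mapped_pages = [f"26094/mapped_metadata/data/{n}.jsonl" for n in range(685)]

size = xcom_payload_size(mapped_pages)
print(f"{size} bytes for {len(mapped_pages)} page paths")
```

Logging this number just before the task returns would show whether the payload grows with collection size.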
@barbarahui thank you! This job went through the map_endpoint_task successfully! However, it is now running into an error at validate_endpoint_task, apparently while validating 26094, which is a huge collection with 135,453 items.
*** Reading remote log from Cloudwatch log_group: airflow-pad-airflow-mwaa-Task log_stream: dag_id=validate_by_mapper_type/run_id=manual__2023-11-16T23_23_47+00_00/task_id=validate_endpoint_task/attempt=2.log.
[2023-11-30, 17:10:12 UTC] {{taskinstance.py:1103}} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [queued]>
[2023-11-30, 17:10:12 UTC] {{taskinstance.py:1103}} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [queued]>
[2023-11-30, 17:10:12 UTC] {{taskinstance.py:1308}} INFO - Starting attempt 2 of 2
[2023-11-30, 17:10:12 UTC] {{taskinstance.py:1327}} INFO - Executing <Task(_PythonDecoratedOperator): validate_endpoint_task> on 2023-11-16 23:23:47+00:00
[2023-11-30, 17:10:12 UTC] {{standard_task_runner.py:57}} INFO - Started process 25465 to run task
[2023-11-30, 17:10:12 UTC] {{standard_task_runner.py:84}} INFO - Running: ['airflow', 'tasks', 'run', 'validate_by_mapper_type', 'validate_endpoint_task', 'manual__2023-11-16T23:23:47+00:00', '--job-id', '2115', '--raw', '--subdir', 'DAGS_FOLDER/rikolti/dags/validate_by_mapper_type.py', '--cfg-path', '/tmp/tmpn2_buk76']
[2023-11-30, 17:10:12 UTC] {{standard_task_runner.py:85}} INFO - Job 2115: Subtask validate_endpoint_task
[2023-11-30, 17:10:12 UTC] {{task_command.py:410}} INFO - Running <TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [running]> on host ip-10-192-21-64.us-west-2.compute.internal
[2023-11-30, 17:10:12 UTC] {{taskinstance.py:1545}} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='validate_by_mapper_type' AIRFLOW_CTX_TASK_ID='validate_endpoint_task' AIRFLOW_CTX_EXECUTION_DATE='2023-11-16T23:23:47+00:00' AIRFLOW_CTX_TRY_NUMBER='2' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-16T23:23:47+00:00'
[2023-11-30, 17:10:12 UTC] {{logging_mixin.py:150}} INFO - >>> Validating 10/10 collections described at https://registry.cdlib.org/api/v1/rikoltifetcher/?format=json&mapper_type=lapl_oai&ready_for_publication=true
[2023-11-30, 17:10:12 UTC] {{logging_mixin.py:150}} INFO - 26094 Validating collection
[2023-11-30, 17:22:33 UTC] {{local_task_job_runner.py:225}} INFO - Task exited with return code Negsignal.SIGKILL
[2023-11-30, 17:22:33 UTC] {{taskinstance.py:2653}} INFO - 0 downstream tasks scheduled from follow-on schedule check
@amywieliczka @bibliotechy Looks like the worker runs out of memory for this huge collection: https://stackoverflow.com/questions/69231797/airflow-dag-fails-when-pythonoperator-with-error-negsignal-sigkill
We're currently on mw1.small, which provides 2 GB of RAM. mw1.medium would up it to 4 GB: https://docs.aws.amazon.com/mwaa/latest/userguide/environment-class.html#environment-class-sizes
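If more RAM alone doesn't help, another option might be to keep memory flat by processing one record at a time and writing report rows as they are produced. The sketch below illustrates that pattern only; it is not how the Rikolti validator is written. The function names, the `id` and `title` fields, and the "missing title" check are all hypothetical stand-ins.

```python
import csv
import io
import json
from typing import Iterable, Iterator


def iter_records(page_texts: Iterable[str]) -> Iterator[dict]:
    """Yield one record at a time from JSON Lines pages."""
    for text in page_texts:
        for line in text.splitlines():
            if line.strip():
                yield json.loads(line)


def write_report(records: Iterable[dict], out) -> int:
    """Write one CSV row per record that fails the check; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["id", "problem"])
    rows = 0
    for record in records:
        # Stand-in check: flag records with no title (hypothetical rule).
        if not record.get("title"):
            writer.writerow([record.get("id"), "missing title"])
            rows += 1
    return rows


# Two tiny in-memory pages; in practice each page would be read lazily.
pages = ['{"id": "a", "title": "One"}\n{"id": "b"}\n', '{"id": "c", "title": ""}\n']
buffer = io.StringIO()
count = write_report(iter_records(pages), buffer)
print(count)
```

Because `iter_records` is a generator, only one record is held at a time, so peak memory should not depend on how many items the collection has.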
OK, I bumped our MWAA instance to mw1.medium. @christinklez can you try running this again to see if that resolves the issue?
@barbarahui thank you for that! Just re-ran this and it looks like we're hitting the same error 🤕
*** Reading remote log from Cloudwatch log_group: airflow-pad-airflow-mwaa-Task log_stream: dag_id=validate_by_mapper_type/run_id=manual__2023-11-16T23_23_47+00_00/task_id=validate_endpoint_task/attempt=3.log.
[2023-12-01, 23:11:26 UTC] {{taskinstance.py:1103}} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [queued]>
[2023-12-01, 23:11:26 UTC] {{taskinstance.py:1103}} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [queued]>
[2023-12-01, 23:11:26 UTC] {{taskinstance.py:1308}} INFO - Starting attempt 3 of 3
[2023-12-01, 23:11:26 UTC] {{taskinstance.py:1327}} INFO - Executing <Task(_PythonDecoratedOperator): validate_endpoint_task> on 2023-11-16 23:23:47+00:00
[2023-12-01, 23:11:26 UTC] {{standard_task_runner.py:57}} INFO - Started process 301 to run task
[2023-12-01, 23:11:26 UTC] {{standard_task_runner.py:84}} INFO - Running: ['airflow', 'tasks', 'run', 'validate_by_mapper_type', 'validate_endpoint_task', 'manual__2023-11-16T23:23:47+00:00', '--job-id', '2148', '--raw', '--subdir', 'DAGS_FOLDER/rikolti/dags/validate_by_mapper_type.py', '--cfg-path', '/tmp/tmp6wqb0edx']
[2023-12-01, 23:11:26 UTC] {{standard_task_runner.py:85}} INFO - Job 2148: Subtask validate_endpoint_task
[2023-12-01, 23:11:26 UTC] {{task_command.py:410}} INFO - Running <TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [running]> on host ip-10-192-21-247.us-west-2.compute.internal
[2023-12-01, 23:11:26 UTC] {{taskinstance.py:1545}} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='validate_by_mapper_type' AIRFLOW_CTX_TASK_ID='validate_endpoint_task' AIRFLOW_CTX_EXECUTION_DATE='2023-11-16T23:23:47+00:00' AIRFLOW_CTX_TRY_NUMBER='3' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-16T23:23:47+00:00'
[2023-12-01, 23:11:26 UTC] {{logging_mixin.py:150}} INFO - >>> Validating 10/10 collections described at https://registry.cdlib.org/api/v1/rikoltifetcher/?format=json&mapper_type=lapl_oai&ready_for_publication=true
[2023-12-01, 23:11:26 UTC] {{logging_mixin.py:150}} INFO - 26094 Validating collection
[2023-12-01, 23:23:24 UTC] {{local_task_job_runner.py:225}} INFO - Task exited with return code Negsignal.SIGKILL
[2023-12-01, 23:23:24 UTC] {{taskinstance.py:2653}} INFO - 0 downstream tasks scheduled from follow-on schedule check
Kicked off a new job to test what happens when the supersized collection 26094 is taken off the list of collections to run through the validator.
Validation reports (without 26094) ran through. We can start reviewing these reports and come back to review 26094 once the error issue becomes clearer.
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27219/vernacular_metadata_2023-12-02T00:23:49/mapped_metadata_2023-12-02T00:50:52/validation_2023-12-02T00:51:45.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27219/vernacular_metadata_2023-12-02T00:23:49/mapped_metadata_2023-12-02T00:50:52/data/
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27220/vernacular_metadata_2023-12-02T00:26:53/mapped_metadata_2023-12-02T00:50:58/validation_2023-12-02T00:51:57.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27220/vernacular_metadata_2023-12-02T00:26:53/mapped_metadata_2023-12-02T00:50:58/data/
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27221/vernacular_metadata_2023-12-02T00:33:37/mapped_metadata_2023-12-02T00:51:08/validation_2023-12-02T00:51:58.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27221/vernacular_metadata_2023-12-02T00:33:37/mapped_metadata_2023-12-02T00:51:08/data/
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27222/vernacular_metadata_2023-12-02T00:34:03/mapped_metadata_2023-12-02T00:51:08/validation_2023-12-02T00:51:58.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27222/vernacular_metadata_2023-12-02T00:34:03/mapped_metadata_2023-12-02T00:51:08/data/
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27223/vernacular_metadata_2023-12-02T00:34:07/mapped_metadata_2023-12-02T00:51:09/validation_2023-12-02T00:52:13.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27223/vernacular_metadata_2023-12-02T00:34:07/mapped_metadata_2023-12-02T00:51:09/data/
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27224/vernacular_metadata_2023-12-02T00:41:26/mapped_metadata_2023-12-02T00:51:19/validation_2023-12-02T00:52:15.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27224/vernacular_metadata_2023-12-02T00:41:26/mapped_metadata_2023-12-02T00:51:19/data/
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27225/vernacular_metadata_2023-12-02T00:42:10/mapped_metadata_2023-12-02T00:51:21/validation_2023-12-02T00:52:15.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27225/vernacular_metadata_2023-12-02T00:42:10/mapped_metadata_2023-12-02T00:51:21/data/
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27226/vernacular_metadata_2023-12-02T00:42:13/mapped_metadata_2023-12-02T00:51:22/validation_2023-12-02T00:52:16.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27226/vernacular_metadata_2023-12-02T00:42:13/mapped_metadata_2023-12-02T00:51:22/data/
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27227/vernacular_metadata_2023-12-02T00:42:29/mapped_metadata_2023-12-02T00:51:22/validation_2023-12-02T00:52:34.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27227/vernacular_metadata_2023-12-02T00:42:29/mapped_metadata_2023-12-02T00:51:22/data/
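For anyone reviewing these reports in bulk, here is a rough sketch for pulling the report URL for each collection out of a copied task log. It assumes the "Download validation report at:" wording and URL layout shown in the log lines above; `report_urls` is a hypothetical helper, not part of Rikolti. The sample string is the 27219 log line from this run.

```python
import re

# Captures the full report URL and the collection ID (first path segment).
REPORT_URL = re.compile(
    r"Download validation report at: (https://[^/\s]+/(\d+)/\S+?\.csv)"
)


def report_urls(log_text):
    """Map collection ID to its validation report URL."""
    return {cid: url for url, cid in REPORT_URL.findall(log_text)}


sample = (
    "[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - "
    "Download validation report at: "
    "https://rikolti-data.s3.amazonaws.com/27219/"
    "vernacular_metadata_2023-12-02T00:23:49/"
    "mapped_metadata_2023-12-02T00:50:52/"
    "validation_2023-12-02T00:51:45.csv "
    "[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - "
    "Review collection data at: "
    "https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27219/"
    "vernacular_metadata_2023-12-02T00:23:49/"
    "mapped_metadata_2023-12-02T00:50:52/data/"
)

for collection_id, url in report_urls(sample).items():
    print(collection_id, url)
```

The pattern matches whether the two log entries sit on one line or on separate lines, since it only looks for the report prefix and stops at the first `.csv`.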
Updated the registry to add the LAPL #26094 to lapl_oai.
Updated spreadsheet with initial validation report QA notes, for next CK/GM/AT collective review and synthesis: https://docs.google.com/spreadsheets/d/1XkBwi8jiuGgrWvQBqjkrbavSl13yyrQNyRyEb_CQ4z0/edit?usp=sharing
Validation fixes requested; see #671
27219, 27220, 27221, 27223 have validation errors:
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ------------------------------------------------------------
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - -------------------- Validation Errors ---------------------
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ------------------------------------------------------------
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Collection 27219: No mapped metadata found for 27219 page 27219/vernacular_metadata_2024-01-27T04:04:44/mapped_metadata_2024-01-27T04:32:29/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Traceback (most recent call last):
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/dags/utils_by_mapper_type.py", line 171, in validate_endpoint_task
num_rows, version_page = create_collection_validation_csv(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 217, in create_collection_validation_csv
result = validate_collection(collection_id, mapped_page_paths, **options)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 66, in validate_collection
rikolti_ids, new_ids = validate_page(collection_id, page_path, validator)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 171, in validate_page
raise ValueError(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ValueError: No mapped metadata found for 27219 page 27219/vernacular_metadata_2024-01-27T04:04:44/mapped_metadata_2024-01-27T04:32:29/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Collection 27220: No mapped metadata found for 27220 page 27220/vernacular_metadata_2024-01-27T04:06:22/mapped_metadata_2024-01-27T04:32:34/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Traceback (most recent call last):
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/dags/utils_by_mapper_type.py", line 171, in validate_endpoint_task
num_rows, version_page = create_collection_validation_csv(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 217, in create_collection_validation_csv
result = validate_collection(collection_id, mapped_page_paths, **options)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 66, in validate_collection
rikolti_ids, new_ids = validate_page(collection_id, page_path, validator)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 171, in validate_page
raise ValueError(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ValueError: No mapped metadata found for 27220 page 27220/vernacular_metadata_2024-01-27T04:06:22/mapped_metadata_2024-01-27T04:32:34/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Collection 27221: No mapped metadata found for 27221 page 27221/vernacular_metadata_2024-01-27T04:10:59/mapped_metadata_2024-01-27T04:32:43/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Traceback (most recent call last):
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/dags/utils_by_mapper_type.py", line 171, in validate_endpoint_task
num_rows, version_page = create_collection_validation_csv(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 217, in create_collection_validation_csv
result = validate_collection(collection_id, mapped_page_paths, **options)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 66, in validate_collection
rikolti_ids, new_ids = validate_page(collection_id, page_path, validator)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 171, in validate_page
raise ValueError(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ValueError: No mapped metadata found for 27221 page 27221/vernacular_metadata_2024-01-27T04:10:59/mapped_metadata_2024-01-27T04:32:43/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Collection 27223: No mapped metadata found for 27223 page 27223/vernacular_metadata_2024-01-27T04:11:24/mapped_metadata_2024-01-27T04:32:45/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Traceback (most recent call last):
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/dags/utils_by_mapper_type.py", line 171, in validate_endpoint_task
num_rows, version_page = create_collection_validation_csv(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 217, in create_collection_validation_csv
result = validate_collection(collection_id, mapped_page_paths, **options)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 66, in validate_collection
rikolti_ids, new_ids = validate_page(collection_id, page_path, validator)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 171, in validate_page
raise ValueError(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ValueError: No mapped metadata found for 27223 page 27223/vernacular_metadata_2024-01-27T04:11:24/mapped_metadata_2024-01-27T04:32:45/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
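To double-check which collections failed without reading every traceback, something like the following could summarize the log. This is a sketch that assumes the "Collection <id>: <message>" warning format shown above; `failed_collections` is a hypothetical helper. The sample reuses two of the warning lines from this run.

```python
import re

# Matches the summary line logged once per failing collection.
ERROR_LINE = re.compile(r"WARNING - Collection (\d+): (.+)")


def failed_collections(log_text):
    """Map collection ID to the first error message logged for it."""
    failures = {}
    for collection_id, message in ERROR_LINE.findall(log_text):
        failures.setdefault(collection_id, message)
    return failures


sample = "\n".join([
    "[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - "
    "Collection 27219: No mapped metadata found for 27219 page "
    "27219/vernacular_metadata_2024-01-27T04:04:44/"
    "mapped_metadata_2024-01-27T04:32:29/data/0.jsonl. Aborting.",
    "[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - "
    "Traceback (most recent call last):",
    "[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - "
    "Collection 27220: No mapped metadata found for 27220 page "
    "27220/vernacular_metadata_2024-01-27T04:06:22/"
    "mapped_metadata_2024-01-27T04:32:34/data/0.jsonl. Aborting.",
])

for collection_id, message in failed_collections(sample).items():
    print(collection_id, message)
```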
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/26094/vernacular_metadata_2024-02-03T00:47:50/mapped_metadata_2024-02-03T05:16:00/validation_2024-02-03T05:33:49.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#26094/vernacular_metadata_2024-02-03T00:47:50/mapped_metadata_2024-02-03T05:16:00/data/
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27219/vernacular_metadata_2024-02-03T04:55:12/mapped_metadata_2024-02-03T05:21:20/validation_2024-02-03T05:33:57.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27219/vernacular_metadata_2024-02-03T04:55:12/mapped_metadata_2024-02-03T05:21:20/data/
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27220/vernacular_metadata_2024-02-03T04:57:17/mapped_metadata_2024-02-03T05:21:24/validation_2024-02-03T05:34:11.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27220/vernacular_metadata_2024-02-03T04:57:17/mapped_metadata_2024-02-03T05:21:24/data/
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27221/vernacular_metadata_2024-02-03T05:03:19/mapped_metadata_2024-02-03T05:21:33/validation_2024-02-03T05:34:13.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27221/vernacular_metadata_2024-02-03T05:03:19/mapped_metadata_2024-02-03T05:21:33/data/
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27222/vernacular_metadata_2024-02-03T05:03:37/mapped_metadata_2024-02-03T05:21:34/validation_2024-02-03T05:34:13.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27222/vernacular_metadata_2024-02-03T05:03:37/mapped_metadata_2024-02-03T05:21:34/data/
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27223/vernacular_metadata_2024-02-03T05:03:40/mapped_metadata_2024-02-03T05:21:35/validation_2024-02-03T05:34:26.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27223/vernacular_metadata_2024-02-03T05:03:40/mapped_metadata_2024-02-03T05:21:35/data/
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27224/vernacular_metadata_2024-02-03T05:08:50/mapped_metadata_2024-02-03T05:21:44/validation_2024-02-03T05:34:28.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27224/vernacular_metadata_2024-02-03T05:08:50/mapped_metadata_2024-02-03T05:21:44/data/
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27225/vernacular_metadata_2024-02-03T05:09:14/mapped_metadata_2024-02-03T05:21:46/validation_2024-02-03T05:34:28.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27225/vernacular_metadata_2024-02-03T05:09:14/mapped_metadata_2024-02-03T05:21:46/data/
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27226/vernacular_metadata_2024-02-03T05:09:17/mapped_metadata_2024-02-03T05:21:46/validation_2024-02-03T05:34:29.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27226/vernacular_metadata_2024-02-03T05:09:17/mapped_metadata_2024-02-03T05:21:46/data/
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27227/vernacular_metadata_2024-02-03T05:09:26/mapped_metadata_2024-02-03T05:21:47/validation_2024-02-03T05:34:48.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27227/vernacular_metadata_2024-02-03T05:09:26/mapped_metadata_2024-02-03T05:21:47/data/
validate_by_mapper DAG: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/validate_by_mapper_type/grid?dag_run_id=manual__2023-11-16T23%3A23%3A47%2B00%3A00