ucldc / rikolti

calisphere harvester 2.0
BSD 3-Clause "New" or "Revised" License

Generate and review validation reports: lapl_oai -- post-mvp fixes #578

Open christinklez opened 1 year ago

christinklez commented 1 year ago
christinklez commented 11 months ago

@amywieliczka @barbarahui -- this failed at map_endpoint_task:

Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL connection has been closed unexpectedly

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 73, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 2354, in xcom_push
    XCom.set(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 73, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/models/xcom.py", line 264, in set
    session.flush()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 3449, in flush
    self._flush(objects)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 3588, in _flush
    with util.safe_reraise():
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 3549, in _flush
    flush_context.execute()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
    _emit_insert_statements(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 1097, in _emit_insert_statements
    c = connection._execute_20(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL connection has been closed unexpectedly

[SQL: INSERT INTO xcom (dag_run_id, task_id, map_index, key, dag_id, run_id, value, timestamp) VALUES (%(dag_run_id)s, %(task_id)s, %(map_index)s, %(key)s, %(dag_id)s, %(run_id)s, %(value)s, %(timestamp)s)]
[parameters: {'dag_run_id': 95, 'task_id': 'map_endpoint_task', 'map_index': -1, 'key': 'return_value', 'dag_id': 'validate_by_mapper_type', 'run_id': 'manual__2023-11-16T23:23:47+00:00', 'value': <psycopg2.extensions.Binary object at 0x7f7e935f6a30>, 'timestamp': datetime.datetime(2023, 11, 17, 3, 47, 58, 410865, tzinfo=Timezone('UTC'))}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
[2023-11-17, 03:48:06 UTC] {{taskinstance.py:1345}} INFO - Marking task as FAILED. dag_id=validate_by_mapper_type, task_id=map_endpoint_task, execution_date=20231116T232347, start_date=20231117T034059, end_date=20231117T034806
[2023-11-17, 03:48:06 UTC] {{standard_task_runner.py:104}} ERROR - Failed to execute job 1984 for task map_endpoint_task ((psycopg2.OperationalError) SSL connection has been closed unexpectedly

[SQL: INSERT INTO xcom (dag_run_id, task_id, map_index, key, dag_id, run_id, value, timestamp) VALUES (%(dag_run_id)s, %(task_id)s, %(map_index)s, %(key)s, %(dag_id)s, %(run_id)s, %(value)s, %(timestamp)s)]
[parameters: {'dag_run_id': 95, 'task_id': 'map_endpoint_task', 'map_index': -1, 'key': 'return_value', 'dag_id': 'validate_by_mapper_type', 'run_id': 'manual__2023-11-16T23:23:47+00:00', 'value': <psycopg2.extensions.Binary object at 0x7f7e935f6a30>, 'timestamp': datetime.datetime(2023, 11, 17, 3, 47, 58, 410865, tzinfo=Timezone('UTC'))}]
(Background on this error at: https://sqlalche.me/e/14/e3q8); 24515)
[2023-11-17, 03:48:06 UTC] {{local_task_job_runner.py:225}} INFO - Task exited with return code 1
[2023-11-17, 03:48:06 UTC] {{taskinstance.py:2653}} INFO - 0 downstream tasks scheduled from follow-on schedule check

barbarahui commented 11 months ago

@amywieliczka @bibliotechy The error above seems to mean that the Scheduler ran out of resources:

https://docs.aws.amazon.com/mwaa/latest/userguide/t-cloudwatch-cloudtrail-logs.html#scheduler-postgres-library

This happened for collection 26094, which has 685 pages of vernacular metadata. AWS suggests increasing the number of schedulers. We currently have 2 schedulers, so I could try upping that to 3...

barbarahui commented 11 months ago

@christinklez could you try running this again from scratch and see what happens? @amywieliczka and I discussed this and think it may have been a one-off issue. (The mapping task in this particular DAG doesn't fan out, so resourcing shouldn't be an issue.)

christinklez commented 11 months ago

@barbarahui thank you! This job went through the map_endpoint_task successfully! However, it is now running into an error at validate_endpoint_task, apparently while validating 26094, which is a huge collection with 135,453 items.

full log here: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/validate_by_mapper_type/grid?dag_run_id=manual__2023-11-16T23%3A23%3A47%2B00%3A00&task_id=validate_endpoint_task&tab=logs

*** Reading remote log from Cloudwatch log_group: airflow-pad-airflow-mwaa-Task log_stream: dag_id=validate_by_mapper_type/run_id=manual__2023-11-16T23_23_47+00_00/task_id=validate_endpoint_task/attempt=2.log.
[2023-11-30, 17:10:12 UTC] {{taskinstance.py:1103}} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [queued]>
[2023-11-30, 17:10:12 UTC] {{taskinstance.py:1103}} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [queued]>
[2023-11-30, 17:10:12 UTC] {{taskinstance.py:1308}} INFO - Starting attempt 2 of 2
[2023-11-30, 17:10:12 UTC] {{taskinstance.py:1327}} INFO - Executing <Task(_PythonDecoratedOperator): validate_endpoint_task> on 2023-11-16 23:23:47+00:00
[2023-11-30, 17:10:12 UTC] {{standard_task_runner.py:57}} INFO - Started process 25465 to run task
[2023-11-30, 17:10:12 UTC] {{standard_task_runner.py:84}} INFO - Running: ['airflow', 'tasks', 'run', 'validate_by_mapper_type', 'validate_endpoint_task', 'manual__2023-11-16T23:23:47+00:00', '--job-id', '2115', '--raw', '--subdir', 'DAGS_FOLDER/rikolti/dags/validate_by_mapper_type.py', '--cfg-path', '/tmp/tmpn2_buk76']
[2023-11-30, 17:10:12 UTC] {{standard_task_runner.py:85}} INFO - Job 2115: Subtask validate_endpoint_task
[2023-11-30, 17:10:12 UTC] {{task_command.py:410}} INFO - Running <TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [running]> on host ip-10-192-21-64.us-west-2.compute.internal
[2023-11-30, 17:10:12 UTC] {{taskinstance.py:1545}} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='validate_by_mapper_type' AIRFLOW_CTX_TASK_ID='validate_endpoint_task' AIRFLOW_CTX_EXECUTION_DATE='2023-11-16T23:23:47+00:00' AIRFLOW_CTX_TRY_NUMBER='2' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-16T23:23:47+00:00'
[2023-11-30, 17:10:12 UTC] {{logging_mixin.py:150}} INFO - >>> Validating 10/10 collections described at https://registry.cdlib.org/api/v1/rikoltifetcher/?format=json&mapper_type=lapl_oai&ready_for_publication=true
[2023-11-30, 17:10:12 UTC] {{logging_mixin.py:150}} INFO - 26094  Validating collection
[2023-11-30, 17:22:33 UTC] {{local_task_job_runner.py:225}} INFO - Task exited with return code Negsignal.SIGKILL
[2023-11-30, 17:22:33 UTC] {{taskinstance.py:2653}} INFO - 0 downstream tasks scheduled from follow-on schedule check
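
A note on the last return code in this log: `Negsignal.SIGKILL` is, as far as I understand it, Airflow's label for a task process that exited with return code -9, meaning it was killed by signal 9 rather than failing with a Python exception. The sketch below reproduces that convention with the standard library only. It does not use Airflow, and the sleeping child process is just a stand-in for a task; it shows what the return code looks like, not why the process was killed.

```python
import signal
import subprocess
import sys


def run_and_kill() -> int:
    """Start a child process, SIGKILL it, and return its exit code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    child.kill()  # sends SIGKILL on POSIX systems
    child.wait()
    return child.returncode


def describe(returncode: int) -> str:
    """Translate a negative POSIX return code into a signal name."""
    if returncode < 0:
        return signal.Signals(-returncode).name
    return f"exit status {returncode}"


if __name__ == "__main__":
    print(describe(run_and_kill()))
```

Because no traceback is written when a process is killed this way, the log ends abruptly after the last message the task printed.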

barbarahui commented 11 months ago

@amywieliczka @bibliotechy Looks like the worker runs out of memory for this huge collection: https://stackoverflow.com/questions/69231797/airflow-dag-fails-when-pythonoperator-with-error-negsignal-sigkill

We're currently on mw1.small, which provides 2 GB of RAM. mw1.medium would up it to 4 GB: https://docs.aws.amazon.com/mwaa/latest/userguide/environment-class.html#environment-class-sizes
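
Besides adding RAM, another angle would be to bound memory inside the task itself. The sketch below is an illustration under assumptions, not Rikolti's actual validator: `iter_records`, `validate_record`, and `write_report` are hypothetical names, pages are assumed to be local JSONL files, and the `title` check is made up. It shows the general technique of reading one record at a time and writing report rows as they are produced, so that peak memory depends on the largest record rather than on the size of the collection.

```python
import csv
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_records(page_paths: Iterable[Path]) -> Iterator[dict]:
    """Yield records lazily, one JSONL line at a time."""
    for path in page_paths:
        with path.open(encoding="utf-8") as page:
            for line in page:
                if line.strip():
                    yield json.loads(line)


def validate_record(record: dict) -> list[str]:
    """Hypothetical check: flag records without a title."""
    return [] if record.get("title") else ["missing title"]


def write_report(page_paths: Iterable[Path], report_path: Path) -> int:
    """Stream validation errors to a CSV and return the number of rows."""
    rows = 0
    with report_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["id", "error"])
        for record in iter_records(page_paths):
            for error in validate_record(record):
                writer.writerow([record.get("id"), error])
                rows += 1
    return rows
```

Whether this would help here depends on where the real validator holds its data, which this thread does not show.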

barbarahui commented 11 months ago

OK, I bumped our MWAA instance to mw1.medium. @christinklez can you try running this again to see if that resolves the issue?

christinklez commented 11 months ago

@barbarahui thank you for that! Just re-ran this and it looks like we're hitting the same error 🤕

https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/validate_by_mapper_type/grid?dag_run_id=manual__2023-11-16T23%3A23%3A47%2B00%3A00&task_id=validate_endpoint_task&tab=logs

ip-10-192-21-247.us-west-2.compute.internal
*** Reading remote log from Cloudwatch log_group: airflow-pad-airflow-mwaa-Task log_stream: dag_id=validate_by_mapper_type/run_id=manual__2023-11-16T23_23_47+00_00/task_id=validate_endpoint_task/attempt=3.log.
[2023-12-01, 23:11:26 UTC] {{taskinstance.py:1103}} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [queued]>
[2023-12-01, 23:11:26 UTC] {{taskinstance.py:1103}} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [queued]>
[2023-12-01, 23:11:26 UTC] {{taskinstance.py:1308}} INFO - Starting attempt 3 of 3
[2023-12-01, 23:11:26 UTC] {{taskinstance.py:1327}} INFO - Executing <Task(_PythonDecoratedOperator): validate_endpoint_task> on 2023-11-16 23:23:47+00:00
[2023-12-01, 23:11:26 UTC] {{standard_task_runner.py:57}} INFO - Started process 301 to run task
[2023-12-01, 23:11:26 UTC] {{standard_task_runner.py:84}} INFO - Running: ['airflow', 'tasks', 'run', 'validate_by_mapper_type', 'validate_endpoint_task', 'manual__2023-11-16T23:23:47+00:00', '--job-id', '2148', '--raw', '--subdir', 'DAGS_FOLDER/rikolti/dags/validate_by_mapper_type.py', '--cfg-path', '/tmp/tmp6wqb0edx']
[2023-12-01, 23:11:26 UTC] {{standard_task_runner.py:85}} INFO - Job 2148: Subtask validate_endpoint_task
[2023-12-01, 23:11:26 UTC] {{task_command.py:410}} INFO - Running <TaskInstance: validate_by_mapper_type.validate_endpoint_task manual__2023-11-16T23:23:47+00:00 [running]> on host ip-10-192-21-247.us-west-2.compute.internal
[2023-12-01, 23:11:26 UTC] {{taskinstance.py:1545}} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='validate_by_mapper_type' AIRFLOW_CTX_TASK_ID='validate_endpoint_task' AIRFLOW_CTX_EXECUTION_DATE='2023-11-16T23:23:47+00:00' AIRFLOW_CTX_TRY_NUMBER='3' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-16T23:23:47+00:00'
[2023-12-01, 23:11:26 UTC] {{logging_mixin.py:150}} INFO - >>> Validating 10/10 collections described at https://registry.cdlib.org/api/v1/rikoltifetcher/?format=json&mapper_type=lapl_oai&ready_for_publication=true
[2023-12-01, 23:11:26 UTC] {{logging_mixin.py:150}} INFO - 26094  Validating collection
[2023-12-01, 23:23:24 UTC] {{local_task_job_runner.py:225}} INFO - Task exited with return code Negsignal.SIGKILL
[2023-12-01, 23:23:24 UTC] {{taskinstance.py:2653}} INFO - 0 downstream tasks scheduled from follow-on schedule check

christinklez commented 11 months ago

Kicked off a new job to test what happens with the supersized collection 26094 taken off the list of collections to run through the validator.

Job link: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/validate_by_mapper_type/grid?dag_run_id=manual__2023-12-02T00%3A23%3A15%2B00%3A00

christinklez commented 11 months ago

Validation reports (without 26094) ran through. We can start reviewing these reports and come back to review 26094 once the error issue becomes clearer.

[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27219/vernacular_metadata_2023-12-02T00:23:49/mapped_metadata_2023-12-02T00:50:52/validation_2023-12-02T00:51:45.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27219/vernacular_metadata_2023-12-02T00:23:49/mapped_metadata_2023-12-02T00:50:52/data/

[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27220/vernacular_metadata_2023-12-02T00:26:53/mapped_metadata_2023-12-02T00:50:58/validation_2023-12-02T00:51:57.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27220/vernacular_metadata_2023-12-02T00:26:53/mapped_metadata_2023-12-02T00:50:58/data/

[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27221/vernacular_metadata_2023-12-02T00:33:37/mapped_metadata_2023-12-02T00:51:08/validation_2023-12-02T00:51:58.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27221/vernacular_metadata_2023-12-02T00:33:37/mapped_metadata_2023-12-02T00:51:08/data/

[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27222/vernacular_metadata_2023-12-02T00:34:03/mapped_metadata_2023-12-02T00:51:08/validation_2023-12-02T00:51:58.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27222/vernacular_metadata_2023-12-02T00:34:03/mapped_metadata_2023-12-02T00:51:08/data/

[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27223/vernacular_metadata_2023-12-02T00:34:07/mapped_metadata_2023-12-02T00:51:09/validation_2023-12-02T00:52:13.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27223/vernacular_metadata_2023-12-02T00:34:07/mapped_metadata_2023-12-02T00:51:09/data/

[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27224/vernacular_metadata_2023-12-02T00:41:26/mapped_metadata_2023-12-02T00:51:19/validation_2023-12-02T00:52:15.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27224/vernacular_metadata_2023-12-02T00:41:26/mapped_metadata_2023-12-02T00:51:19/data/

[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27225/vernacular_metadata_2023-12-02T00:42:10/mapped_metadata_2023-12-02T00:51:21/validation_2023-12-02T00:52:15.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27225/vernacular_metadata_2023-12-02T00:42:10/mapped_metadata_2023-12-02T00:51:21/data/

[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27226/vernacular_metadata_2023-12-02T00:42:13/mapped_metadata_2023-12-02T00:51:22/validation_2023-12-02T00:52:16.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27226/vernacular_metadata_2023-12-02T00:42:13/mapped_metadata_2023-12-02T00:51:22/data/

[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27227/vernacular_metadata_2023-12-02T00:42:29/mapped_metadata_2023-12-02T00:51:22/validation_2023-12-02T00:52:34.csv
[2023-12-02, 00:52:34 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27227/vernacular_metadata_2023-12-02T00:42:29/mapped_metadata_2023-12-02T00:51:22/data/

christinklez commented 11 months ago

Updated the registry to add the LAPL #26094 to lapl_oai.

aturner commented 11 months ago

Updated spreadsheet with initial validation report QA notes, for next CK/GM/AT collective review and synthesis: https://docs.google.com/spreadsheets/d/1XkBwi8jiuGgrWvQBqjkrbavSl13yyrQNyRyEb_CQ4z0/edit?usp=sharing

christinklez commented 10 months ago

Validation fixes requested; see #671

christinklez commented 9 months ago

27219, 27220, 27221, 27223 have validation errors:

[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ------------------------------------------------------------
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - -------------------- Validation Errors ---------------------
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ------------------------------------------------------------
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -  Collection 27219: No mapped metadata found for 27219 page 27219/vernacular_metadata_2024-01-27T04:04:44/mapped_metadata_2024-01-27T04:32:29/data/0.jsonl. Aborting. 
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Traceback (most recent call last):
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/dags/utils_by_mapper_type.py", line 171, in validate_endpoint_task
    num_rows, version_page = create_collection_validation_csv(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 217, in create_collection_validation_csv
    result = validate_collection(collection_id, mapped_page_paths, **options)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 66, in validate_collection
    rikolti_ids, new_ids = validate_page(collection_id, page_path, validator)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 171, in validate_page
    raise ValueError(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ValueError: No mapped metadata found for 27219 page 27219/vernacular_metadata_2024-01-27T04:04:44/mapped_metadata_2024-01-27T04:32:29/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -  Collection 27220: No mapped metadata found for 27220 page 27220/vernacular_metadata_2024-01-27T04:06:22/mapped_metadata_2024-01-27T04:32:34/data/0.jsonl. Aborting. 
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Traceback (most recent call last):
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/dags/utils_by_mapper_type.py", line 171, in validate_endpoint_task
    num_rows, version_page = create_collection_validation_csv(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 217, in create_collection_validation_csv
    result = validate_collection(collection_id, mapped_page_paths, **options)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 66, in validate_collection
    rikolti_ids, new_ids = validate_page(collection_id, page_path, validator)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 171, in validate_page
    raise ValueError(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ValueError: No mapped metadata found for 27220 page 27220/vernacular_metadata_2024-01-27T04:06:22/mapped_metadata_2024-01-27T04:32:34/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -  Collection 27221: No mapped metadata found for 27221 page 27221/vernacular_metadata_2024-01-27T04:10:59/mapped_metadata_2024-01-27T04:32:43/data/0.jsonl. Aborting. 
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Traceback (most recent call last):
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/dags/utils_by_mapper_type.py", line 171, in validate_endpoint_task
    num_rows, version_page = create_collection_validation_csv(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 217, in create_collection_validation_csv
    result = validate_collection(collection_id, mapped_page_paths, **options)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 66, in validate_collection
    rikolti_ids, new_ids = validate_page(collection_id, page_path, validator)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 171, in validate_page
    raise ValueError(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ValueError: No mapped metadata found for 27221 page 27221/vernacular_metadata_2024-01-27T04:10:59/mapped_metadata_2024-01-27T04:32:43/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -  Collection 27223: No mapped metadata found for 27223 page 27223/vernacular_metadata_2024-01-27T04:11:24/mapped_metadata_2024-01-27T04:32:45/data/0.jsonl. Aborting. 
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - Traceback (most recent call last):
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/dags/utils_by_mapper_type.py", line 171, in validate_endpoint_task
    num_rows, version_page = create_collection_validation_csv(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 217, in create_collection_validation_csv
    result = validate_collection(collection_id, mapped_page_paths, **options)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 66, in validate_collection
    rikolti_ids, new_ids = validate_page(collection_id, page_path, validator)
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING -   File "/usr/local/airflow/dags/rikolti/metadata_mapper/validate_mapping.py", line 171, in validate_page
    raise ValueError(
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - ValueError: No mapped metadata found for 27223 page 27223/vernacular_metadata_2024-01-27T04:11:24/mapped_metadata_2024-01-27T04:32:45/data/0.jsonl. Aborting.
[2024-01-27, 04:45:34 UTC] {{logging_mixin.py:150}} WARNING - <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
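
All four tracebacks end at the same `raise ValueError(` in `validate_page`, each for page `0.jsonl` of its collection. The code in `validate_mapping.py` is not shown in this thread, so the following is only a guess at the shape of such a guard: `load_mapped_page` is a hypothetical helper, the page is assumed to be a local JSONL file, and treating an empty or missing page as fatal is an assumption based on the message text.

```python
import json
from pathlib import Path


def load_mapped_page(collection_id: int, page_path: Path) -> list[dict]:
    """Read a mapped JSONL page, failing loudly if it has no records."""
    records = []
    if page_path.exists():
        with page_path.open(encoding="utf-8") as page:
            records = [json.loads(line) for line in page if line.strip()]
    if not records:
        # Mirrors the wording of the error in the log above.
        raise ValueError(
            f"No mapped metadata found for {collection_id} "
            f"page {page_path}. Aborting."
        )
    return records
```

Under that reading, the useful follow-up is to check whether the mapper wrote an empty `0.jsonl` for these collections or wrote nothing at all.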

christinklez commented 9 months ago

[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/26094/vernacular_metadata_2024-02-03T00:47:50/mapped_metadata_2024-02-03T05:16:00/validation_2024-02-03T05:33:49.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#26094/vernacular_metadata_2024-02-03T00:47:50/mapped_metadata_2024-02-03T05:16:00/data/

[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27219/vernacular_metadata_2024-02-03T04:55:12/mapped_metadata_2024-02-03T05:21:20/validation_2024-02-03T05:33:57.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27219/vernacular_metadata_2024-02-03T04:55:12/mapped_metadata_2024-02-03T05:21:20/data/

[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27220/vernacular_metadata_2024-02-03T04:57:17/mapped_metadata_2024-02-03T05:21:24/validation_2024-02-03T05:34:11.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27220/vernacular_metadata_2024-02-03T04:57:17/mapped_metadata_2024-02-03T05:21:24/data/

[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27221/vernacular_metadata_2024-02-03T05:03:19/mapped_metadata_2024-02-03T05:21:33/validation_2024-02-03T05:34:13.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27221/vernacular_metadata_2024-02-03T05:03:19/mapped_metadata_2024-02-03T05:21:33/data/

[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27222/vernacular_metadata_2024-02-03T05:03:37/mapped_metadata_2024-02-03T05:21:34/validation_2024-02-03T05:34:13.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27222/vernacular_metadata_2024-02-03T05:03:37/mapped_metadata_2024-02-03T05:21:34/data/

[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27223/vernacular_metadata_2024-02-03T05:03:40/mapped_metadata_2024-02-03T05:21:35/validation_2024-02-03T05:34:26.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27223/vernacular_metadata_2024-02-03T05:03:40/mapped_metadata_2024-02-03T05:21:35/data/

[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27224/vernacular_metadata_2024-02-03T05:08:50/mapped_metadata_2024-02-03T05:21:44/validation_2024-02-03T05:34:28.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27224/vernacular_metadata_2024-02-03T05:08:50/mapped_metadata_2024-02-03T05:21:44/data/

[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27225/vernacular_metadata_2024-02-03T05:09:14/mapped_metadata_2024-02-03T05:21:46/validation_2024-02-03T05:34:28.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27225/vernacular_metadata_2024-02-03T05:09:14/mapped_metadata_2024-02-03T05:21:46/data/

[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27226/vernacular_metadata_2024-02-03T05:09:17/mapped_metadata_2024-02-03T05:21:46/validation_2024-02-03T05:34:29.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27226/vernacular_metadata_2024-02-03T05:09:17/mapped_metadata_2024-02-03T05:21:46/data/

[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Download validation report at: https://rikolti-data.s3.amazonaws.com/27227/vernacular_metadata_2024-02-03T05:09:26/mapped_metadata_2024-02-03T05:21:47/validation_2024-02-03T05:34:48.csv
[2024-02-03, 05:34:48 UTC] {{logging_mixin.py:150}} INFO - Review collection data at: https://rikolti-data.s3.us-west-2.amazonaws.com/index.html#27227/vernacular_metadata_2024-02-03T05:09:26/mapped_metadata_2024-02-03T05:21:47/data/