wmo-im / pywcmp

pywcmp provides validation and quality assessment capabilities for the WMO WIS Core Metadata Profile (WCMP)
https://community.wmo.int/activity-areas/wis

AttributeError: '_io.BufferedReader' object has no attribute 'status' #55

Closed maaikelimper closed 2 years ago

maaikelimper commented 2 years ago

When running pywcmp's "evaluate()" over a sample of sets, it sometimes crashes with AttributeError: '_io.BufferedReader' object has no attribute 'status':

[2022-03-01, 07:44:17 UTC] {update_wis_metadata_analysis.py:200} INFO - (68886) Evaluating metadata-file=urn_x-wmo_md_int.wmo.wis__HJWM88EGRR.xml: 
[2022-03-01, 07:44:20 UTC] {update_wis_metadata_analysis.py:200} INFO - (68887) Evaluating metadata-file=urn_x-wmo_md_int.wmo.wis__HJWO88EGRR.xml: 
[2022-03-01, 07:44:22 UTC] {update_wis_metadata_analysis.py:200} INFO - (68888) Evaluating metadata-file=urn_x-wmo_md_int.wmo.wis__HJXA88ECMF.xml: 
[2022-03-01, 07:44:28 UTC] {taskinstance.py:1703} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    result = execute_callable(context=context)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/decorators/base.py", line 134, in execute
    return_value = super().execute(context)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 151, in execute
    return_value = self.execute_callable()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 162, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/opt/airflow/dags/update_wis_metadata_analysis.py", line 203, in update_wis_metadata_kpi
    results = kpis.evaluate()
  File "/home/airflow/.local/lib/python3.7/site-packages/pywcmp/kpi.py", line 976, in evaluate
    result = getattr(self, kpi)()
  File "/home/airflow/.local/lib/python3.7/site-packages/pywcmp/kpi.py", line 593, in kpi_008
    result = check_url(link, False)
  File "/home/airflow/.local/lib/python3.7/site-packages/pywcmp/util.py", line 320, in check_url
    if response.status > 300:
  File "/usr/local/lib/python3.7/tempfile.py", line 476, in __getattr__
    a = getattr(file, name)
  File "/usr/local/lib/python3.7/tempfile.py", line 476, in __getattr__
    a = getattr(file, name)
AttributeError: '_io.BufferedReader' object has no attribute 'status'
[2022-03-01, 07:44:28 UTC] {taskinstance.py:1280} INFO - Marking task as FAILED. dag_id=update_wis_metadata, task_id=update_wis_metdata_kpi_chunk11, execution_date=20220202T000000, start_date=20220301T074254, end_date=20220301T074428
[2022-03-01, 07:44:28 UTC] {standard_task_runner.py:91} ERROR - Failed to execute job 1898 for task update_wis_metdata_kpi_chunk11

tomkralidis commented 2 years ago

@maaikelimper can you provide a test case to help reproduce?

josusky commented 2 years ago

I can confirm that the product with identifier urn:x-wmo:md:int.wmo.wis::HJXA88ECMF causes this error in KPI 8 ("Links health"). We expected to see only HTTP/HTTPS links, but this one is FTP. I will have a look at it.
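The traceback suggests that the object returned for the FTP link is a file-like wrapper without a `status` attribute, so `response.status` raises. A minimal sketch of a scheme-aware check follows. It is not the actual pywcmp fix: the helper name `link_is_healthy` and the "below 400 means healthy" threshold are assumptions made for illustration.

```python
from types import SimpleNamespace
from urllib.parse import urlparse


def link_is_healthy(url, response):
    """Return True if `response` suggests the link is reachable.

    Hypothetical helper (not part of pywcmp). `response` is whatever
    object was obtained by opening `url`, or None if opening failed.
    """
    scheme = urlparse(url).scheme.lower()

    if scheme in ('http', 'https'):
        # Only HTTP(S) responses are expected to carry a status code;
        # use getattr so a missing attribute cannot raise AttributeError.
        status = getattr(response, 'status', None)
        # Assumption: any status below 400 counts as healthy.
        return status is not None and status < 400

    # Non-HTTP schemes (e.g. FTP) have no HTTP status code, so a
    # successfully opened response is treated as reachable.
    return response is not None


# Stand-in objects instead of real network calls
print(link_is_healthy('https://example.org/', SimpleNamespace(status=200)))
print(link_is_healthy('ftp://example.org/file', object()))
```

The design choice here is to branch on the URL scheme first, so that an attribute specific to HTTP is never read from a response of another protocol.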

josusky commented 2 years ago

@maaikelimper Can you confirm that the fix resolves your issue?

maaikelimper commented 2 years ago

Hi, I can confirm the latest version of pywcmp no longer crashes on this file with this error.

I note that for KPI-8, the file with identifier=urn:x-wmo:md:int.wmo.wis::HJXA88ECMF gets a score of 6 out of a total of 14: it has 7 links in total, one returns 404, and no result passes "result.get('ssl') is True". I guess this is OK; not sure if the code would need to identify sftp vs ftp?
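The arithmetic behind 6 out of 14 can be reproduced with a small sketch. It assumes a scoring model that matches the numbers above, namely one point per link that is reachable and one more point per link that passes the `ssl` check. The function name `score_links` and the `accessible` key are hypothetical; only the `ssl` key appears in this thread.

```python
def score_links(results):
    """Return (score, total) for a list of per-link result dicts.

    Assumed model: each link can earn up to 2 points,
    one for being reachable and one for passing the ssl check.
    """
    total = 2 * len(results)
    score = 0
    for result in results:
        if result.get('accessible') is True:  # hypothetical key
            score += 1
        if result.get('ssl') is True:
            score += 1
    return score, total


# 7 links: 6 reachable without ssl, 1 returning 404
results = [{'accessible': True, 'ssl': False} for _ in range(6)]
results.append({'accessible': False, 'ssl': False})

print(score_links(results))  # (6, 14)
```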

josusky commented 2 years ago

> not sure if the code would need to identify sftp vs ftp?

The current implementation is able to verify HTTP, HTTPS and FTP connections. Unfortunately, for SFTP we would need to use a different library. Are there any SFTP links in the metadata?
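One way to answer that question for a given set of links is to tally their URL schemes. The sketch below is only an illustration: the function name `count_link_schemes` and the sample URLs are made up, and extracting the links from the metadata records is left out.

```python
from collections import Counter
from urllib.parse import urlparse


def count_link_schemes(urls):
    """Count how often each URL scheme occurs in `urls`.

    Links without an explicit scheme are counted as 'unknown'.
    """
    return Counter(urlparse(url).scheme.lower() or 'unknown' for url in urls)


# Made-up sample links, not taken from real metadata
sample = [
    'https://example.org/a',
    'ftp://example.org/b',
    'sftp://example.org/c',
    'example.org/d',
]
counts = count_link_schemes(sample)
print(counts['sftp'])  # 1
```

A non-zero count for `sftp` would indicate that SFTP support is worth adding.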

tomkralidis commented 2 years ago

fwiw, a quick grep of the WIS Metadata Catalogue dump shows some descriptions of SFTP services, but no protocol references per se.