surfedushare / harvester

ETL pipeline and search engine for Edusources and Publinova
MIT License

Add seeding tests for new sources #416

Closed fako closed 1 month ago

fako commented 2 months ago

Tests need to include the following:

fako commented 2 months ago

VHL is returning empty documents. There are 5 of these as far as I can tell:

  <record>
    <header>
      <identifier>oai:www.greeni.nl:VBS:2:150433</identifier>
      <datestamp>2022-12-19</datestamp>
      <setSpec>ALLRECORDS</setSpec>
      <setSpec>PUBVHL</setSpec>
      <setSpec>VHL</setSpec>
    </header>
    <metadata>
    </metadata>
  </record>
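
A seeding test could flag such records before they reach the pipeline. The following is only a minimal sketch: it assumes namespace-free XML like the snippet above (real OAI-PMH responses are usually namespaced), and the `find_empty_records` helper, the `<records>` wrapper and the second record are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first record mirrors the empty VHL record above,
# the second is an invented record that does carry metadata.
SAMPLE = """<records>
  <record>
    <header>
      <identifier>oai:www.greeni.nl:VBS:2:150433</identifier>
      <datestamp>2022-12-19</datestamp>
      <setSpec>VHL</setSpec>
    </header>
    <metadata>
    </metadata>
  </record>
  <record>
    <header>
      <identifier>oai:example:with-content</identifier>
    </header>
    <metadata><title>Hypothetical title</title></metadata>
  </record>
</records>"""


def find_empty_records(xml_text):
    """Return identifiers of records whose <metadata> has no children and no text."""
    root = ET.fromstring(xml_text)
    empty = []
    for record in root.iter("record"):
        metadata = record.find("metadata")
        has_children = metadata is not None and len(metadata) > 0
        has_text = metadata is not None and bool((metadata.text or "").strip())
        if not has_children and not has_text:
            empty.append(record.findtext("header/identifier"))
    return empty


empty_identifiers = find_empty_records(SAMPLE)
```

A test could then assert that the extraction step skips (or explicitly marks) every identifier in `empty_identifiers` instead of producing an empty document.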
fako commented 1 month ago

Some random fails happen in webhooks:

____ TestSharekitProductWebhook.test_update_deleted ____
[gw1] linux -- Python 3.12.3 /opt/hostedtoolcache/Python/3.12.3/x64/bin/python

self =
dispatch_mock =

@patch("products.views.webhook.dispatch_document_tasks.delay")
def test_update_deleted(self, dispatch_mock):
    # Prepare update Document
    self.update_document.state = ProductDocument.States.DELETED
    self.update_document.properties["state"] = ProductDocument.States.DELETED
    self.update_document.save()
    # Execute the webhook
    update_response = self.call_webhook(self.webhook_url, verb="update")
    self.assertEqual(update_response.status_code, 200)
    update_product, update_files = self.assert_update_models()
    # Dispatch asserts
>   dispatch_mock.assert_has_calls([
        call("products", [update_product.id]),
        call("files", [update_file.id for update_file in update_files])
    ])

testing/cases/webhooks/product.py:242:


self =
calls = [call('products', [356]), call('files', [200, 198, 199])]
any_order = False

def assert_has_calls(self, calls, any_order=False):
    """assert the mock has been called with the specified calls.
    The `mock_calls` list is checked for the calls.

    If `any_order` is False (the default) then the calls must be
    sequential. There can be extra calls before or after the
    specified calls.

    If `any_order` is True then the calls can be in any order, but
    they must all appear in `mock_calls`."""
    expected = [self._call_matcher(c) for c in calls]
    cause = next((e for e in expected if isinstance(e, Exception)), None)
    all_calls = _CallList(self._call_matcher(c) for c in self.mock_calls)
    if not any_order:
        if expected not in all_calls:
            if cause is None:
                problem = 'Calls not found.'
            else:
                problem = ('Error processing expected calls.\n'
                           'Errors: {}').format(
                               [e if isinstance(e, Exception) else None
                                for e in expected])
>           raise AssertionError(
                f'{problem}\n'
                f'Expected: {_CallList(calls)}'
                f'{self._calls_repr(prefix=" Actual").rstrip(".")}'
            ) from cause
E   AssertionError: Calls not found.
E   Expected: [call('products', [356]), call('files', [200, 198, 199])]
E   Actual: [call('products', [356]), call('files', [198, 199, 200])]

/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/unittest/mock.py:981: AssertionError
----------------------------- Captured stderr call -----------------------------
2024-05-27 15:06:02,186 [INFO] documents: Report: sharekit:edusources:63903863-6c93-4bda-b850-277f3c9ec00e
2024-05-27 15:06:02,208 [INFO] documents: Report: sharekit:edusources:63903863-6c93-4bda-b850-277f3c9ec00e:7ec8985621b50d7bf29b06cf4d413191d0a20bd4
2024-05-27 15:06:02,208 [INFO] documents: Report: sharekit:edusources:63903863-6c93-4bda-b850-277f3c9ec00e:339df213a16895868ba4bfc635b7d3348348e33a
2024-05-27 15:06:02,209 [INFO] documents: Report: sharekit:edusources:63903863-6c93-4bda-b850-277f3c9ec00e:ae362bbe89cae936c89aed50dfd6b7a1cb6bf03b
------------------------------ Captured log call -------------------------------
INFO documents:logging.py:88 Report: sharekit:edusources:63903863-6c93-4bda-b850-277f3c9ec00e
INFO documents:logging.py:88 Report: sharekit:edusources:63903863-6c93-4bda-b850-277f3c9ec00e:7ec8985621b50d7bf29b06cf4d413191d0a20bd4
INFO documents:logging.py:88 Report: sharekit:edusources:63903863-6c93-4bda-b850-277f3c9ec00e:339df213a16895868ba4bfc635b7d3348348e33a
INFO documents:logging.py:88 Report: sharekit:edusources:63903863-6c93-4bda-b850-277f3c9ec00e:ae362bbe89cae936c89aed50dfd6b7a1cb6bf03b
=========================== short test summary info ============================
FAILED core/tests/commands/test_index_dataset_version.py::TestIndexDatasetVersionFallback::test_index - django.urls.exceptions.NoReverseMatch: Reverse for 'core_collection_changelist' not found. 'core_collection_changelist' is not a valid view function or pattern name.
FAILED testing/tests/webhooks/test_sharekit.py::TestSharekitProductWebhook::test_update_deleted - AssertionError: Calls not found. Expected: [call('products', [356]), call('files', [200, 198, 199])] Actual: [call('products', [356]), call('files', [198, 199, 200])]
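
The expected and actual calls above differ only in the order of the file ids, so the assertion seems to depend on how `update_files` happens to be ordered. One plausible cause (an assumption, not confirmed in this thread) is a query without an explicit ordering. The sketch below reproduces the mismatch with a plain `MagicMock` and shows an order-insensitive comparison; `normalized_calls` is a hypothetical helper, not something in the harvester code base.

```python
from unittest.mock import MagicMock, call

# Stand-in for the patched dispatch_document_tasks.delay, called with the
# ids in the order reported under "Actual" in the failure above.
dispatch_mock = MagicMock()
dispatch_mock("products", [356])
dispatch_mock("files", [198, 199, 200])

# The order-sensitive assertion fails although the same ids were dispatched.
order_sensitive_failed = False
try:
    dispatch_mock.assert_has_calls([
        call("products", [356]),
        call("files", [200, 198, 199]),
    ])
except AssertionError:
    order_sensitive_failed = True


def normalized_calls(mock):
    """Return (label, sorted ids) for every recorded call (hypothetical helper)."""
    return [(c.args[0], sorted(c.args[1])) for c in mock.call_args_list]


# Sorting both sides makes the comparison independent of the id order.
expected = [("products", [356]), ("files", sorted([200, 198, 199]))]
actual = normalized_calls(dispatch_mock)
```

Alternatively, giving `update_files` a deterministic order in the test helper (for example by sorting on id) would keep the original `assert_has_calls` usable, provided the webhook dispatches the ids in that same order.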

fako commented 1 month ago

Updated documentation of the harvest process: https://github.com/surfedushare/harvester/blob/acceptance/harvester/README.md#harvesting

@peterdubbeldsurf is this clear like this?