metno / discovery-metadata-catalog-ingestor

Apache License 2.0

update container instructions #250

Open epifanio opened 19 hours ago

epifanio commented 19 hours ago

Following the README, the command:

mkdir workdir
podman run --rm -ti -p 8000:8000 -v config.yml:/config.yml:ro -v $(pwd)/workdir:/workdir localhost/dmci:latest

fails with the log:

[2024-11-25 13:43:07 +0000] [3671] [INFO] Worker exiting (pid: 3671)
ERROR    Config value 'rejected_jobs_path' must be set
ERROR    Config value 'path_to_parent_list' must be set

Please advise me on how to run the DMCI container locally. It will be used for testing the add_collection code on MMD files.

shamlymajeed commented 19 hours ago

These are also needed in config.yaml:

rejected_jobs_path: rejected
path_to_parent_list: parent-uuid-list.xml

So create a directory named rejected and an empty parent-uuid-list.xml. Then it will hopefully run.
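A minimal sketch of that preparation step, assuming the paths are relative to the current directory on the host; where they end up inside the container depends on the volume mounts and on the values set in the config:

```shell
# Create the host-side directories and the (empty) parent list file.
# Names follow the comment above; adjust them to match your own config.
mkdir -p workdir rejected
touch parent-uuid-list.xml
```

Each of these then needs a matching -v mount in the podman run command, so that the container sees it at the path given in the config.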

epifanio commented 18 hours ago

Thanks @shamlymajeed, using the following config:

---
dmci:
  distributors:
    - file
    - pycsw
  distributor_cache: /workdir
  max_permitted_size: 100000
  mmd_xsl_path: /usr/share/mmd/xslt/mmd-to-geonorge.xsl
  mmd_xsd_path: /usr/share/mmd/xsd/mmd_strict.xsd
  rejected_jobs_path: /rejected
  path_to_parent_list: /usr/share/parent-uuid-list.xml

pycsw:
  csw_service_url: http://localhost

customization:
  catalog_url: http://localhost
  env_string: dev

file:
  file_archive_path: /workdir

the service is now running locally via:

podman run --rm -ti -p 8000:8000 -v ./config.yaml:/config.yaml:ro -v $(pwd)/workdir:/workdir -v $(pwd)/rejected:/rejected -v $(pwd)/usr/share/parent-uuid-list.xml:/usr/share/parent-uuid-list.xml localhost/dmci:latest

I am completely new to DMCI, so I tried the following:

(base) massimods@pc5688:~/dev/WORK/DMCI_COLLECTION$ curl --data-binary "@mmd-xml-staging/arch_0/arch_0/arch_0/bf190000-db6c-4067-bd40-985803567ee4.xml" http://localhost:8000/v1/insert
Namespace no.met.staging does not match the env dev
 Rejected persistent file : b6d5ee58-0613-456b-bb39-96c5e1fa8d83.xml
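The rejection above suggests that the namespace in the file's metadata_identifier (no.met.staging) has to agree with env_string in the config (dev). A rough shell sketch for checking this before posting a file follows. The helper names are hypothetical and not part of DMCI, and the comparison rule (last dot-separated part of the namespace equals env_string) is an assumption based only on this error message, not on the DMCI source:

```shell
# Hypothetical helper (not part of DMCI): print the namespace part of the
# first metadata_identifier found in an MMD XML file, e.g. "no.met.staging".
mmd_namespace() {
    grep -o 'metadata_identifier>[^<]*' "$1" | head -n 1 \
        | sed -e 's/^[^>]*>//' -e 's/:.*$//'
}

# Assumption: the last dot-separated component of the namespace has to equal
# env_string. Returns 0 when it does, non-zero otherwise.
namespace_matches_env() {
    ns=$(mmd_namespace "$1")
    [ "${ns##*.}" = "$2" ]
}

# Usage (the file name is a placeholder):
#   namespace_matches_env some-mmd-file.xml dev || echo "namespace/env mismatch"
```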

The system seems to work properly. These are the service-side logs:

[2024-11-25 14:23:38 +0000] [7] [INFO] Worker exiting (pid: 7)
[2024-11-25 14:23:38 +0000] [61] [INFO] Booting worker with pid: 61
[2024-11-25 14:23:45 +0000] [3] [CRITICAL] WORKER TIMEOUT (pid:5)
[2024-11-25 14:23:45 +0000] [5] [INFO] Worker exiting (pid: 5)
[2024-11-25 14:23:45 +0000] [69] [INFO] Booting worker with pid: 69
[2024-11-25 14:26:47 +0000] [3] [INFO] Handling signal: winch
[2024-11-25 14:26:47 +0000] [3] [INFO] Handling signal: winch
INFO     XML file title:Meps 2.5 km surface parameters from ensemble member 13 2024-04-20T19:00:00Z + 66 hours
INFO     XML file metadata_identifier: no.met.staging:bf190000-db6c-4067-bd40-985803567ee4
INFO     Performing in depth checking.
INFO     Making API call: https://vocab.met.no/rest/v1/mmd/data?uri=https%3A%2F%2Fvocab.met.no%2Fmmd%2FAccess_Constraint
INFO     Making API call: https://vocab.met.no/rest/v1/mmd/data?uri=https%3A%2F%2Fvocab.met.no%2Fmmd%2FActivity_Type
INFO     Making API call: https://vocab.met.no/rest/v1/mmd/data?uri=https%3A%2F%2Fvocab.met.no%2Fmmd%2FOperational_Status
INFO     Making API call: https://vocab.met.no/rest/v1/mmd/data?uri=https%3A%2F%2Fvocab.met.no%2Fmmd%2FUse_Constraint

Any hints, advice, or links on how to proceed with ingesting some MMD files? :) Thanks!

epifanio commented 18 hours ago

In the config I changed env_string from dev to staging, which got rid of the following log message:

Namespace no.met.staging does not match the env dev

Now the log says:

[2024-11-25 14:45:12 +0000] [16] [INFO] Booting worker with pid: 16
INFO     XML file title:Meps 2.5 km surface parameters from ensemble member 13 2024-04-20T19:00:00Z + 66 hours
INFO     XML file metadata_identifier: no.met.staging:bf190000-db6c-4067-bd40-985803567ee4
INFO     Performing in depth checking.
INFO     Making API call: https://vocab.met.no/rest/v1/mmd/data?uri=https%3A%2F%2Fvocab.met.no%2Fmmd%2FAccess_Constraint
INFO     Making API call: https://vocab.met.no/rest/v1/mmd/data?uri=https%3A%2F%2Fvocab.met.no%2Fmmd%2FActivity_Type
INFO     Making API call: https://vocab.met.no/rest/v1/mmd/data?uri=https%3A%2F%2Fvocab.met.no%2Fmmd%2FOperational_Status
INFO     Making API call: https://vocab.met.no/rest/v1/mmd/data?uri=https%3A%2F%2Fvocab.met.no%2Fmmd%2FUse_Constraint
INFO     Created folder: /workdir/arch_0/arch_0/arch_0
INFO     Added file: bf190000-db6c-4067-bd40-985803567ee4.xml
ERROR    Failed to translate MMD to ISO19139
ERROR    HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7a60e56e6740>: Failed to establish a new connection: [Errno 111] Connection refused'))
INFO     failed ['pycsw']

I guess it is expecting a running pycsw endpoint. The config says:

pycsw:
  csw_service_url: http://localhost

I will start adding a CSW container to my local setup. Quick question: is this used for ingesting the MMD into the pycsw catalog via the "pycsw transaction API"?

epifanio commented 16 hours ago

Found it: pycsw_transaction. I will then add a local service with transaction set to True.

shamlymajeed commented 1 hour ago

Also, you can remove 'pycsw' from 'distributors' in config.yaml.
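Putting the thread together, a config with only the file distributor could be written like this. It is a sketch using the values posted above, with env_string set to staging to match the no.met.staging namespace. The pycsw section is left in place, since the thread does not confirm whether it can be dropped. Note that this overwrites any config.yaml in the current directory:

```shell
# Write a config.yaml that uses only the 'file' distributor.
# All keys and values are copied from the config posted in this thread.
cat > config.yaml <<'EOF'
---
dmci:
  distributors:
    - file
  distributor_cache: /workdir
  max_permitted_size: 100000
  mmd_xsl_path: /usr/share/mmd/xslt/mmd-to-geonorge.xsl
  mmd_xsd_path: /usr/share/mmd/xsd/mmd_strict.xsd
  rejected_jobs_path: /rejected
  path_to_parent_list: /usr/share/parent-uuid-list.xml

pycsw:
  csw_service_url: http://localhost

customization:
  catalog_url: http://localhost
  env_string: staging

file:
  file_archive_path: /workdir
EOF
```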