netwerk-digitaal-erfgoed / ld-workbench

A CLI tool for transforming large RDF datasets using pure SPARQL.

Unable to run configurations outside the core repository #71

Closed wouterbeek closed 3 months ago

wouterbeek commented 3 months ago

Observation

I am unable to run configurations from the <> repo. Configurations from the core repo do work as expected.

For example:

npm run ld-workbench -- --configDir ..\ld-workbench-configuration\ao

> @netwerk-digitaal-erfgoed/ld-workbench@0.0.0-development ld-workbench
> node build/main --configDir ..\ld-workbench-configuration\ao

Welcome to LD Workbench version 0.0.0-development
Error in pipeline AO
Error in the iterator of stage `Events`: File not found: static/ao/events-iterator.rq
Error in pipeline AO
node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "Error in pipeline AO".] {
  code: 'ERR_UNHANDLED_REJECTION'
}

Node.js v20.14.0

The example from the core repo does work:

npm run ld-workbench -- --configDir .\static\example\

> @netwerk-digitaal-erfgoed/ld-workbench@0.0.0-development ld-workbench
> node build/main --configDir .\static\example\

Welcome to LD Workbench version 0.0.0-development
🏁 starting pipeline "Example Pipeline"
√ validating pipeline
√ stage "Stage 1" resulted in 612 statements in 153 iterations.
√ stage "Stage 2" resulted in 105 statements in 1 iteration.
√ Writing results to destination pipelines/data/example-pipeline.nt
✔ your pipeline "Example Pipeline" was completed in 5.1s using 53 MB of memory.

Expected

Guidance on how to run configurations that are not in the core repository.

ddeboer commented 3 months ago

This happens because file:// paths are resolved relative to the directory you run the command from, not relative to the config.yml file. Resolving them relative to config.yml would seem more intuitive. @mightymax What do you think?

mightymax commented 3 months ago

I think relative paths should always be relative to the current working directory; all standard CLI tools work like that. Are you sure this does not work as expected? Does the directory referenced via ".." actually exist under that name? Or is the bug that the current working directory is not calculated correctly?

ddeboer commented 3 months ago

The problem is not that the config.yml file cannot be found, but that the SPARQL query files referenced from that config.yml don’t resolve.

For example:

query: file://static/ao/events-iterator.rq

That static/... path depends on the working dir, which is confusing. That leads people to write:

query: file://../ld-workbench-configuration/nafotos-test/iterator-stage-1.rq

In my opinion, it’s more intuitive if it relates to the directory that the config.yml is in, so one could write either:

query: file://events-iterator.rq

or:

query: file://./events-iterator.rq

This makes the configurations more portable: any config directory will work anywhere. Users also wouldn’t have to rewrite file://../... paths if they rename the top-level dir (nafotos-test --> nafotos-test2).
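
To make the difference between the two behaviours concrete, here is a minimal sketch. It assumes a hypothetical helper `resolveQueryPath` and made-up POSIX directories; it is not taken from the LD Workbench source and only illustrates what resolving against the working directory versus the config directory would look like.

```typescript
import * as path from "node:path";

// Hypothetical helper (an assumption, not LD Workbench's actual code):
// strips the `file://` prefix used in config.yml and resolves the rest
// against the given base directory.
function resolveQueryPath(reference: string, baseDir: string): string {
  const prefix = "file://";
  const relative = reference.startsWith(prefix)
    ? reference.slice(prefix.length)
    : reference;
  // path.posix keeps the result identical on every platform.
  return path.posix.resolve(baseDir, relative);
}

// Made-up directories for illustration only.
const cwd = "/home/user/ld-workbench";
const configDir = "/home/user/ld-workbench-configuration/ao";

// Working-directory behaviour: the config must spell out a path from the cwd.
const fromCwd = resolveQueryPath("file://static/ao/events-iterator.rq", cwd);

// Config-directory behaviour: the bare file name is enough.
const fromConfigDir = resolveQueryPath("file://events-iterator.rq", configDir);
const withDot = resolveQueryPath("file://./events-iterator.rq", configDir);

console.log(fromCwd);
console.log(fromConfigDir);
console.log(withDot);
```

With the config directory as the base, both `file://events-iterator.rq` and `file://./events-iterator.rq` end up at the same file next to config.yml, regardless of where the command is started.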

mightymax commented 3 months ago

Ah, clear. I sort of remembered that it worked like that, with paths relative to config.yml, but I might be wrong. If we are sure the paths in Wouter's config are not wrong, then I fully agree: when someone uses a relative path, it should be relative to the config dir in use.

wouterbeek commented 3 months ago

@mightymax Good point, I ran the following to double-check that the path does indeed exist:

PS C:\Users\woute\Git\ld-workbench> ls ..\ld-workbench-configuration\ao

    Directory: C:\Users\woute\Git\ld-workbench-configuration\ao

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          2024-06-03    21:21           2836 config.yml
-a---          2024-06-03    21:21           4005 events-generator.rq
-a---          2024-06-03    21:21            180 events-iterator.rq
-a---          2024-06-03    21:21           5855 makers-generator.rq
-a---          2024-06-03    21:21            216 makers-iterator.rq
-a---          2024-06-03    21:21           4028 objectDates-generator.rq
-a---          2024-06-03    21:21            181 objectDates-iterator.rq
-a---          2024-06-03    21:21           3004 objectImages-generator.rq
-a---          2024-06-03    21:21            174 objectImages-iterator.rq
-a---          2024-06-03    21:21           4838 objectLocationDates-generator.rq
-a---          2024-06-03    21:21            187 objectLocationDates-iterator.rq
-a---          2024-06-03    21:21           7708 objectLocations-generator.rq
-a---          2024-06-03    21:21            196 objectLocations-iterator.rq
-a---          2024-06-03    21:21           5997 objects-generator.rq
-a---          2024-06-03    21:21            181 objects-iterator.rq
-a---          2024-06-03    21:21           3885 sites-generator.rq
-a---          2024-06-03    21:21            161 sites-iterator.rq

PS C:\Users\woute\Git\ld-workbench>  npm run ld-workbench -- --configDir ..\ld-workbench-configuration\ao

> @netwerk-digitaal-erfgoed/ld-workbench@0.0.0-development ld-workbench
> node build/main --configDir ..\ld-workbench-configuration\ao

Welcome to LD Workbench version 0.0.0-development
Error in pipeline AO
Error in the iterator of stage `Events`: File not found: static/ao/events-iterator.rq
Error in pipeline AO
node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "Error in pipeline AO".] {
  code: 'ERR_UNHANDLED_REJECTION'
}

Node.js v20.14.0

mightymax commented 3 months ago

Thanks, can you also share the paths you use in config.yml? In your case they should have no path components, just the file name. If I remember correctly, paths are resolved relative to config.yml, so in your case the bare filename should work. If not, we should make it work like that.

ddeboer commented 3 months ago

Currently it doesn’t work like that. I have a fix coming up.

github-actions[bot] commented 3 months ago

:tada: This issue has been resolved in version 2.0.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

wouterbeek commented 3 months ago

@ddeboer For me this does not work yet in 2.0.0:

PS C:\Users\woute\Git\ld-workbench> npm run ld-workbench -- --configDir ..\ld-workbench-configuration\ao

> @netwerk-digitaal-erfgoed/ld-workbench@0.0.0-development ld-workbench
> node build/main --configDir ..\ld-workbench-configuration\ao

Welcome to LD Workbench version 0.0.0-development
Error in pipeline AO
Error in the iterator of stage `Events`: File not found: C:\Users\woute\Git\ld-workbench-configuration\ao\static\ao\events-iterator.rq
Error in pipeline AO
node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "Error in pipeline AO".] {
  code: 'ERR_UNHANDLED_REJECTION'
}

Node.js v20.14.0

Notice that the LD Workbench version that is printed is purportedly 0.0.0-development. Maybe there is something else wrong in my setup.

ddeboer commented 3 months ago

Have you pulled the latest version of the config repository? We now use simpler file paths there.
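
A small sketch of why the error above would appear if the local config still contained the old-style reference. It assumes the config.yml in use still had `file://static/ao/events-iterator.rq` (the form quoted earlier in this thread) and that the new behaviour simply resolves against the config directory; the variable names are illustrative, not LD Workbench internals.

```typescript
import * as path from "node:path";

// Config directory as shown in the PowerShell listing earlier in the thread.
const configDir = "C:\\Users\\woute\\Git\\ld-workbench-configuration\\ao";

// Assumption: the not-yet-updated config.yml still used this reference.
const oldReference = "static/ao/events-iterator.rq";
// Simpler reference of the kind proposed in the thread.
const newReference = "events-iterator.rq";

// path.win32 gives Windows-style results on any platform.
const oldResolved = path.win32.resolve(configDir, oldReference);
const newResolved = path.win32.resolve(configDir, newReference);

console.log(oldResolved);
console.log(newResolved);
```

Under these assumptions the old reference resolves to the `...\ao\static\ao\events-iterator.rq` path reported in the error, while the bare file name resolves to the file that actually exists in the config directory.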

wouterbeek commented 3 months ago

Thanks @ddeboer, pulling the config repo indeed fixed the situation for me. Thanks for fixing this!

For completeness' sake, this is the command that I use to run the AO ETL:

npx @netwerk-digitaal-erfgoed/ld-workbench@latest -c ..\ld-workbench-configuration\ao