r-geoflow / geoflow

Tools to Orchestrate Geospatial (Meta)Data Management Workflows and Manage FAIR Services
https://github.com/r-geoflow/geoflow/wiki

specifying multiple processors in provenance column #303

Closed juliepierson closed 1 year ago

juliepierson commented 1 year ago

Hi, I'm trying to add a process with 2 processors in the provenance column. geoflow would not validate my metadata with one process and 2 processors, so I tried duplicating the process and adding 2 processors, so that the process and processor counts match (hope I'm clear here!). geoflow ran OK, but in the metadata that was created in GeoNetwork, one process did not have any creator.

If this is possible, maybe the best way to solve it would be to allow multiple processors for one process? If, for example, 4 processors participated in one process, the current way would be to create 4 identical processes, which is a bit repetitive in the final metadata. But I understand this would mean a change in the way it works now, to keep the correspondence between processes and processors.

If it's not currently possible to change this, there may still be a bug, since a process with no processor gets created when I specify 2 processes and 2 processors with my data.

Thanks for your help !

eblondel commented 1 year ago

@juliepierson thanks for reporting. This one may need some brainstorming on how we should better implement process/lineage through tabular metadata. Indeed, there is currently a limitation that forces users to specify exactly 1 processor for each process.

What I can think of immediately is:

Processors (in "Provenance", or "Creator" columns) would look like this:

processor1:me_
processor1:him_
processor2:me_
processor2:her_
processor3:theothers

which would match the following processes (not sure if we need to put indexes here as well - maybe):

process:process1_
process:process2_
process:process3
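To illustrate how the indexed processor keys above could be matched to processes, here is a minimal sketch in Python. It is not geoflow's implementation (geoflow is an R package). The function names are hypothetical, and the sketch assumes that entries in a cell are separated by `_` followed by a newline and that the index in `processorN` refers to the N-th `process:` entry.

```python
import re
from collections import defaultdict

# Assumption: entries within a cell are separated by "_" followed by a newline.
LINE_SEPARATOR = "_\n"


def parse_entries(cell):
    """Split a cell into (key, value) pairs such as ("processor1", "me")."""
    pairs = []
    for entry in cell.split(LINE_SEPARATOR):
        entry = entry.strip()
        if not entry:
            continue
        key, _, value = entry.partition(":")
        pairs.append((key.strip(), value.strip()))
    return pairs


def group_processors(contact_cell):
    """Group processor ids by the numeric index carried in the key."""
    grouped = defaultdict(list)
    for key, value in parse_entries(contact_cell):
        match = re.fullmatch(r"processor(\d+)", key)
        if match:
            grouped[int(match.group(1))].append(value)
    return dict(grouped)


def match_processes(provenance_cell, contact_cell):
    """Pair the N-th process with all processors indexed N (1-based)."""
    processes = [v for k, v in parse_entries(provenance_cell) if k == "process"]
    processors = group_processors(contact_cell)
    return {name: processors.get(i, []) for i, name in enumerate(processes, start=1)}


contacts = (
    "processor1:me_\nprocessor1:him_\n"
    "processor2:me_\nprocessor2:her_\n"
    "processor3:theothers"
)
provenance = "process:process1_\nprocess:process2_\nprocess:process3"

result = match_processes(provenance, contacts)
print(result)
```

With this grouping, a process with no matching processor index ends up with an empty list, which makes the "process without creator" case easy to detect before publication.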

I'm cc'ing @juldebar; I know he gave some feedback in the past, because he was struggling with the same syntax as you.

juliepierson commented 1 year ago

Thanks @eblondel, that looks like a good way to solve this problem !

eblondel commented 1 year ago

@juliepierson i've implemented it

eblondel commented 1 year ago

@juldebar @wheintz @mrouan please see the above changes to process definitions, which overcome the limitation of one processor per process. Processors should now be defined as part of the contact column.

eblondel commented 1 year ago

BTW, in the context of #298 this will include an additional refactoring of process definitions. In principle, we will need to number the processes (as is done for processors).

juliepierson commented 1 year ago

Thanks @eblondel , will test it soon !

juliepierson commented 1 year ago

Works ok for me, specifying processors in "Creator" column :smile:

eblondel commented 1 year ago

Great