openaire / iis

Information Inference Service of the OpenAIRE system
Apache License 2.0

Remove avro.schema.input.key redundant properties from map-reduce actions #1415

Closed: marekhorst closed this issue 1 year ago

marekhorst commented 1 year ago

While refactoring the iis-wf-export-actionmanager code in #1406, it turned out that one of the avro.schema.input.key properties in the sequencefile workflow.xml file had been set improperly in one of the previous commits and was therefore ineffective:

https://github.com/openaire/iis/blob/184a5154d271f8c9589618978ba47d2b2cf50856/iis-wf/iis-wf-export-actionmanager/src/main/resources/eu/dnetlib/iis/wf/export/actionmanager/sequencefile/oozie_app/workflow.xml#L492C75-L492C75

The eu.dnetlib.iis.referenceextraction.dataset.schemas.DocumentToDatasource schema was not handled by the generate-schema action prior to the exporter-document-to-datasource execution.

Even so, the integration test did not fail, which shows that this property is redundant and can simply be removed.

Once this issue was identified and the avro.schema.input.key property was removed completely from the exporter actions, the integration test still did not fail, confirming once again that this property does not need to be defined.

We should drop the avro.schema.input.key properties from all exporter actions, along with the generate-schema action, which is no longer required either.

We should also check other map-reduce cases in other modules involving the generate-schema pattern.
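
To help with that check, here is a minimal sketch of a script that lists the map-reduce actions still defining avro.schema.input.key. It is not part of the iis codebase. It assumes the usual Oozie layout of action > map-reduce > configuration > property > name. The function names and the sample workflow are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

PROPERTY_NAME = "avro.schema.input.key"


def local(tag):
    """Strip an XML namespace prefix such as '{uri:oozie:workflow:0.4}'."""
    return tag.rsplit("}", 1)[-1]


def find_actions_with_property(workflow_xml, property_name=PROPERTY_NAME):
    """Return names of map-reduce actions whose configuration sets property_name."""
    root = ET.fromstring(workflow_xml)
    hits = []
    for action in root.iter():
        if local(action.tag) != "action":
            continue
        for child in action:
            if local(child.tag) != "map-reduce":
                continue
            # Look at every <property> nested under this map-reduce element.
            for prop in child.iter():
                if local(prop.tag) != "property":
                    continue
                names = [c.text.strip() for c in prop
                         if local(c.tag) == "name" and c.text]
                if property_name in names:
                    hits.append(action.get("name"))
                    break
    return hits


def scan_tree(root_dir):
    """Print every workflow.xml under root_dir that still sets the property."""
    for path in sorted(Path(root_dir).rglob("workflow.xml")):
        hits = find_actions_with_property(path.read_text(encoding="utf-8"))
        if hits:
            print(path, hits)


# Hypothetical, simplified workflow used only to exercise the function;
# the action names and property values are placeholders.
SAMPLE = """<workflow-app xmlns="uri:oozie:workflow:0.4" name="sample">
  <start to="exporter"/>
  <action name="exporter">
    <map-reduce>
      <configuration>
        <property>
          <name>avro.schema.input.key</name>
          <value>PLACEHOLDER</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <action name="other">
    <map-reduce>
      <configuration>
        <property><name>mapreduce.job.reduces</name><value>0</value></property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <end name="end"/>
</workflow-app>"""

print(find_actions_with_property(SAMPLE))
```

Calling scan_tree on the root of a checkout would report each workflow.xml that still sets the property, together with the affected action names. Matching on local tag names avoids depending on a specific Oozie schema version in the namespace.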

marekhorst commented 1 year ago

avro.schema.input.key was definitely required for map-reduce actions at some point in the past, but apparently this is no longer the case.