metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

Bug `encode-csv` with two value csv #494

Open TobiasNx opened 11 months ago

TobiasNx commented 11 months ago

In my example here: https://github.com/TobiasNx/metafacture_workflows/commit/16308bc44ab961f3beaeef0497479e1124aedc09

The CSV output sometimes has its columns mixed up. This seems to be caused by the order of the fields in the incoming stream:

Hochschulbibliothek Pforzheim, Bereichsbibliothek Technik und Wirtschaft    http://lobid.org/organisations/DE-951#!
http://lobid.org/organisations/DE-1a#!  Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße
Hochschularchiv der ETH Zürich  http://lobid.org/organisations/CH-001807-7#!
Heimatgeschichtliches Museum Modautal   http://lobid.org/organisations/DE-MUS-265910#!
Museum Johannes Reuchlin MJR    http://lobid.org/organisations/DE-MUS-492617#!

If I output JSON instead, the issue seems to come from a variation in the field order between records:

{
  "name" : "früher: Frankfurt/Main; Institut für Rechtsgeschichte, Bibliothek",
  "id" : "http://lobid.org/organisations/DE-30-163#!"
}
{
  "name" : "Hochschulbibliothek Pforzheim, Bereichsbibliothek Technik und Wirtschaft",
  "id" : "http://lobid.org/organisations/DE-951#!"
}
{
  "id" : "http://lobid.org/organisations/DE-1a#!",
  "name" : "Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße"
}
{
  "name" : "Hochschularchiv der ETH Zürich",
  "id" : "http://lobid.org/organisations/CH-001807-7#!"
}
{
  "name" : "Heimatgeschichtliches Museum Modautal",
  "id" : "http://lobid.org/organisations/DE-MUS-265910#!"
}

blackwinter commented 11 months ago

Currently, the CSV encoder writes literals (values) as they come in, without giving any regard to their names. Hence, if the input order is unstable, the output will be inconsistent.
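To make the behavior described above concrete, here is a minimal Python sketch. It is not the Metafacture encoder (which is written in Java); the function name `encode_values_in_arrival_order` and the representation of a record as a list of `(name, value)` pairs are assumptions made purely for illustration. The two records are taken from the output in the report.

```python
import csv
import io


def encode_values_in_arrival_order(records):
    """Write each record's values in the order they arrive, ignoring names.

    Hypothetical stand-in for the reported behavior, not the real encoder.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for record in records:
        # The literal names are dropped; only arrival order decides the column.
        writer.writerow([value for _name, value in record])
    return out.getvalue()


records = [
    [("name", "Hochschularchiv der ETH Zürich"),
     ("id", "http://lobid.org/organisations/CH-001807-7#!")],
    [("id", "http://lobid.org/organisations/DE-1a#!"),
     ("name", "Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße")],
]

tsv = encode_values_in_arrival_order(records)
print(tsv)
```

In the second row the URL lands in the first column, because `id` arrived before `name` in that record.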

A potential solution might be to write values in the order they were first received, which is also the order of the column headers. But this will get somewhat complicated when also taking repeated fields into account.

TobiasNx commented 9 months ago

Task: map incoming data to the header order, and add a new column to the header if an element does not exist there yet.
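One way the task above could look, as a rough Python sketch rather than a proposal for the actual Java implementation. The function name `encode_by_header`, the `(name, value)` record representation, and the `separator` parameter are all hypothetical. Two further assumptions are made here: repeated fields are joined into one cell, and all records are buffered before anything is written, so that columns discovered late still appear in the header.

```python
import csv
import io


def encode_by_header(records, separator="; "):
    """Place each value under the column of its name (hypothetical sketch).

    Columns are ordered by first appearance of their name. Repeated names
    within one record are joined with `separator` (an assumption).
    """
    header = []  # column names in first-seen order
    rows = []    # one dict per record: name -> cell value
    for record in records:
        row = {}
        for name, value in record:
            if name not in header:
                header.append(name)  # unknown element: add a new column
            if name in row:
                row[name] = row[name] + separator + value
            else:
                row[name] = value
        rows.append(row)

    out = io.StringIO()
    # restval fills cells for columns a record does not contain.
    writer = csv.DictWriter(out, fieldnames=header, restval="",
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = [
    [("name", "A"), ("id", "1")],
    [("id", "2"), ("name", "B")],
    [("name", "C"), ("id", "3"), ("id", "4"), ("type", "x")],
]
result = encode_by_header(sample)
print(result)
```

Buffering avoids rewriting a header that has already been emitted, but it costs memory on large inputs. A streaming encoder would need either a fixed, preconfigured header or some other strategy for columns that show up late.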