gwu-libraries / batch-loader

Application for batch loading GW ScholarSpace
MIT License

Consider always making metadata fields a list #1

Open kerchner opened 6 years ago

kerchner commented 6 years ago

From reading the code, I see that each metadata field is written out as a key:value pair where the value may be either an atomic value or a list, depending on whether the corresponding column name(s) in the CSV include a number (e.g. title1) or not.
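To make the behavior described above concrete, here is a minimal sketch of it. This is not the batch loader's actual code: the function name `row_to_metadata`, the regular expression, and the choice to skip empty cells are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: a column name that ends in digits, e.g. "creator2".
# The name part must end in a non-digit, so a column called "2019" is not numbered.
_NUMBERED = re.compile(r"^(?P<name>.*\D)(?P<index>\d+)$")


def row_to_metadata(row):
    """Convert one CSV row (column name -> cell text) into a metadata dict.

    Numbered columns (title1, creator1, creator2, ...) are collected into a
    list under the base name; unnumbered columns are written as atomic values.
    """
    metadata = {}
    for column, value in row.items():
        if value == "":
            continue  # assumption: empty cells are skipped
        match = _NUMBERED.match(column)
        if match:
            # Numbered column: append to the list for the base field name.
            metadata.setdefault(match.group("name"), []).append(value)
        else:
            # Unnumbered column: keep the value as-is.
            metadata[column] = value
    return metadata


example = row_to_metadata(
    {"title1": "A", "creator1": "X", "creator2": "Y", "license": "MIT"}
)
```

Here `title` and `creator` come out as lists while `license` stays a plain string, which is the structural variability in question.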

The result of this is variability in the structure of metadata.json based on the structure of the CSV. This can be dealt with in three ways that I can think of:

  1. Make the ingest task (on the repository app side) adaptable, where it expects either single values or lists.
  2. Have the batch loader always write values as lists, even if they are 1-item lists (i.e. key: [value] as opposed to key: value).
  3. Always use numbers in the CSV column names (e.g. title1 even when there is no title2). This puts a burden on the creator of the CSV to understand and remember.
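Options 1 and 2 come down to the same small normalization step, applied on different sides. A rough sketch, assuming metadata is a plain dict loaded from or written to metadata.json; the helper names `as_list` and `normalize_metadata` are hypothetical and do not exist in the batch loader or the repository app:

```python
import json


def as_list(value):
    """Option 1 flavor: on the ingest side, accept a single value or a list."""
    if isinstance(value, list):
        return value
    return [value]


def normalize_metadata(metadata):
    """Option 2 flavor: before writing, wrap every atomic value in a 1-item list."""
    return {key: as_list(value) for key, value in metadata.items()}


normalized = normalize_metadata({"title": "A", "creator": ["X", "Y"]})
# Serialize the way a metadata.json file might be written (layout is an assumption).
serialized = json.dumps(normalized, indent=2)
```

With option 3 neither helper is needed, since the CSV column names alone decide which fields are lists.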

@justinlittman your thoughts?

justinlittman commented 6 years ago

The intended approach was option 3. This way the batch loader doesn't need any knowledge about the cardinality of each field; it is all driven by the CSV. Yes, this does require that the CSV creator understand which cardinality GWSS expects for each field.

Of course, happy to defer to your judgement as I was making assumptions about GWSS.

kerchner commented 6 years ago

Thanks. I suppose as long as we have and re-use a CSV template with column names meant for use with GWSS, then that should work. Creating metadata.json for some other system would just use a different template.