Closed jacobthill closed 6 months ago
Note: I found this https://github.com/sul-dlss/dlme-transform/blob/b434ee43245c14ab5140a1b162ee7d2ae14fc55d/lib/macros/field_extraction.rb#L27 which I think we used in the past as a workaround. It is not a great solution, though, because we refresh data automatically, so a provider could change a list into a string, or vice versa, at any time.
Currently, to get all values from a JSON field, we have to make a separate extract_json call for each index. I need 31 calls on one field, and there is no way to be sure we are getting all of the data without building an Airflow task that checks the maximum length of each field coming out of the harvest task and compares it against the number of calls we make in the traject config. A better approach would seem to be updating extract_json so that it expects a list and plays nicely with the macros we call afterwards, e.g. strip.
extract_json also cannot accept integer values; a field containing an integer raises the following exception:
Exception: NoMethodError: undefined method `empty?' for 1:Integer /opt/traject/lib/macros/each_record.rb:35:in `reject'
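Both problems described above (a field that may arrive as either a list or a single string, and integer values that do not respond to `empty?`) could be handled by normalizing whatever the JSON lookup returns before the downstream macros see it. The following is only a rough sketch of that idea, not the actual dlme-transform code: `normalize_json_values` is a hypothetical helper name, and the sample record is made up for illustration.

```ruby
require 'json'

# Hypothetical helper (not part of dlme-transform): coerce whatever a JSON
# lookup returns into a flat array of non-empty strings.
def normalize_json_values(value)
  # Wrap scalars (string, integer, nil) in an array; flatten nested lists.
  values = value.is_a?(Array) ? value.flatten : [value]
  # Drop nils, stringify so integers respond to empty?, trim whitespace,
  # then drop anything that ended up blank.
  values.compact.map { |v| v.to_s.strip }.reject(&:empty?)
end

# Made-up record: one list field, one string field, one integer field.
record = JSON.parse('{"contributor": ["Alice ", "Bob"], "title": "One title", "volume": 1}')

p normalize_json_values(record['contributor']) # ["Alice", "Bob"]
p normalize_json_values(record['title'])       # ["One title"]
p normalize_json_values(record['volume'])      # ["1"]
p normalize_json_values(record['missing'])     # []
```

With normalization like this inside the macro, a single call per field would return every value regardless of whether the provider sends a list or a string, so the number of calls in the config would no longer need to track the longest list in the harvest.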
e.g.
to_field 'cho_contributor', extract_json('.contributor'), strip, unique, arabic_script_lang_or_default('und-Arab', 'en')
should get all values from the contributor field. Here is a JSON record with many values in the contributor field to test on:
Requirements:
extract_json
call.