Open kaplun opened 8 years ago
... in other words, do we want to support MARC21 records containing "empty fields" such as:
<datafield tag="123" ind1="4" ind2="5">
</datafield>
and "empty subfields" such as:
<datafield tag="123" ind1="4" ind2="5">
<subfield code="a">Foo</subfield>
<subfield code="b"></subfield>
</datafield>
or do we want to always remove these empty fields/subfields?
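To make the "always remove" option concrete, here is a minimal sketch of stripping both cases from a record using only the Python standard library. It assumes un-namespaced MARCXML wrapped in a record element and treats whitespace-only subfields as empty; the function name strip_empty is hypothetical and not part of dojson.

```python
import xml.etree.ElementTree as ET


def strip_empty(record):
    """Remove empty subfields, then datafields left without subfields (in place)."""
    for datafield in list(record.findall("datafield")):
        for subfield in list(datafield.findall("subfield")):
            # Assumption: a subfield with no text or only whitespace is "empty".
            if not (subfield.text or "").strip():
                datafield.remove(subfield)
        if datafield.find("subfield") is None:
            record.remove(datafield)
    return record


SAMPLE = """<record>
  <datafield tag="123" ind1="4" ind2="5">
  </datafield>
  <datafield tag="123" ind1="4" ind2="5">
    <subfield code="a">Foo</subfield>
    <subfield code="b"></subfield>
  </datafield>
</record>"""

record = strip_empty(ET.fromstring(SAMPLE))
remaining = [(sf.get("code"), sf.text) for sf in record.iter("subfield")]
print(remaining)  # [('a', 'Foo')]
```

Only the second datafield survives, and only with its non-empty subfield a.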
CC @aw-bib @martinkoehler @fjorba @jma @basaglia
CC @inveniosoftware/triagers
Just crosschecked with our librarians to be sure not to miss esoteric cases:
As for TIND's comment: our librarians confirmed that e.g. Aleph allows loading empty fields/subfields on ingestion of external data (i.e. bibupload on the shell). However, Aleph's bibedit would remove any of these fields silently and automatically once a cataloguer opens and stores such a record. That is, even if you deliberately add an empty field/subfield in Aleph's bibedit, you cannot save it. Thus, you cannot rely on an empty field being preserved in this commercial system: as soon as a cataloguer touches such a record, these fields get stripped. (IMHO Aleph is at least inconsistent here, with a tendency to strip.)
OK. Given the above and:
https://github.com/inveniosoftware/dojson/pull/155#issuecomment-232406188
I guess we can reconsider the original #155 PR for inclusion (rebased, of course)?
Then we should have a specific filter_values decorator just for MARC21. Or simply add a new filter for the command line that removes empty values.
Such as the general one we are using in INSPIRE? https://github.com/inspirehep/inspire-next/blob/master/inspirehep/dojson/utils/__init__.py#L245
Yes, I think we can close this RFC to say that empty values in fields/subfields should be "tolerated" on the input upload side, but that we can delete them internally as soon as we spot them.
Problem
Currently, utils.filter_values() filters away keys and their corresponding values from dictionaries where the value is None. Concretely, in the context of MARC21 conversion to JSON, this means that subfields with empty strings are preserved, and datafields with no subfields are preserved as well.
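The behaviour described above can be illustrated with a simplified stand-in for the decorator. This is a sketch written for this discussion, not dojson's actual code; example_rule and its output keys are hypothetical.

```python
import functools


def filter_values(f):
    """Simplified stand-in: drop only the keys whose value is None."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        item = f(*args, **kwargs)
        if item is None:
            return None
        return {k: v for k, v in item.items() if v is not None}
    return wrapper


@filter_values
def example_rule(key, value):
    # Hypothetical rule mapping subfields a, b and c of one datafield.
    return {
        "title": value.get("a"),
        "subtitle": value.get("b"),
        "note": value.get("c"),
    }


with_empty_subfield = example_rule("123", {"a": "Foo", "b": ""})
no_subfields = example_rule("123", {})

print(with_empty_subfield)  # {'title': 'Foo', 'subtitle': ''}
print(no_subfields)         # {}
```

The missing subfield c is dropped because it maps to None, but the empty string from subfield b stays, and a datafield with no subfields still yields an (empty) dictionary.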
Proposal
If we assume that an empty string in the bibliographic metadata context doesn't carry any valuable information, it is proposed that filter_values actually filters away any key whose value evaluates to False, except for the False value itself (which represents a flag set to false) and the number 0.
Use cases
According to TIND, @Kennethhole reports:
Related to INSPIRE, I can confirm that we have no use for empty values. Internally we went further and implemented a function that recursively visits the whole record and also strips away the empty lists and empty dicts that result from having filtered values. https://github.com/inspirehep/inspire-next/blob/master/inspirehep/dojson/utils/__init__.py#L206
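A rough sketch of what such a recursive clean-up could look like, combined with the proposal's rule that False and 0 are kept. This is not the INSPIRE implementation linked above; the names is_empty and strip_empty_values and the sample record are hypothetical.

```python
def is_empty(value):
    """None, '', [] and {} count as empty; False and 0 are real values."""
    if value is None:
        return True
    return isinstance(value, (str, list, tuple, dict)) and len(value) == 0


def strip_empty_values(obj):
    """Return a copy of obj with empty values removed at every depth."""
    if isinstance(obj, dict):
        cleaned = {k: strip_empty_values(v) for k, v in obj.items()}
        return {k: v for k, v in cleaned.items() if not is_empty(v)}
    if isinstance(obj, (list, tuple)):
        # Tuples come back as lists in this sketch.
        cleaned = [strip_empty_values(v) for v in obj]
        return [v for v in cleaned if not is_empty(v)]
    return obj


record = {
    "title": {"title": "Foo", "subtitle": ""},
    "notes": [{"note": None}, {}],
    "deleted": False,
    "citation_count": 0,
}

cleaned_record = strip_empty_values(record)
print(cleaned_record)
# {'title': {'title': 'Foo'}, 'deleted': False, 'citation_count': 0}
```

Children are cleaned before their parent is tested, so a list that only contained empty dicts disappears together with its key.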
See also: