Open dr0i opened 1 year ago
Just for the record: BGZF was chosen to allow for random access to individual records (in combination with an index file); if you don't intend to support this use case, you might want to choose a more common compression format.
You are right - we should test the running time for creating the file and the resulting file size, and then decide on a format, e.g. tar.gz (as with the MAB-XML dump), tar.bz2 or tar.xz. The latter would probably be the best choice, or what would you suggest, @blackwinter?
As this is a single file, I wouldn't create a tar archive. I would probably just go with gzip or bzip2 due to ubiquity.
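A minimal sketch of how the running time and file size could be compared for gzip, bzip2 and xz, using only the Python standard library. The function name `compare_compressors` and the sample data are hypothetical; real numbers would have to come from an excerpt of the actual MARC-XML dump.

```python
import bz2
import gzip
import lzma
import time


def compare_compressors(data: bytes) -> dict:
    """Return {format: (compressed_size_in_bytes, seconds)} for each codec."""
    codecs = {"gzip": gzip.compress, "bzip2": bz2.compress, "xz": lzma.compress}
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(data)  # default compression level of each module
        results[name] = (len(compressed), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Placeholder sample (assumption); replace with bytes read from a real MARC-XML excerpt.
    sample = (
        b'<record><datafield tag="245"><subfield code="a">Title</subfield>'
        b"</datafield></record>\n"
    ) * 10000
    for name, (size, seconds) in compare_compressors(sample).items():
        print(f"{name}: {size} bytes in {seconds:.3f} s")
```

Highly repetitive placeholder data compresses far better than real records, so the sketch only shows the measuring approach, not representative results.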
We cannot simply expose the Alma dump because it contains local (IZ) fields that we have to suppress. Since https://github.com/hbz/lobid-resources/issues/1687 we have suppressed these, so that lobid-resources contains only Open Data. To provide an Open Data MARC-XML dump, we have to filter these fields from the MARC-XML and provide the result again as a bgzf MARC-XML file, analogous to https://lobid.org/download/dumps/DE-605/mabxml/, under https://lobid.org/download/dumps/DE-605/marcxml/. See also https://github.com/hbz/lobid-resources/issues/1316.
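The filtering step could look roughly like the following sketch, which drops `datafield` elements by tag from a single MARC-XML record. This is not the actual lobid-resources pipeline: the function name `strip_datafields` is hypothetical, and the tag `990` in the usage example is only a placeholder, since the real list of local (IZ) fields is defined in the issues linked above.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARC21 "slim" XML schema.
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialize without an "ns0:" prefix


def strip_datafields(record_xml: str, tags_to_drop: set) -> str:
    """Return the record with every datafield whose tag is in tags_to_drop removed."""
    record = ET.fromstring(record_xml)
    # Copy the list first, because we remove children while looping.
    for field in list(record.findall(f"{{{MARC_NS}}}datafield")):
        if field.get("tag") in tags_to_drop:
            record.remove(field)
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    example = (
        f'<record xmlns="{MARC_NS}">'
        '<datafield tag="245" ind1=" " ind2=" "><subfield code="a">Title</subfield></datafield>'
        '<datafield tag="990" ind1=" " ind2=" "><subfield code="a">local</subfield></datafield>'
        "</record>"
    )
    # "990" is a placeholder tag, not a confirmed local field.
    print(strip_datafields(example, {"990"}))
```

For a full dump, the records would have to be processed as a stream (for example with `xml.etree.ElementTree.iterparse`) rather than loaded into memory at once.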