sbmlteam / libCombine

a C++ library for working with the COMBINE Archive format
BSD 2-Clause "Simplified" License
8 stars 5 forks source link

BUG: libCOMBINE does not save metadata when only some meta attributes are set #43

Closed jonrkarr closed 2 years ago

jonrkarr commented 3 years ago

The following illustrates the issue:

import datetime
import dateutil.tz
import libcombine
import os
import zipfile

location = 'README.md'
filename = os.path.abspath(location)
format = 'http://purl.org/NET/mediatypes/text/plain'
master = False
updated = datetime.datetime(2020, 1, 1, 1, 1, 1, tzinfo=dateutil.tz.tzutc())
archive_filename = 'archive.omex'

# create archive without only updated date
archive = libcombine.CombineArchive()
archive.addFile(
    filename,
    location,
    format,
    master)
desc = libcombine.OmexDescription()
desc.setAbout(location)
date = libcombine.Date()
date.setDateAsString(updated.strftime('%Y-%m-%dT%H:%M:%SZ'))
desc.getModified().append(date)
archive.addMetadata(location, desc)
archive.writeToFile(archive_filename)

# metadata isn't saved
with zipfile.ZipFile(archive_filename, 'r') as zip_file:
    assert 'metadata.rdf' not in zip_file.namelist()

# create save archive, now also with a description
archive = libcombine.CombineArchive()
archive.addFile(
    filename,
    location,
    format,
    master)
desc = libcombine.OmexDescription()
desc.setAbout(location)
desc.setDescription('Description')
date = libcombine.Date()
date.setDateAsString(updated.strftime('%Y-%m-%dT%H:%M:%SZ'))
desc.getModified().append(date)
archive.addMetadata(location, desc)
archive.writeToFile(archive_filename)

# metadata is saved
with zipfile.ZipFile(archive_filename, 'r') as zip_file:
    assert 'metadata.rdf' in zip_file.namelist()
fbergmann commented 3 years ago

Support for the metadata format is limited to the format outlined in the combine archive specification. And there the creation date is indeed one of the important attribute (basically cause the metadata element is just written as raw string). You can check whether it is valid by specifying desc.isEmpty(). If it is considered empty, the element will not be serialized out.

jonrkarr commented 3 years ago

From a software perspective, it's surprising to set an attribute and then have to check this special function to see if the setting was ignored. An exception that indicates the missing required metadata would make more sense to me, or a function for getting all errors/ warnings about an archive.

fbergmann commented 3 years ago

i can see that. I'm hesitant to throw exceptions, but i can change the API, such that addMetadataToArchive would return a status code, indicating that the operation failed. With LIBCOMBINE_OPERATION_SUCCESS being returned in case of success and LIBCOMBINE_OPERATION_FAILED in case there is missing information on the metadata element.

jonrkarr commented 3 years ago

change the API, such that addMetadataToArchive would return a status code, indicating that the operation failed

That would be better.

I'm happy to follow whatever the specifications for the format are. Where does it say which attributes are required/optional?

fbergmann commented 3 years ago

That is implemented now, and as far as the spec is concerned, it is very clear, that all kinds of metadata are allowed for use in combine archives. It is only in trying to implement support for the very simple format depicted in the specification, that writing it out without: location, description, date seemed wrong.

According to the spec any other metadata format is fine as well, as any files can be added to the archive. It is just that the basic classes implemented here wont be able to parse / write that. I only try to read it if it is the namespace we listed in the spec (and unfortunately that one is now used for a different format too), so the additional flag is there to stop all meta data processing while reading.

jonrkarr commented 3 years ago

I'm just trying to make sure that whatever metadata we encoded using libCOMBINE can be read out by libCOMBINE (i.e. no information is lost). I'm happy to use specific triples and/or raise errors. I want to avoid information being lost without any warning/error to the user.

Is this the spec: http://co.mbine.org/specifications/omex.version-1.pdf? I'm looking at page 9. I see the outlined triples, but which are required/optional isn't clear to me.

In any case, now that I now about isEmpty, I can use it to achieve what I'm looking for.