sbmlteam / libCombine

a C++ library for working with the COMBINE Archive format
BSD 2-Clause "Simplified" License

Metadata not read if rdf:about and omex location not identical #5

Closed matthiaskoenig closed 7 years ago

matthiaskoenig commented 7 years ago

Hi all, currently the metadata from the CombineArchiveShowcase cannot be read, because the leading . is missing in the rdf:about.

entry: <1> <libcombine.CaContent; proxy of <Swig Object of type 'CaContent *' at 0x7f9a4dd0b570> >
 1: location: ./README.md format: http://purl.org/NET/mediatypes/text/x-markdown
no metadata for './README.md'

The problem is that the locations in the omex are given as:

<content location="./manifest.xml" format="http://identifiers.org/combine.specifications/omex-manifest"/>
<content location="./README.md" format="http://purl.org/NET/mediatypes/text/x-markdown"/>
...

whereas the rdf:abouts are

rdf:about="/README.md"
...

The metadata should be resolved with and without the leading ., especially because the OMEX specification only states that these should be identical (should, not must).

A COMBINE archive can include multiple metadata elements adding information about different content files. To identify the file a metadata element refers to, the rdf:about attribute of the relevant metadata structure should use the same value as used in the location attribute of the respective Content element,

In my opinion this is a problem with the CombineArchiveShowCase, and I have reported it there.

But libcombine should be flexible enough to handle all three cases: a leading ., a leading ./, and no leading character.
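
A minimal sketch of what such tolerant matching could look like, in plain Python. This is not libcombine code: the helpers normalize_location and find_metadata and the dictionary metadata_by_about are hypothetical names used only for illustration.

```python
def normalize_location(location):
    """Reduce './README.md', '/README.md' and 'README.md' to one form."""
    if location in (".", "./", "/"):
        return "."  # the archive root itself
    for prefix in ("./", "/"):
        if location.startswith(prefix):
            return location[len(prefix):]
    return location


def find_metadata(location, metadata_by_about):
    """Look up metadata for a manifest location, ignoring the leading prefix.

    metadata_by_about is a hypothetical dict mapping rdf:about values
    to whatever metadata object was parsed for them.
    """
    wanted = normalize_location(location)
    for about, metadata in metadata_by_about.items():
        if normalize_location(about) == wanted:
            return metadata
    return None
```

With this, find_metadata("./README.md", {"/README.md": ...}) would find the entry even though the two strings differ.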

M

fbergmann commented 7 years ago

Bummer ... the specification should have said must rather than should in this case. It makes no sense to me otherwise. But I will try and alter it for the full release.

thanks for letting me know.

matthiaskoenig commented 7 years ago

Yes, should be must. In addition, I think the metadata reading is broken in general.

See example attached: archive_example.zip

The example output displays only a single creator for . (there are multiple creators), does not show the modified date, and is missing all metadata for the other entries.

Probably the hash map from locations to their metadata is not populated correctly:

metadata for '.':
    Created : 2000-01-01T00:00:00Z
    Creators: 1
    Vasundra Toure
Num Entries: 20
0
entry: <0> <libcombine.CaContent; proxy of <Swig Object of type 'CaContent *' at 0x7fa29bf664e0> >
 0: location: ./manifest.xml format: http://identifiers.org/combine.specifications/omex-manifest
no metadata for './manifest.xml'
1
entry: <1> <libcombine.CaContent; proxy of <Swig Object of type 'CaContent *' at 0x7fa29bf66570> >
 1: location: ./README.md format: http://purl.org/NET/mediatypes/text/x-markdown
no metadata for './README.md'
2
entry: <2> <libcombine.CaContent; proxy of <Swig Object of type 'CaContent *' at 0x7fa29bf664e0> >
 2: location: ./model/BIOMD0000000144.xml format: http://identifiers.org/combine.specifications/sbml.level-2.version-1
no metadata for './model/BIOMD0000000144.xml'
3
entry: <3> <libcombine.CaContent; proxy of <Swig Object of type 'CaContent *' at 0x7fa29bf66570> >
 3: location: ./model/calzone_2007.ai format: http://purl.org/NET/mediatypes/application/illustrator
no metadata for './model/calzone_2007.ai'
4
...
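
To illustrate the kind of location-to-metadata map suspected above, here is a rough sketch using only the Python standard library. It is not how libcombine populates its map. SAMPLE_RDF is a made-up, heavily simplified document (real OMEX metadata nests creators in vCard structures), and descriptions_by_about and creator_count are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical, simplified metadata: two descriptions for '.', one for a file.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about=".">
    <dcterms:creator>Creator A</dcterms:creator>
  </rdf:Description>
  <rdf:Description rdf:about=".">
    <dcterms:creator>Creator B</dcterms:creator>
  </rdf:Description>
  <rdf:Description rdf:about="./README.md">
    <dcterms:creator>Creator C</dcterms:creator>
  </rdf:Description>
</rdf:RDF>"""


def descriptions_by_about(rdf_text):
    """Group top-level rdf:Description elements by their rdf:about value."""
    root = ET.fromstring(rdf_text)
    grouped = defaultdict(list)
    for description in root.findall(f"{{{RDF}}}Description"):
        about = description.get(f"{{{RDF}}}about")
        if about is not None:
            # Append instead of overwrite, so repeated rdf:about values
            # (e.g. several descriptions for '.') are all kept.
            grouped[about].append(description)
    return dict(grouped)


def creator_count(descriptions):
    """Count dcterms:creator children across a list of descriptions."""
    return sum(len(d.findall(f"{{{DCTERMS}}}creator")) for d in descriptions)
```

The point of the sketch is the append: a map that stores one value per key and overwrites on repeated keys would lose creators or whole entries.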

fbergmann commented 7 years ago

I will have a look ... When I'm using it, I usually have one metadata file per file to which I want to attach meta information. But it should have been able to read the rest. I will let you know what I find.

matthiaskoenig commented 7 years ago

After sleeping on it, I think the specification should be interpreted as 'must'. And libcombine should also handle it as must, without workarounds. The only thing libcombine should do is check whether there are metadata.rdf entries that do not have a corresponding entry in the OMEX manifest and give a warning (error) of the form

WARNING: Metadata rdf:about='/README.md' does not have any entry 
in the OMEX manifest. rdf:about and OMEX manifest location must be identical.

Then users will know that there is a mapping problem and can fix their archives. The fix for the CombineArchiveShowCase archive is done: https://github.com/SemsProject/CombineArchiveShowCase/pull/6
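
The strict check described above could be sketched roughly as follows in plain Python. This is an illustration, not libcombine's implementation. The function name unmatched_metadata_warnings is hypothetical, and it assumes the manifest locations and rdf:about values have already been read into lists of strings.

```python
def unmatched_metadata_warnings(manifest_locations, rdf_abouts):
    """Return one warning per rdf:about value missing from the manifest.

    The comparison is exact string equality, following the 'must be
    identical' reading of the specification (no prefix normalization).
    """
    known = set(manifest_locations)
    warnings = []
    for about in rdf_abouts:
        if about not in known:
            warnings.append(
                f"WARNING: Metadata rdf:about='{about}' does not have any entry "
                "in the OMEX manifest. rdf:about and OMEX manifest location "
                "must be identical."
            )
    return warnings
```

For the case in this issue, a manifest containing ./README.md and metadata with rdf:about="/README.md" would produce exactly one warning, naming /README.md.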

matthiaskoenig commented 7 years ago

The original issue is solved. The rdf:about values in the metadata and the locations in the manifest have to be identical.