Embedded Metadata with Dublin Core vocabulary (RDF) #718

Closed zuphilip closed 10 years ago

zuphilip commented 10 years ago

Can we add the types from SSOAR into the correct translator (cf. https://forums.zotero.org/discussion/36023/ssoar-translator )? I guess this is similar to https://github.com/zotero/translators/blob/master/RDF.js#L389 ff.

SSOAR seems to use 33 types (see http://www.ssoar.info/ssoar/search-filter?field=documentType&offset=0 ff), but most entries have several DC.type values (English and German, but also more specific classes like company report). It should be enough to handle the following cases:

<meta xml:lang="de" content="monograph" name="DC.type">
<meta xml:lang="de" content="article" name="DC.type">
<meta xml:lang="de" content="collection" name="DC.type">
<meta xml:lang="de" content="incollection" name="DC.type">
<meta xml:lang="de" content="recension" name="DC.type">

It seems that these classes normally appear as the first meta tag for DC.type, but there may also be cases where one has to check the other tags as well.
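To make the "several DC.type tags per entry" point concrete, here is a minimal sketch that collects every DC.type value in document order from an HTML snippet. The helper name `collectDcTypes` and the sample markup (including the second and third type values) are made up for illustration and are not taken from any Zotero translator or from a real SSOAR page. Inside an actual translator one would more likely query the DOM (e.g. `doc.querySelectorAll('meta[name="DC.type"]')`) than use regular expressions.

```javascript
// Hypothetical helper, not part of any Zotero translator: collect the
// content of every DC.type meta tag, preserving document order.
function collectDcTypes(html) {
  const types = [];
  const metaTag = /<meta\b[^>]*>/gi;
  for (const [tag] of html.matchAll(metaTag)) {
    const name = /\bname="([^"]*)"/i.exec(tag);
    const content = /\bcontent="([^"]*)"/i.exec(tag);
    // Keep only tags whose name is DC.type (compared case-insensitively).
    if (name && content && name[1].toLowerCase() === "dc.type") {
      types.push(content[1]);
    }
  }
  return types;
}

// Illustrative markup only; the values after the first are invented.
const sample = `
<meta xml:lang="de" content="incollection" name="DC.type">
<meta xml:lang="de" content="Sammelwerksbeitrag" name="DC.type">
<meta xml:lang="en" content="company report" name="DC.type">
<meta content="Some title" name="DC.title">
`;

console.log(collectDcTypes(sample));
```

Keeping the values in order means a caller can try the first tag and only then look at the later ones.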

adam3smith commented 10 years ago

I'm generally OK with that, but if we could find out where these are from, that'd be great. As you may have seen in the RDF translator, we try to link to the description of the vocabulary we're using. I'd say we go ahead even if we can't find documentation, but it's worth a try.

aurimasv commented 10 years ago

If I understand correctly, this would go in the SSOAR translator?

adam3smith commented 10 years ago

No, I thought we'd put this right into RDF?

aurimasv commented 10 years ago

We can support the extra types (what would these resolve as?), but if there's any logic regarding meta tag order, that would have to go into a dedicated translator.

adam3smith commented 10 years ago

(what would these resolve as?)

<meta xml:lang="de" content="monograph" name="DC.type">

book

<meta xml:lang="de" content="article" name="DC.type">

journalArticle

<meta xml:lang="de" content="collection" name="DC.type">

book(?) possibly with edited authors?

<meta xml:lang="de" content="incollection" name="DC.type">

bookSection

<meta xml:lang="de" content="recension" name="DC.type">

journalArticle(?)

Looking at these, they seem borrowed from BibTeX. I actually read the specs for DC.type yesterday: http://dublincore.org/documents/dcmi-type-vocabulary/#terms-type and they're super-vague:

Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]

so we can be pretty flexible, too.
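The mapping proposed in this comment could be sketched as a simple lookup table, with the first recognized DC.type value winning. This is only an illustration: `SSOAR_TYPE_MAP` and `resolveItemType` are hypothetical names, not code from RDF.js, and the entries marked tentative carry over the "(?)" from the list above.

```javascript
// Hypothetical lookup table built from the mapping discussed above.
const SSOAR_TYPE_MAP = {
  monograph: "book",
  article: "journalArticle",
  collection: "book",          // tentative, see "(?)" above
  incollection: "bookSection",
  recension: "journalArticle", // tentative, see "(?)" above
};

// Return the Zotero item type for the first DC.type value that is in the
// table, or null if none matches. What to do in the null case is left open.
function resolveItemType(dcTypes) {
  for (const raw of dcTypes) {
    const key = raw.trim().toLowerCase();
    if (Object.prototype.hasOwnProperty.call(SSOAR_TYPE_MAP, key)) {
      return SSOAR_TYPE_MAP[key];
    }
  }
  return null;
}

console.log(resolveItemType(["incollection", "company report"]));
console.log(resolveItemType(["company report", "recension"]));
```

Walking the whole list rather than reading only the first value covers the case where the known class is not the first DC.type tag.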

zuphilip commented 10 years ago

My idea was to update the metadata translator and at the same time maybe ask the publisher (GESIS) to improve the metadata, especially since a DSpace translator was deleted at some point because the metadata are normally pretty good.

Now I am no longer sure whether this is realistic or whether we need a separate SSOAR translator in the end. If we do need a separate translator (and this is my guess at the moment), then it might be easier to start with the BibTeX data (and maybe ask for improvements there).

What do you think?

adam3smith commented 10 years ago

The embedded metadata would have to improve a lot before import is actually decent, whereas BibTeX is already pretty decent, so I think going the route of a dedicated translator is more promising. A dedicated translator also allows multiple import, and as of now EM wouldn't attach PDFs, so we'd want a translator for that.

That said, I think asking them to improve the embedded metadata is worthwhile in any case. It also means better import into e.g. Mendeley as well as Google Scholar (from which Zotero in turn benefits via retrieve metadata). Maybe see if you get a response and make the final call then?

zuphilip commented 10 years ago

Added an SSOAR translator based on BibTeX. No more need to change the metadata/RDF translator.