relaton / relaton-models

Bibliographic models

Analyze ISO 690 and make model compatible #1

Closed ronaldtse closed 6 years ago

ronaldtse commented 6 years ago

As title.

opoudjis commented 6 years ago

I still need access to a copy of ISO 690

ronaldtse commented 6 years ago

Done

opoudjis commented 6 years ago

This is going to be a long list of suggested additions, which you'll need to go through one at a time. I will be giving clause references for each comment; please do likewise in your responses.

opoudjis commented 6 years ago

4.1.2. "Any information that does not appear in the cited information resource, but is supplied by the citer, should be enclosed in brackets."

We could put a flag for every field indicating its provenance. Let's not do that; let's leave the brackets in the data. So dates@created/from = [1984], not dates@created/from = 1984, dates@created/from@provenance = supplied. The latter is logical, but it's clutter, and it's clutter we'd need to repeat for every single element of the biblio.

... I can see you liking that notion. :-) But I don't think it's needed.

opoudjis commented 6 years ago

4.5 High level containers we are missing (will come back to these with modelling suggestions):

opoudjis commented 6 years ago

5.1. ContributorRoleTypes should be a taxonomy of the roles spelled out here (and put in the diagram as a note.) I think we should expand it to cover all possibilities fully:

Current: author, editor, cartographer, publisher.

Change to:

opoudjis commented 6 years ago

5.2.1 Personal names: life would be somewhat easier if we allowed a complete name as an alternative to the broken up name; breaking up names is not always expedient. So FullName = (forenames, initials, surname, additions, prefix) | completename; completename = string.

Cf. http://specification.sifassociation.org/Implementation/AU/3.4.3/CommonTypes.html#BaseNameType , which is from the schools data standard I work with in my day job: you will see FullName there. (In fact, "completename" in the bib model should be "fullname", and FullName should just be PersonName.)

FullName will also deal with the idiosyncrasies of how the name elements are ordered, which vary by language: where the "de" goes, for example, in "La Fontaine, Jean de" vs "De La Mare, Walter"; whether the given name precedes or follows the surname (and therefore whether a comma is needed between the two); etc. In fact, FullName = (forenames, initials, surname, additions, prefix) | completename | (forenames, initials, surname, additions, prefix, completename) makes more sense (we need at least one, if not both).
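A minimal Python sketch of the "at least one, if not both" choice. The class name, the list-typed fields, and the naive "Surname, Given" fallback are all assumptions for illustration, not part of the model; the point is only that a complete name, when supplied, carries the language-specific ordering and so takes precedence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FullName:
    """Hypothetical FullName: broken-up parts, a complete name, or both."""
    forenames: List[str] = field(default_factory=list)
    initials: List[str] = field(default_factory=list)
    surname: Optional[str] = None
    additions: List[str] = field(default_factory=list)
    prefix: List[str] = field(default_factory=list)
    completename: Optional[str] = None

    def __post_init__(self) -> None:
        # At least one form is required: structured parts (keyed here on the
        # surname) or the preformatted complete name.
        if self.surname is None and self.completename is None:
            raise ValueError("FullName needs a surname or a completename")

    def display(self) -> str:
        # A complete name already encodes language-specific ordering
        # ("La Fontaine, Jean de"), so it wins when present.
        if self.completename is not None:
            return self.completename
        given = " ".join(self.forenames or self.initials)
        # Naive fallback "Surname, Given"; real ordering rules are out of scope.
        return f"{self.surname}, {given}" if given else str(self.surname)
```

For example, `FullName(forenames=["Walter"], surname="De La Mare")` displays as "De La Mare, Walter", while a name that only has a `completename` is passed through untouched.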

opoudjis commented 6 years ago

5.3.2. "To distinguish between different bodies with the same name, the appropriate place name should be added." TRINITY COLLEGE [Cambridge]. TRINITY COLLEGE [Dublin].

My consistent attitude towards these bracketings is that we should not separate them out into fields; they are a matter for the bibliography manager. Places can be extracted from the contact information of organisations, though you'd have to dig pretty deep (and we don't have much of a contact model right now.) I'd rather we just leave them as bracketed supplements.

opoudjis commented 6 years ago

5.3.3. "If the name of an organization implies subordination to a parent body of which it is an organ or administrative division, or if its full significance depends upon the inclusion of the name of the parent body, the latter should be given first in the reference." IMPERIAL CHEMICAL INDUSTRIES. Paints Division.

We partly deal with this already in ISO-specific customisation, by adding the technical committee as a distinct element. I think we can add an optional subdivision element to Organization. We should just leave it as a string, rather than try to model the internal reporting structure of the organisation.
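A small sketch of the optional subdivision string on Organization. The class shape is hypothetical, and the upper-casing of the parent body simply mirrors the ISO 690 example above rather than any agreed rendering rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organization:
    """Hypothetical Organization with the proposed optional subdivision."""
    name: str
    subdivision: Optional[str] = None  # plain string; no reporting hierarchy

    def cite(self) -> str:
        # ISO 690 5.3.3: the parent body is given first, then the subordinate unit.
        if self.subdivision:
            return f"{self.name.upper()}. {self.subdivision}."
        return f"{self.name.upper()}."
```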

opoudjis commented 6 years ago

5.5 Pseudonyms. e.g. BLAKE, Nicholas [pseud. of Cecil Day LEWIS].

We can model the real name of the author along with the pseudonym they published under; we could even have relations between PersonNames. Let's not, that's overkill. I'm inclined to leave the bracketed information, once again, as bracketed in the formatted complete name.

EDIT: change my mind. Let's treat this as a name note.

opoudjis commented 6 years ago

5.6. Anonymous

I don't see a point in modelling "anonymous" (or for that matter "varii auctores", https://en.wikipedia.org/wiki/Various_authors) as a different type of name; just leave that as the CompleteName.

opoudjis commented 6 years ago

6.1.2, 6.1.5 Alternative titles (e.g. Eric, or Little by little: a tale of Roslyn School.); other titles (e.g. Children and their primary schools [Plowden Report].); subtitles (e.g. Etheldreda's Isle: a pictorial map of the Isle of Ely to commemorate the 1300th anniversary of the founding of Ely's conventual church.)

We already have multiple title entries, but we differentiate them only by language. We could differentiate titles by type as well, but we'd need a real vocabulary for that. If we go down that path, we should limit the vocabulary to "alternative, original, unofficial, subtitle". ("Unofficial" for the unofficial name a resource is more widely known as, e.g. https://en.wikipedia.org/wiki/Hansard, which is a traditional name rather than the official name of the parliamentary transcripts.)

I would not enforce breaking up titles anyway; if sources prefer to keep them in one title, with use of punctuation and brackets, they should.
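If we do type titles, a closed vocabulary keeps it manageable. The sketch below assumes an enum restricted to the four proposed values, with an untyped title meaning "main title"; the helper `main_title` is hypothetical and falls back to the first title, so a single unsplit title with punctuation and brackets still works.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TitleType(Enum):
    # The limited vocabulary proposed above; None on a Title means "main title".
    ALTERNATIVE = "alternative"
    ORIGINAL = "original"
    UNOFFICIAL = "unofficial"
    SUBTITLE = "subtitle"


@dataclass
class Title:
    content: str
    lang: Optional[str] = None
    type: Optional[TitleType] = None


def main_title(titles: List[Title]) -> Title:
    """Return the first untyped title, falling back to the first title given."""
    for t in titles:
        if t.type is None:
            return t
    return titles[0]
```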

opoudjis commented 6 years ago

6.1.6. Disambiguation of titles. (e.g. Statistical digest of the war. [1939–1945].)

That's an editorial intervention in the title done by the bibliography manager. As with all bracketed interventions, I don't believe they should be extracted out into a separate field.

opoudjis commented 6 years ago

6.1.7. No title

Bracketed popular/convenience titles shouldn't be treated separately. "Untitled" shouldn't be treated separately.

opoudjis commented 6 years ago

6.2. Translation of title

This actually is not handled by language strings, since that does not indicate which language is the original language of the title. These should instead be encoded as title@type = original. So The Artamonovs [Delo Artamonvykh]. should be encoded as

<title type="original" lang="ru">Delo Artamonvykh</title>
<title lang="en">The Artamonovs</title>

and it should still be legal to use instead:

<title>The Artamonovs [Delo Artamonvykh]</title>
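Both encodings can coexist for a consumer: it looks for an explicit `type="original"` title and otherwise leaves the bracketed form alone. A sketch using the standard library's ElementTree; the `<bibitem>` wrapper element is an assumption for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def original_title(bibitem_xml: str) -> Optional[str]:
    """Return the text of <title type="original">, if encoded separately."""
    root = ET.fromstring(bibitem_xml)
    for title in root.iter("title"):
        if title.get("type") == "original":
            return title.text
    # e.g. the single-title form, with the original left in brackets
    return None
```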

opoudjis commented 6 years ago

I will resume this later, there is still a lot to go.

opoudjis commented 6 years ago

6.3. Serials

Add series element to BibliographicItem as single FormattedString.

6.3.1.

Disambiguating qualifiers (the organisation publishing the series, the place of publication) are italicised and bracketed differently. My default position is that they should not be separated from the series; but the italics make me think perhaps they should, in which case we would have series/title, series/place, series/organization.

The problem with that is, the disambiguating qualifiers should only be added at the discretion of the bibliography manager; if they don't serve a disambiguating purpose, they should be left out, rather than being obligatorily populated. This may justify adding a series/formattedtitle, just as with full name, where the italics, brackets and disambiguation have all been handled already.

opoudjis commented 6 years ago

6.3.3. Earlier or later titles. 6.3.4. Abbreviation

You don't actually see these in references, just in library catalogues, but these would be:

series: SeriesType
altseries: SeriesType[0..*]

SeriesType:
title: FormattedString
abbrev: FormattedString[0..1]
place: String[0..1]
organization: String[0..1] # not: OrganizationType, that's overkill
dateFrom: xs:date[0..1]
dateTo: xs:date[0..1]

EDIT: for consistency with titles, I will instead give SeriesType a type attribute:

SeriesType:
Type: {main|alt}
...
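A sketch of what the typed series could look like, assuming one list of series replaces the separate `series`/`altseries` slots and that at most one entry is `main` (matching the original `series: SeriesType` cardinality). Names and the plain-string stand-in for FormattedString are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class SeriesKind(Enum):
    MAIN = "main"
    ALT = "alt"


@dataclass
class Series:
    title: str                          # FormattedString in the model; str here
    type: SeriesKind = SeriesKind.MAIN
    abbrev: Optional[str] = None
    place: Optional[str] = None
    organization: Optional[str] = None  # string, not OrganizationType
    date_from: Optional[date] = None
    date_to: Optional[date] = None


def main_series(series: List[Series]) -> Optional[Series]:
    """Return the single main series, if any (assumed: at most one)."""
    mains = [s for s in series if s.type is SeriesKind.MAIN]
    if len(mains) > 1:
        raise ValueError("at most one main series expected")
    return mains[0] if mains else None
```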

opoudjis commented 6 years ago

6.4. Citation of contribution to host item (e.g. article in monograph)

The clean way to do these conceptually is as a DocumentRelation, with locality, and with DocumentRelationType of "part" or "included in". One could object that this is not expedient, and the including item's attributes should just be part of the BibItem; a lot of bibliographic managers do that. Yet there are bibliographic conventions which cite the host item separately, and that would only work by keeping the host item as a separate item.

So:

MICHAEL, D. The effect of local deformations on the elastic interaction of cross walls coupled by beams. In: COULL, E.A. and B. STAFFORD-SMITH. Tall buildings. Oxford: Pergamon Press, 1967.

The simple thing to do is to have two bib items, Michael's article (with page numbers!), and Coull & Stafford-Smith's volume, related through a part-whole relation. There are some bibliographic conventions which would simply cite this as:

MICHAEL, D. The effect of local deformations on the elastic interaction of cross walls coupled by beams. In: COULL, E.A. and B. STAFFORD-SMITH (1967).
COULL, E.A. and B. STAFFORD-SMITH. Tall buildings. Oxford: Pergamon Press, 1967.

And in case they do that, "Tall buildings" should not be an attribute of the bib item for Michael's article. This does introduce a little processing complexity; but there can be an arbitrary number of features of the host item introduced in the part item citation; replicating the attributes in the part bibitem seems retrograde.

opoudjis commented 6 years ago

6.5.

Conference date and place could be additional attributes, but they would be idiosyncratic to the conference type of reference, and once again, I'd rather they just be inline in the conference title.

opoudjis commented 6 years ago

7. Medium

Add medium as attribute of BibItem. Just String, no need for brackets.

opoudjis commented 6 years ago

8.1.

Edition should remain a string; disambiguating information such as country of publication should be left in brackets. Edition can include software version.

8.2.

Date of update is a bibdate. Date of access is a bibdate.

opoudjis commented 6 years ago

9.1. Place

It makes more sense to me to make the place of publication a separate attribute (String[0..*]), than an attribute of the publisher contributor. (Do we really want to be fishing it out of contact addresses?) If you want to make it an attribute of the publisher contributor, we should just add place to Organization as String[0..*].

9.1.1.

Disambiguating attributes (e.g. Geographical regions) should be left in brackets in the place.

9.1.2.

Order is meaningful in places of publication, with the first place of publication used for citation in ISO 690. If you don't trust ordering in XML, you can add a @main=true attribute, but I don't think that's necessary.

(That also applies for multiple publishers: 9.2.2)
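Relying on XML document order is safe with ordinary parsers; ElementTree, for instance, returns children in document order. A sketch, assuming places are repeated `<place>` children of a hypothetical `<bibitem>` element:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def first_in_order(bibitem_xml: str, tag: str = "place") -> Optional[str]:
    """Return the first <tag> child's text; ISO 690 cites the first given.

    ElementTree preserves document order, so no @main flag is needed.
    """
    found = ET.fromstring(bibitem_xml).findall(tag)
    return found[0].text if found else None
```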

opoudjis commented 6 years ago

9.2.1 Publisher types

We already faced this issue in GB, but we have here a vocabulary for publisher types: publisher, distributor, printer, sponsor.

opoudjis commented 6 years ago

9.3.1

To bibdate types, add "transmitted" (although you could argue that is "published" for audiovisual).

I would not put the broadcaster in the bibdate for audiovisual, of course. That's a contributor, of type "publisher" and role "broadcaster".

To bibdate types, we should also add "copyright". We have modelled it as a separate attribute in ISO, but it is a generic distinct bibdate.

EDIT: rescinded last para.

opoudjis commented 6 years ago

9.3.2, 9.3.4

Bibdates by default are xs:dates, but we need to allow for alternate calendars ("Jewish calendar 5685 [1925]"), and corrections ("1959 [1995]"), and approximations, and undated ("ca. 1750, 16th century, no date").

That means that bibdate is not of type xs:date, but xs:date | string. Rather than break the existing model, I suggest we put the textual form of imprecise or non-Gregorian dates into a @text attribute, leaving the body blank when there is no Gregorian equivalent, with the understanding that @text takes priority over any Gregorian date in the element value. So:

<from>1925</from>
<from text="Jewish calendar 5685 [1925]">1925</from>
<from text="1959 [i.e. 1995]">1959</from>
<from text="ca. 1750"/>
<from text="16th century"/>
<from text="no date"/>

The alternative is to break up dates into a date element and a text element; or to give up and make the element xs:date | string.
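Under the @text proposal, a consumer reads two things from each bibdate: what to display (@text if present, else the body) and what to sort on (the Gregorian body, if any). A sketch; treating the first four characters of the body as the year is an assumption that covers bare years and xs:date values but not BCE dates.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def read_bibdate(elem_xml: str) -> Tuple[str, Optional[int]]:
    """Return (display text, sortable year or None) for a bibdate element."""
    elem = ET.fromstring(elem_xml)
    body = (elem.text or "").strip()
    # @text takes priority for display; fall back to the Gregorian body.
    display = elem.get("text") or body
    # Only the body is machine-readable; undated or approximate forms sort as None.
    sort_year = int(body[:4]) if body else None
    return display, sort_year
```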

opoudjis commented 6 years ago

9.3.5 Multiple dates

If a resource is republished ("reprint, facsimile or other copy"), that is logically two bibdates; so "1796 copied 1810" should be rendered based on two bibdates, 1796 for created (first published), and 1810 for issued (or copied or republished).

opoudjis commented 6 years ago

9.3.6. Range of dates: e.g. "1970-1973, vols 1-3. Discontinued".

The localities corresponding to the range of dates are of course modelled separately.

opoudjis commented 6 years ago

10.2

The recommendation is that the locality "whole" is represented by giving the size of the locality; e.g. "7pp". I don't think that's a locality attribute at all; the size of a bib item should be given as a separate attribute in the bibitem, "extent":

extent: ExtentType[0..*]
ExtentType: 
type: SpecificLocalityType
content: string

e.g. "xviii+13pp+2 plates" (because prefatory material is numbered separately) would be:

<extent type="page">xviii</extent><extent type="page">13</extent><extent type="locality:plate">2</extent>
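Generating the extents programmatically guarantees matching open and close tags. A sketch with ElementTree, taking (type, content) pairs in order; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def extent_xml(extents: List[Tuple[str, str]]) -> str:
    """Serialise (type, content) pairs as sibling <extent> elements, in order."""
    out = []
    for kind, content in extents:
        elem = ET.Element("extent", {"type": kind})
        elem.text = content
        out.append(ET.tostring(elem, encoding="unicode"))
    return "".join(out)
```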

opoudjis commented 6 years ago

11 Series title and number

I would much rather model journal volumes and issues as the series number than as a locality, as suggested in 10.3. ISO 690 already conflates series title and journal title anyway.

Series number is a distinct attribute of bib item. Series number can include volume + part; I'd suggest having two elements for series number. So series becomes:

series: SeriesType
altseries: SeriesType[0..*]

SeriesType:
title: FormattedString
abbrev: FormattedString[0..1]
place: String[0..1]
organization: String[0..1] # not: OrganizationType, that's overkill
dateFrom: xs:date[0..1]
dateTo: xs:date[0..1]
number: String[0..1]
partnumber: String[0..1]

EDIT: rescinded. I will keep journal volume and issue in extent (as localities): logically, the conflation is forced. But it could go either way.

opoudjis commented 6 years ago

13. Location

The archival location for an item should be a distinct optional string attribute. So:

GOSSE, Sylvia (1881–1968). The Garden, Rowlandson House [etching and aquatint, 1912]. At: London: British Museum, Department of Prints and Drawings. Register number 1915-27-41.

Everything after "London" becomes the value of an (archival) location element. Because we already use locality elsewhere, let's call this "access_location".

opoudjis commented 6 years ago

14.2 Classification

Classifications do not uniquely identify a document the way document identifiers do, so they should not be conflated; but they should be modelled in the same way, as id + type. The patent classification given "Int. Cl. E02F 3/76. GB Cl. E1F 12." is of course three different classifications.

opoudjis commented 6 years ago

14.3 Size

We should resist the temptation to break the size up into width and height and units. (It's an informational attribute, no one's ever going to extract the numbers and use them for sorting; and "A5 landscape" is going to be routinely jammed into the size element, let alone the complications of cartographic material: see 15.5.6.) Just leave it as a string.

opoudjis commented 6 years ago

14.4 Price and availability

I think this should become a note, without further markup.

opoudjis commented 6 years ago

14.5

We need to differentiate the language and script of the bibliographic item from the original language and script, in case the item is a translation. Translations do not need to become bibrelations, since very little bibliographic information about the original is included in citing a translation (just the author, the source title, and maybe the original date, all of which bibdata will cope with: as author vs translator, title@type=original, and date@created vs date@published, since the original-language version has a first published date and the translation has the current published date).

EDIT: rescinded as regards languages: the original language, if included, is modelled via a bibrelation.

opoudjis commented 6 years ago

15.2.3 System requirements are just a note.

15.2.4 Edition wording will be more flexible for software.

15.2.5 We already provide for date@accessed

15.2.6 The availability location of an electronic resource is actually the same thing as the archival location of an unpublished physical resource, as seen in 13. However, actionable URIs should be isolated as URIs. So: we keep link for availability locations which are a single URI; but anything else—including "Available from Internet via anonymous FTP from: BORG.LIB.VT.EDU", "Available from: MedlinePlus", and "Available from: http://www.culturekiosque.com/art/comment/damien_hirst.html Path: Home; Art; The Death of God: Damien Hirst", should be put in "access_location".
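The split between `link` and `access_location` can be decided mechanically: only a statement that is exactly one URI becomes a link. A sketch using `urllib.parse.urlparse`; the accepted schemes (http, https, ftp) are an assumption.

```python
from typing import Tuple
from urllib.parse import urlparse


def classify_availability(statement: str) -> Tuple[str, str]:
    """Route an availability statement to ("link", uri) or ("access_location", text)."""
    s = statement.strip()
    parsed = urlparse(s)
    is_single_uri = (
        parsed.scheme in {"http", "https", "ftp"}  # assumed set of schemes
        and bool(parsed.netloc)
        and not any(ch.isspace() for ch in s)      # no trailing "Path: ..." etc.
    )
    return ("link", s) if is_single_uri else ("access_location", s)
```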

opoudjis commented 6 years ago

15.2.7 "Also available" can be other URIs. I pause over the example:

"Also available in PDF from: http://www.ukoln.ac.uk/services/elib/papers/other/pinfield-elib/elibreport.pdf"

This actually indicates a distinct medium as well as location, but I think we should just leave this as a typed URI.

opoudjis commented 6 years ago

15.4 Audiovisual material

"For audiovisual material, sufficient information should be given about the format of the item to identify the requirements for its playback, e.g. DVD, 16 mm film, MPEG-4."

This is just medium information.

opoudjis commented 6 years ago

15.5.1 Cartographic material

"The projection, prime meridian, orientation and reference systems such as grids and navigational lattices may be given if considered important."

This goes into a note.

15.5.5 Scale

I'm not happy about this, but I think this is a top-level attribute of bibitem; it should not be buried in a note, because it appears to be cited for any map.

15.5.7 Spectral information/cloud cover

This information, on the other hand, clearly belongs in a note.

opoudjis commented 6 years ago

15.6.1

Films are not cited with an author before the title, but e.g. as "Macbeth [film]. Directed by Orson WELLES.". We can deal with this through the fact that the director is of type "performer" (b), and there is no "author" type contributor (a). And (as the description of 15.6.2 states) if you want to override that, you just make the director a creator, and their name will appear before the title.
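The rendering rule this implies: names lead only when a class (a) contributor exists; otherwise the title leads and class (b) contributors follow, phrased by role. A sketch of the head of such a citation; the `ROLE_PHRASES` table and the "role:" fallback are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mapping from role to the phrase used after the title.
ROLE_PHRASES: Dict[str, str] = {"director": "Directed by"}


@dataclass
class Contributor:
    name: str
    type: str  # "author" for class (a) creators, "performer" for class (b)
    role: str  # free text, e.g. "director"


def cite_head(title: str, contributors: List[Contributor]) -> str:
    authors = [c.name for c in contributors if c.type == "author"]
    if authors:
        # A class (a) creator exists: their names precede the title.
        return f"{' and '.join(authors)}. {title}."
    # No author: the title leads, and other contributors follow by role.
    tail = " ".join(
        f"{ROLE_PHRASES.get(c.role, c.role + ':')} {c.name}." for c in contributors
    )
    return f"{title}. {tail}".rstrip()
```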

opoudjis commented 6 years ago

15.6.4 Programme within a series.

The series is modelled through series title; the episode is the item title. So:

Yes, Prime Minister, Episode 1, The Ministerial Broadcast. BBC 2. 16 Jan. 1986.

Series title: Yes, Prime Minister. Item title: The Ministerial Broadcast. Series number: Episode 1 (so series numbers are in fact strings, and I'm not eager to break them up into type and number).

There's the rendering complication that for TV programmes, the series comes first.
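The series-first ordering for TV programmes is easy to isolate in the renderer. A sketch that reproduces the example above, assuming the broadcaster comes from a publisher-type contributor and the date from a "transmitted" bibdate, both passed in as plain strings here.

```python
def cite_programme(series_title: str, series_number: str, item_title: str,
                   broadcaster: str, transmitted: str) -> str:
    """Episode in a TV series: series title and number lead, then the episode."""
    # series_number stays a free string ("Episode 1"), not type + number.
    return f"{series_title}, {series_number}, {item_title}. {broadcaster}. {transmitted}."
```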

15.6.5 Contributions.

THATCHER, Margaret. Interview. In: Six O'Clock News. BBC 1, 29 Jan. 1986, 18:23.

This is a problem for the model: there are now three potential levels, the contribution, the item, and the series. We have only modelled two levels with series; the only out is what we've already done for 6.4, model it like an article in a volume (which could be in a series). In fact, that's what they're doing in ISO 690 anyway, since they use the same "In:" convention.

opoudjis commented 6 years ago

15.7 Graphic works

15.7.1 Artists are class (a) contributors, authors; nothing is preventing using Latin terms in the contributor@role, like "pinxit" instead of "painter". (I note with some amusement that the first example has "fecit" for the primary creator, and "fecit" is just "he did it", which really is a generic "creator".)

opoudjis commented 6 years ago

15.8.1

Librettists and Composers are both class (a) contributors, so there needs to be a rule in rendering to prioritise composers over librettists.
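One way to express that rule is a role priority table used as a sort key. A sketch, assuming creators are (name, role) pairs; the table contents and the "unknown roles sort last" behaviour are assumptions. Python's `sorted` is stable, so creators with the same role keep their document order.

```python
from typing import Dict, List, Tuple

# Hypothetical rendering priority among class (a) roles: lower sorts first.
ROLE_PRIORITY: Dict[str, int] = {"composer": 0, "librettist": 1}


def order_creators(creators: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (name, role) pairs so composers precede librettists.

    Roles missing from the table sort after the known ones.
    """
    return sorted(creators, key=lambda c: ROLE_PRIORITY.get(c[1], len(ROLE_PRIORITY)))
```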

15.8.2

Pocket score vs miniature score vs study score is strictly speaking a publishing format rather than a size, which would go into Medium. But leaving it in Size is expedient.

opoudjis commented 6 years ago

15.9 Patents

The country code is simply part of the series name for patents.

If it is not already clear in the reference, the fact that an item is a patent should be stated.

BibItemType needs to do work here.

Date of application for a patent is date@created (or maybe date@confirmed); I'm not eager to create a distinct date@lodged or date@applied.

opoudjis commented 6 years ago

I'm transferring all this into https://github.com/riboseinc/iso690xml . Aligning these bib models to iso690xml is going to be a much longer-term project; I want to keep the two separate, since ISO690XML is speculative.