internetarchive / openlibrary-librarians

Coordination between the OpenLibrary.org Librarian community

A number of questions, mostly about translations #35

Closed MatthiasWinkelmann closed 3 years ago

MatthiasWinkelmann commented 4 years ago

I've been using OL quite extensively for some research, and in a fit of possibly lockdown-related ADD have started to correct errors along the way. A few questions have cropped up that I could not find answers to. Please feel free to point me at any document I may have overlooked.

The following are observations/suggestions more than questions:

seabelis commented 4 years ago

I have seen both the original-language title being the title of the work as well as the title of the English translation being used. Is there a preferred policy?

Ideally, there should be one record for each work; in most cases, the work-level title should be the original title in the original language. Edition records should reflect the actual titles of their respective editions.

seabelis commented 4 years ago

Many German books include "Roman" as a sort of subtitle, equivalent to the English "a novel". Should these be regarded as subtitles?

This should not be used at the work level unless it is needed to differentiate the work from another with the same title (e.g. authors sometimes have a short story and a novel with the same title). It's fine to use at the edition level, but not necessary.

seabelis commented 4 years ago

Subjects and, on occasion, other metadata such as pagination are sometimes in English and sometimes in the work's or edition's language. Which is preferred here?

It is okay to use subject tags in languages other than English as these are not translated when viewing the site in other languages. Work-level descriptions should be in the work's original language as translated editions can have descriptions in their respective languages at the edition-level. Edition-level metadata can be in the edition's language or in English. The edition's original language is preferred. If there is something notable about a given edition, it could also be useful to include this in English in the notes field as this is generally information for librarians rather than patrons.

seabelis commented 4 years ago

Some language confusion regarding place names and "published in" metadata: Should these be recorded as "Milan, Italy", "Milano, Italia" (or, for people who like to see the world burn, maybe "Milan, Italia")?

The language of the edition is preferred, but English is also fine.

seabelis commented 4 years ago

On that topic: US place names typically include the two-letter state abbreviation ("New York, NY"). Sometimes this is necessary to distinguish between, for example, the 43 different Springfields, so I believe changing these to just "Springfield, USA", as the instructions on the form field suggest, is inadvisable. I feel like "New York, NY" is probably universally understood to imply "USA". But at the same time, that assumption seems slightly arrogant and maybe US-centric. So: should it be "New York, NY, USA" or "New York, NY"?

I use "New York, NY, USA" format for U.S. locations. I see no harm in stating the obvious.

seabelis commented 4 years ago

There are some metadata fields where certain abbreviations are common, such as "1st ed.". Since these are just text fields, I guess it doesn't make much of a difference. But assume that I have absolutely no personal preference and want to add that information, which should I choose: "First edition", "1st ed.", or "1st edition"?

There isn't really a rule, but if there is a number involved, I usually use the digit rather than the text. My thoughts are that if someone ever wants to use this for sorting, the number will work better than the text format. To abbreviate or not is up to you.
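To illustrate the sorting point, here is a minimal Python sketch. It is not part of Open Library; the function name `edition_number` and the sample statements are hypothetical. It pulls the first run of digits out of a free-text edition statement and sorts on it, which only works when the statement uses a digit.

```python
import re

def edition_number(statement):
    """Return the first integer found in an edition statement, or None."""
    match = re.search(r"\d+", statement)
    return int(match.group()) if match else None

# Hypothetical edition statements as they might appear in a free-text field.
statements = ["10th ed.", "2nd edition", "First edition", "1st ed."]

# Sort numerically; statements without a digit fall to the end
# because there is no number to extract from them.
ordered = sorted(
    statements,
    key=lambda s: (edition_number(s) is None, edition_number(s) or 0),
)
print(ordered)
```

A statement such as "First edition" cannot be placed without an extra word-to-number lookup, which is the practical argument for using the digit.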

seabelis commented 4 years ago

There seems to be a "genre" field for editions. This doesn't seem to be available in the edit form?

Sometimes imported records carry data that does not correspond to any field on the edition edit form. A genre should apply to the entire work, so it is appropriate to add it as a tag in the work's subjects field.

seabelis commented 4 years ago

I've recently deleted "Referral IDs" from a few dozen links to Amazon.

Can you give me an example? Do you mean deleting the Amazon ID from the identifiers?

seabelis commented 4 years ago

For translations, it is common practice in publishing to include a note such as "Translated from American English by ...". The dropdown under "languages", however, only offers English. Since the distinction seems to matter to the people doing the work, it might be advisable to respect that by adding appropriate entries. Besides American English, I remember seeing "South African English" and "Brazilian Portuguese", but the list is likely to be longer.

I suggest opening a feature request issue in the main repository as this would require changes to the code. https://github.com/internetarchive/openlibrary

seabelis commented 4 years ago

There's an open issue in this repository to report duplicate works. I have in the past used the "Link" metadata on duplicate works to link to the canonical work, thinking that it should be possible to programmatically identify and resolve such links in the future. So I want to suggest possibly advising users of such a workflow, as it seems slightly more streamlined than asking for and manually working through individual reports here on GitHub.

We are currently working to improve our duplicate reporting. This repo, however, is more appropriate for reporting the data issues themselves. Suggestions that require changes or improvements to functionality should be added to the main repo. https://github.com/internetarchive/openlibrary
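For anyone taking the link-based idea to the main repo, the sketch below shows roughly what "programmatically identify" could mean. It is only an illustration under stated assumptions: the record shape (a `key` plus a `links` list of `title`/`url` entries), the work IDs, and the function name `canonical_targets` are all hypothetical and not taken from the Open Library codebase.

```python
import re

# Matches a link that points at an Open Library work page and captures its ID.
WORK_URL = re.compile(r"openlibrary\.org/works/(OL\d+W)")

def canonical_targets(works):
    """Map each work key to the other work its links point at, if any."""
    targets = {}
    for work in works:
        for link in work.get("links", []):
            match = WORK_URL.search(link.get("url", ""))
            # Ignore links that point back at the work itself.
            if match and match.group(1) != work["key"]:
                targets[work["key"]] = match.group(1)
                break
    return targets

# Hypothetical records: the first work links to the second as its canonical work.
sample = [
    {"key": "OL111W", "links": [{"title": "Canonical work",
                                 "url": "https://openlibrary.org/works/OL222W"}]},
    {"key": "OL222W", "links": [{"title": "Author site",
                                 "url": "https://example.org/"}]},
]
print(canonical_targets(sample))
```

A link to another work does not necessarily mean "duplicate of", so any real implementation would probably need an agreed link title or a manual review step.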

seabelis commented 4 years ago

https://github.com/internetarchive/openlibrary/wiki/Library-Metadata-Standards advises sentence case for titles, using the somewhat confusing argument that it is easier to convert from there to title case than in the other direction. I've tried to follow those instructions, but, possibly because language rules are considered somewhat more binding in my native language, the part of my brain that likes to follow rules is getting rather confused.

The case used should follow the conventions of the given language and culture. English titles typically capitalize each major word, but many languages capitalize only the first word and proper nouns.
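One possible reading of the wiki's argument about conversion direction is that sentence case preserves which words are proper nouns, while title case hides that information. The Python sketch below illustrates this reading for English only; the function names, the small-word list, and the sample title are made up for the example, and the conversion rules are deliberately naive.

```python
# Words left in lower case inside an English title (illustrative, not exhaustive).
SMALL_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

def to_title_case(sentence_case):
    """Capitalize every word except small words after the first position."""
    out = []
    for i, word in enumerate(sentence_case.split()):
        if i > 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())
        else:
            out.append(word[0].upper() + word[1:])
    return " ".join(out)

def to_sentence_case_naive(title_case):
    """Lower-case everything but the first letter; proper nouns are lost."""
    lowered = title_case.lower()
    return lowered[:1].upper() + lowered[1:]

sentence = "A journey to the centre of Paris"
title = to_title_case(sentence)
back = to_sentence_case_naive(title)
print(title)
print(back)
```

Going from sentence case to title case is mechanical, but the reverse step cannot tell that "Paris" should stay capitalized without a list of proper nouns.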

seabelis commented 4 years ago

If you've not already discovered https://openlibrary.org/help/faq/editing, you may want to take a look. It is a work in progress, so if you'd like to be invited to the Slack channel, where it's a bit easier to ask questions, please send your email address to me at seabelis74@gmaill.com.

seabelis commented 4 years ago

@MatthiasWinkelmann Have all your questions been answered? May I close this issue?