internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Keep language standardized to English #2601

Closed BrittanyBunk closed 4 years ago

BrittanyBunk commented 4 years ago

Too many work titles are in other languages. They should really be in English if at least one edition is in English. (A foreign-language title is fine if every edition is in that language, but it would help to have the English translation alongside it.)

dcapillae commented 4 years ago

Hi,

The work title should be in the original language of the work, according to this guideline.

An English translation alongside the original title would only help users who search in English, not the rest of our users.

There is a field for other titles of the work in the edition records, for example, titles in other languages. I usually use this field to include the English title when an edition has been translated into another language. Rather than artificially modifying the data, it would be better to improve search in every language, either by using this field or in some other way.

I imagine that if it hasn't been done already, it's because it must be difficult to implement.

P. S.: An example: Fantasmas. It is an edition in Spanish of Phantoms by Dean Koontz. The "Other titles" field can be used to include the title in another language.

cdrini commented 4 years ago

I strongly disagree with that guideline. The title of the work should (in an ideal world) be internationalized, so an English user would see the English title and a Spanish user would see the Spanish title. This is similar to how Wikidata handles work titles: https://www.wikidata.org/wiki/Q43361

Setting the work title to be the original language makes almost every user unhappy; what are the chances the user speaks the original language of a book? It's so frustrating to see a book by Tolstoy and not be able to read its title.

Can I get some feedback from @seabelis and @jessamynwest? I'd like to change that guideline. In the meantime, we should have work titles be in English if an English edition exists (as @BrittanyBunk proposes). In the future, we should expand our data model to be like Wikidata's and support multiple titles (one for each language), but that doesn't have to be decided right now. There could (and should) also be a field for "original title".

LeadSongDog commented 4 years ago

@hornc @dcapillae It would, though, be a real UI improvement if the work page could group the languages somehow (perhaps facets or just list sort ordering) so that a reader trying to find a German, Turkish, or Spanish edition of Phantoms does not have to page through all the English ones to find it.
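A minimal sketch of the language grouping suggested here, in Python. It assumes each edition record carries a language code; the field names, the sample records, and the three-letter codes are illustrative only and are not taken from Open Library's actual schema or data.

```python
from collections import defaultdict

# Hypothetical edition records for one work. The field names ("id",
# "language") and the three-letter language codes are illustrative only.
editions = [
    {"id": "e1", "title": "Phantoms", "language": "eng"},
    {"id": "e2", "title": "Fantasmas", "language": "spa"},
    {"id": "e3", "title": "Phantoms", "language": "eng"},
    {"id": "e4", "language": "ger"},
    {"id": "e5", "language": "tur"},
]

def group_by_language(editions):
    """Bucket editions by language; editions without one go under 'und'."""
    groups = defaultdict(list)
    for edition in editions:
        groups[edition.get("language", "und")].append(edition)
    return dict(groups)

def language_facets(editions):
    """Return (language, count) pairs, largest group first, ties alphabetical."""
    groups = group_by_language(editions)
    return sorted(
        ((lang, len(eds)) for lang, eds in groups.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )

print(language_facets(editions))
# [('eng', 2), ('ger', 1), ('spa', 1), ('tur', 1)]

# A reader looking for the German edition can jump straight to its bucket
# instead of paging through the English ones.
print([e["id"] for e in group_by_language(editions)["ger"]])
# ['e4']
```

The facet counts could drive a sidebar filter, while the per-language buckets let the page list, say, the German editions on their own.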

LeadSongDog commented 4 years ago

@cdrini Please correct me if I'm wrong, but it seems there is still no way for an OL user to state their preferred language. Until that is fixed, preferring results in that language will remain impossible. The next-best option would be to test browser settings, but that is often not configurable e.g. in public library kiosks.
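Browsers communicate their language settings through the HTTP Accept-Language request header, which lists language tags with optional q-value weights. A minimal sketch of reading that header as a fallback, assuming a hypothetical explicit user setting takes priority when one exists (as the kiosk caveat suggests it should). The function names and the "en" default are illustrative assumptions, not existing Open Library code.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first.

    Tags with q=0 are dropped; tags with equal q-values keep their header order.
    """
    ranked = []
    for position, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        tag = pieces[0].lower()
        if not tag:
            continue
        quality = 1.0  # q defaults to 1 when absent
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            ranked.append((-quality, position, tag))
    ranked.sort()
    return [tag for _, _, tag in ranked]

def preferred_language(user_setting, accept_language_header, default="en"):
    """Hypothetical lookup order: explicit user setting, then browser, then default."""
    if user_setting:
        return user_setting
    ranked = parse_accept_language(accept_language_header or "")
    return ranked[0] if ranked else default

print(parse_accept_language("es-ES,es;q=0.9,en;q=0.8"))
# ['es-es', 'es', 'en']
print(preferred_language(None, "de;q=0, it"))
# it
```

The header yields BCP 47 tags such as es-ES, so mapping them to whatever language codes edition records use would be a further, separate step.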

cdrini commented 4 years ago

You are correct; it's not configurable at the moment.

seabelis commented 4 years ago

Not all works have been translated into English.

seabelis commented 4 years ago

And there's been some discussion about using the work to populate the 'translated from' fields in the edition; this would require the work to represent the original language.

A literal translation of a title may not be the actual title in English. If there are multiple versions of an English title, which would be used?

LeadSongDog commented 4 years ago

Not even the titles of all works have been translated to English. For that matter, not every work even has a title.

dcapillae commented 4 years ago

Hi,

Editions in other languages usually include a reference to the title of the original work. This value should be added in the "Translation of" field, which can hold only one value.

Editions in other languages are sometimes published under different titles, e.g., Ramsey Campbell's The Nameless has been published in Spanish under two different titles (La secta sin nombre and Los sin nombre). Original works are also sometimes published under different titles, e.g., Jack Finney's The Body Snatchers has also been published as The Invasion of the Body Snatchers. All of these titles can be included in the "Other titles" field, which can hold more than one value.

If there are multiple versions of an English title, we can include all of them in the "Other titles" field. However, the "Translation of" field can hold only one value: there is only one title and one work from which an edition is translated.

cdrini commented 4 years ago

@seabelis To be more succinct, here's my proposed change (only works which have an English edition would have English titles in the work).

Before: The "work" data should reflect the original language, if that is known.
After: The "work" title should be in English, if an edition exists in English.

Here's a list of pros and cons:

Pros:

Cons:

cdrini commented 4 years ago

And there's been some discussion about using the work to populate the 'translated from' fields in the edition; this would require the work to represent the original language.

I'm not crazy about that proposal; I would rather have an original_edition field on the work, which stores the first publication. Then all editions not in that language can be assumed to be translations of this edition.*

A literal translation of a title may not be the actual title in English. If there are multiple versions of an English title, which would be used?

I would argue for using the best-known title (if one is clearly better known); otherwise, use the original English title. Any other titles can be added to "Other titles" (as @dcapillae suggests).


* There is an edge case: ideally we'd be able to specify the particular edition that a given edition was translated from (e.g., Ismail Kadare's books are originally in Albanian, but most non-Albanian versions are translations of the French version, not the original Albanian). But we don't currently handle that anyway, so ¯\_(ツ)_/¯
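A minimal sketch of how the proposed original_edition field could drive translation inference, including an explicit per-edition override for the Kadare-style edge case. The Edition and Work classes, the translated_from field, the keys, and the language codes are all hypothetical; this is not Open Library's data model.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Edition:
    key: str
    language: str
    translated_from: Optional[str] = None  # key of the source edition, if known

@dataclass
class Work:
    key: str
    original_edition: Edition  # the proposed (hypothetical) original_edition field
    editions: list[Edition] = field(default_factory=list)

def inferred_source(work: Work, edition: Edition) -> Optional[Edition]:
    """Guess which edition `edition` was translated from.

    1. An explicit translated_from link that points at a known edition wins.
    2. Otherwise, an edition whose language differs from the original
       edition's is assumed to be a translation of the original edition.
    3. Otherwise the edition is treated as untranslated and None is returned.
    """
    if edition.translated_from is not None:
        for candidate in work.editions:
            if candidate.key == edition.translated_from:
                return candidate
    if edition.language != work.original_edition.language:
        return work.original_edition
    return None

# Kadare-style case: Albanian original, a French translation, and an English
# edition translated from the French. Keys and codes are made up.
albanian = Edition("E1", "alb")
french = Edition("E2", "fre")
english = Edition("E3", "eng", translated_from="E2")
work = Work("W1", albanian, [albanian, french, english])

print(inferred_source(work, french).key)   # E1
print(inferred_source(work, english).key)  # E2
print(inferred_source(work, albanian))     # None
```

The default rule covers the common case with a single field on the work, and the optional per-edition link only needs to be filled in where the default is wrong.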

tfmorris commented 4 years ago

This is so wrong-headed that I don't even know where to begin. Do people not see the vicious cycle that they are perpetuating?

"The majority of our users are English speakers, so let's do everything as English only."

Why don't we have more non-English speakers? Because at every opportunity, we actively make them feel unwelcome and drive them away!

Translated work titles is definitely the way to go long term, but until we can get there let's not backslide into English-only xenophobia.

cdrini commented 4 years ago

Translated work titles is definitely the way to go long term, but until we can get there let's not backslide into English-only xenophobia.

Strong disagree. Why should we make the experience bad for almost everyone until that happens? Again, this isn't anti-i18n or xenophobic, it's making our data more organized.

Can you give some pros/cons of having work titles be in their original language?

LeadSongDog commented 4 years ago

@dcapillae The particular cases of translations from English are relatively straightforward. It gets much more complex with classics, where in some cases the original text does not survive. Case in point: we can be quite sure that Aesop never wrote in English, yet WorldCat shows that English is by far the most common language of the 99 listed, leaving Latin a distant second.

https://www.worldcat.org/search?q=Aesop&fq=&dblist=638&fc=ln:_100&qt=show_more_ln%3A&cookie

The oldest surviving text is the 1478 collected works, where the translation into Latin is attributed to Rinuccio d'Arezzo with the Latin title Vita et Fabulae. We certainly can't be sure what the first Greek titles were, nor can we be sure that the stories originated with Aesop. What we can do is identify the earliest known text, from which others are derived. I still contend that this should simply be a work ID, not a textual title.

jessamynwest commented 4 years ago

Book titles should be in the language the book is in. Agree entirely with @tfmorris on this.

The Internet Archive is not particularly good at dealing with an international audience, let's not make it worse.

If we're only looking for a good user experience for our English speaking users (see also, only having support during the hours the IA is open) we're failing at our mission. If we "standardize" to English language we are prioritizing one set of users over another.

jessamynwest commented 4 years ago

The best way to do this would be to prioritize a feature addition whereby a user can set a preferred language.

cdrini commented 4 years ago

@jessamynwest I agree edition titles should be in the language of the book; we're talking about works. Can you provide some concrete cons of having work titles be in English? And some pros specifically of having them be in the original language?

cdrini commented 4 years ago

To clarify, we're not talking about i18n specifically here, except to keep it in mind as our long term goal. The main question is between whether the work title should be in English or in the original language of the work. To decide that, we need to come up with some sort of pros/cons of one vs the other.

LeadSongDog commented 4 years ago

Why would we want to list works in one language or the other? Certainly today's world is dominated by the Anglosphere, but there's no real reason to assume every work has even one English edition, or that one translation into English is more correct than another. We should have a mechanism that can show all the distinct titles of a work, with preference to those in whatever language the user asks for.

jessamynwest commented 4 years ago

@jessamynwest I agree edition titles should be in the language of the book; we're talking about works. Can you provide some concrete cons of having work titles be in English? And some pros specifically of having them be in the original language?

Cons of Work titles being in English (as opposed to whatever language they are in)

Pros of Work titles being in English

Want to make a case for the change? Show me data that supports it. Show me bounce statistics from users landing on non-English works. Show me emails from users who are upset about having to deal with non-English titles of non-English books. It's a data driven site, so where is the data that supports this?

cdrini commented 4 years ago

@LeadSongDog

We should have a mechanism that can show all the distinct titles of a work, with preference to those in whatever language the user asks for.

Strong agree; in the long term I think the work should list all titles (in some fashion). This is mostly deciding what to do in the interim until we have the time to work on that.

cdrini commented 4 years ago

@jessamynwest Thank you for the feedback; I honestly want to have a discussion about this, but felt like I was getting auto-shut down with "What is wrong with you?" style comments.

dcapillae commented 4 years ago

@LeadSongDog wrote:

The oldest surviving text is the 1478 collected works, where the translation into Latin is attributed to Rinuccio d'Arezzo with the Latin title Vita et Fabulae. We certainly can't be sure what the first Greek titles were, nor can we be sure that the stories originated with Aesop. What we can do is identify the earliest known text, from which others are derived.

I'm not a specialist in the field. Librarians and archivists surely know what to do in these cases.

The cataloguing manuals I consulted prescribe that the title be transcribed following the spelling of the original work (the oldest preserved book, in this case). For old documents, some manuals also explain how to transcribe letters of the original title that have no direct modern equivalent.

BrittanyBunk commented 4 years ago

PS: Why don't we just put a big yellow star next to the original work and float it to the top? From what I see, the format would be:

How does this all sound? Thanks, everyone, for coming together and really hashing it out. It's really awesome to be working it all out together.

@hornc Not sure why this has a close label, as some of this is not hard to add in (though some is).

seabelis commented 4 years ago

There's a difference between translating a title and the title of a translated work. Literal translations are highly problematic.

The literal translation of Mördare utan ansikte is 'Killer without a face'; the English translation is called Faceless Killers.

Demons, The Devils, and The Possessed are titles of various English translations of the same work.

On the flip side, the German translation of Jurassic Park is called Dino Park; the Spanish title of Timeline is Rescate en el tiempo: (1999-1357) which translates back to English as 'Rescue in Time.'

So please be clear about what you are requesting: showing English titles (which ones?) of non-English works, or translating the titles of non-English works? It seems you are suggesting both.

jessamynwest commented 4 years ago

Agree with @seabelis. Literal machine translation is not accurate translation and won't correspond to the work's title in enough cases that it is worth trying. This is one of those things in librarianship that we describe, accurately, as "a hard problem" and it can't be solved by putting more computers on it.

Strong disagree on "this will just be temporary" especially when the thing you are referring to is temporary xenophobic implications of the website. Temporary xenophobic actions are still xenophobic actions.

Open Library is a joy, but change comes slowly to it, and I would not be in favor of making a large-scale change that will "some day" be replaced with the right way to do the thing. The thing should be done the right way from the beginning (let users select their language, and prefer that language when OL presents content).

As originally built, Open Library was a multilingual site, and that multilingualism has been phased out over the decade-plus it's been around. We should not continue to do that.

BrittanyBunk commented 4 years ago

@seabelis I'm suggesting both, with a preference for the non-literal title when there is one. The French title probably won't be a translation of the English title. I would say that once the site can be translated for everyone, native speakers will enter the titles manually. Still, the machine should enter a literal translation first, so that something is there from the start and can be corrected to the better title later.

@jessamynwest yes. It doesn't have to be on the front end, though. It could be on the back end of the website, so that when the website can be translated into every language, the feature could be added alongside it. I just can't write everything out; my responses are long enough already, and if they're 20+ pages long, no one will read them. Do you want a feature with a drop-down menu that shows the translation of the title in the user's language? I thought about that and decided it wouldn't be worthwhile if it's done through Google Translate of the English title (as Google Translate doesn't translate more than one language). I'm not a developer for this website, so I don't know what would be used. These are just my thoughts.

I'm not trying to create xenophobia, but to find a solution that makes the website more accessible to everyone (so the opposite). It just seems that the avoidance of xenophobia (not just yours) is creating the conditions for it: you say the website is heading in that direction, and if we do nothing, it'll just stay that way or get worse.

Since this issue may be bigger than I expected, maybe we should separate translations from editions, because grouping them together creates the assumption (at least for me) that everything is translated from one work (it kind of is, but kind of not). The point is that if I can understand the website in my language, then I'd know how to do so for just about any language.

Those are my ideas, but @jessamynwest what would you suggest? I just use the website and have my thoughts on how it could be better. But I would like to hear others, so this can be resolved.

jessamynwest commented 4 years ago

I have made my suggestion which is that we not implement this feature as written and instead move towards implementing a feature whereby a user can choose their preferred language which will determine how they see the "work" and possibly what edition will be shown to them.

An English-first approach is xenophobic and not okay. Maybe open another issue for a "user can choose their language" feature possibility?

dcapillae commented 4 years ago

"Ethnocentric" perhaps, rather than "xenophobic". It's not okay either way.

BrittanyBunk commented 4 years ago

@jessamynwest I don't know how the website will be translated in the end. These are my approaches, though.

If you'd like to write that up as a new issue, that's fine (I'm not going to, because I listed my solutions here, and this is where we can keep everything in one place to be fixed). I just think that if it's done with the Google Translate method, it's inefficient.

Like I said, I'm not trying to incite xenophobia or to be ethnocentric. I'm just trying to say that there should be a format with consistency with languages, so everyone can have equal access. With the format right now, then it would be in English (as the site's in English and so that's the consistency). I'm not saying "English first" because it's awesome and that's the way to go; it's just that English is the website's default language, and someone in another country will have difficulty translating the site if it's in multiple languages. If the website were in Spanish, I'd say "Spanish first", and so on, for consistency. Doing nothing, though, will make it worse, not better.

I'm proposing that the work show the original (and the original picture) and, underneath it, the translation and translated picture in the user's language. It would start out as a transliteration, and the actual translation would be added manually if it's different. Is this a good approach? Is there an issue with it that I'm not seeing?

dcapillae commented 4 years ago

@BrittanyBunk wrote:

I'm just trying to say that there should be a format with consistency with languages, so everyone can have equal access. With the format right now, then it would be in English (as the site's in English and so that's the consistency).

Not really. With the format right now, the "work" data should reflect the original language. It is a consistent criterion.

@LeadSongDog wrote:

I still contend that this should simply be a work ID, not a textual title.

I agree. In fact, all works have their Open Library ID. We don't need a title to identify them, so we don't need to choose a preferred language for the title. The work title in the original language is a consistent criterion. It's the same one librarians use.

If all the works have their ID, unique to each of them, I think that the issue of offering the title in one or another language should be resolved in another way, not by artificially modifying the titles or having to choose a preferred language for the work titles or for the Open Library site. It should be resolved "internally" by the software using those IDs and user preferences, I mean.

In any case, I agree with Brittany that efforts should be made to offer a better experience to the end user in terms of offering titles of works in their language (if they exist, because a transliteration is a bad idea in this case), especially to improve search functionality.

P. S.: I would like to clarify the following: an initial ethnocentric point of view can be changed; it is not necessarily a bad thing. It is sometimes inevitable and unconscious, but not necessarily problematic. A xenophobic perspective is definitely a bad thing. That's why I suggested replacing "xenophobic" with "ethnocentric".

BrittanyBunk commented 4 years ago

@dcapillae what about my 2 box idea, with the original work as one box and the translation as the second one (and also about separating translations from editions)? What's the idea you're saying about internally? (I'm confused) Thank you for understanding: it's about a better format for readability for all users, not something that's supposed to be offensive. I just can't write out everything all at once, otherwise no one will read it.

I mean a stacked approach: for example, if someone clicks on the second edition, the other translations pop up for the user to choose from. It could be another format; I'm just trying to come up with ideas.

How come the transliteration's a bad idea (especially if it's marked as such)?

jessamynwest commented 4 years ago

Machine translations are often not a good idea for material with nuance like book titles which @seabelis mentioned above with examples. Unless there's a compelling reason to introduce bad data into the equation, I don't see that it solves a user problem.

BrittanyBunk commented 4 years ago

@jessamynwest those would be manually changed. It's for translating the title so it's more readable, and it would have a description saying it's just a translation, not the most accurate title. If not, what do you propose for readers to see if there's no book translated into their language?

BrittanyBunk commented 4 years ago

The only other idea I could think of is to skip the machine translation and use only manual input. That would mean it would start out with nothing. Then multiple titles could be added with their corresponding language (like what @cdrini's picture shows: https://github.com/internetarchive/openlibrary/issues/2601#issuecomment-551230169, but it'd be Belarusian title 1, Belarusian title 2, etc. if there's more than one in a language).

dcapillae commented 4 years ago

@BrittanyBunk wrote:

@dcapillae what about my 2 box idea, with the original work as one box and the translation as the second one (and also about separating translations from editions)?

With regard to titles in other languages, I think the most important thing is that you can search for works (and editions) from any title in any language (if those titles really exist because they have been published).

Two boxes is not a bad idea: a main box with the original title and another with the translation (if one exists) in the user's language (perhaps the title with the largest number of editions), but always keeping the original title as the main title. It is valuable information.
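A minimal sketch of the two-box display described here: the original title always stays the main title, and the secondary box shows the most common title among real editions in the user's language, or nothing when no such edition exists (so no machine translation is ever shown). The function name, the edition dictionaries, the language codes, and the edition counts are hypothetical; the two Spanish titles are the ones for The Nameless mentioned earlier in the thread.

```python
from collections import Counter
from typing import Optional

def work_title_boxes(original_title: str, editions: list[dict],
                     user_language: str) -> tuple[str, Optional[str]]:
    """Return (main_title, secondary_title) for a work page.

    The main title is always the original title. The secondary title is the
    most frequent title among editions in user_language, or None when there
    is no such edition or when that most frequent title equals the original.
    """
    counts = Counter(
        ed["title"]
        for ed in editions
        if ed.get("language") == user_language and ed.get("title")
    )
    if not counts:
        return original_title, None
    secondary, _ = counts.most_common(1)[0]
    if secondary == original_title:
        return original_title, None
    return original_title, secondary

# Hypothetical editions of Ramsey Campbell's The Nameless (counts invented).
nameless_editions = [
    {"title": "The Nameless", "language": "eng"},
    {"title": "La secta sin nombre", "language": "spa"},
    {"title": "Los sin nombre", "language": "spa"},
    {"title": "Los sin nombre", "language": "spa"},
]

print(work_title_boxes("The Nameless", nameless_editions, "spa"))
# ('The Nameless', 'Los sin nombre')
print(work_title_boxes("The Nameless", nameless_editions, "fre"))
# ('The Nameless', None)
```

Ties between equally common titles go to whichever was encountered first, since Counter.most_common orders equal counts by first occurrence.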

An option (filter) to group editions published in a particular language can also be of great help. @LeadSongDog suggested it above.

@BrittanyBunk wrote:

What's the idea you're saying about internally? (I'm confused)

Me too! :D I mean that the data should not be modified artificially (e.g. by adding a translation alongside, or changing the original title for a translation). The software should be able to show what corresponds in each case with the existing data: title of editions, title of works, titles of works in other languages, etc. I can't explain how it could be resolved by software, but it would be ideal.

@BrittanyBunk wrote:

How come the transliteration's a bad idea (especially if it's marked as such)?

Only real books with real titles. Open Library is a catalog for every book ever published. A transliterated title is not a real title. Nobody will look for it in Open Library because that book (with that title) does not exist (yet). I think it should not be in any book catalog. When an edition with that title is published, it will be time to add it to the catalog (with that title), but not before.

Only real books and real titles. I think that's the only right choice.

jessamynwest commented 4 years ago

@jessamynwest those would be manually changed.

Open Library has no staffing for that. Those would not be manually changed.

If not, what do you propose for readers to see if there's no book translated into their language?

The book in the language it was published in. As I see it, you are trying to improve the OL user experience for English-speaking OL users. My proposal remains: keep this the way it is, add a different feature for letting a user indicate their primary language, build features off of that.

Only real books and real titles. I think that's the only right choice.

Strongly concur.

mekarpeles commented 4 years ago

I think we're spiraling into a premature discussion which is prescribing a solution which we're not ready to prioritize. I don't think the solution as described is sufficient. It will need to be much more nuanced in terms of leveraging the infrastructure that we have to provide the right solution to our patrons. I feel inclined to close this issue until such a time as we have made the right decisions to enable moving forward responsibly. This conversation will remain here for us to refer to in the future.

BrittanyBunk commented 4 years ago

@mekarpeles sounds good. I just wanted to add that I was thinking of the transliteration as a way to allow searching in every language. It would just need a note saying that it's a direct translation and not the work's actual name (so people know; otherwise it'll be hard to find the book if it isn't available in their language at all). If there's a translation that belongs to an actual book, it would be added manually, in a way that makes the difference between the two visible (for example, the actual book name is highlighted). I guess this was premature (sorry I brought it up too early), but I hope I helped. @jessamynwest by "manually" I mean the OL user editors, like me and others. For the rest, I won't stop you from posting new issues, but this one got closed, so I don't know if it's a good idea anymore.