readium / webpub-manifest

📜 A JSON based Web Publication Manifest format used at the core of the Readium project
BSD 3-Clause "New" or "Revised" License

Sorting keys are language dependent #48

Open qnga opened 4 years ago

qnga commented 4 years ago

Currently, RWPM supports sortAs in subjects, titles and contributors independently of their localized names. However, a sorting key is in fact language-dependent and should be supported as such.

I think that title, for example, could be expressed as follows:

"title": {
  "en": "Around the World in Eighty Days",
  "fr": {
    "name": "Le Tour du monde en quatre-vingts jours",
    "sortAs": "Tour du monde en quatre-vingts jours"
  }
}

Or

"title": {
  "name": "Le Tour du monde en quatre-vingts jours",
  "sortAs": "Tour du monde en quatre-vingts jours, Le"
}

Or

"title": {
  "name": "Around the World in Eighty Days"
}

In the Kotlin app, everything is ready for that. We have a LocalizedString object that contains Translation objects, each of which may carry a sorting key alongside the canonical string.
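
As a rough illustration of that shape, here is a minimal sketch in Python rather than Kotlin. Only the names LocalizedString and Translation come from the comment above; the field names (`string`, `sort_as`, `translations`) and the `sort_key` helper are assumptions made for this example and do not reflect the actual Kotlin API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Translation:
    """One localized form of a string, with an optional sorting key."""
    string: str                      # assumed name for the canonical string
    sort_as: Optional[str] = None    # assumed name for the sorting key


@dataclass
class LocalizedString:
    """Maps a language tag to its Translation (hypothetical layout)."""
    translations: Dict[str, Translation] = field(default_factory=dict)

    def sort_key(self, language: str) -> str:
        # Fall back to the display string when no sorting key is given.
        translation = self.translations[language]
        if translation.sort_as is not None:
            return translation.sort_as
        return translation.string


title = LocalizedString({
    "en": Translation("Around the World in Eighty Days"),
    "fr": Translation(
        "Le Tour du monde en quatre-vingts jours",
        sort_as="Tour du monde en quatre-vingts jours",
    ),
})
```

With this layout, each translation carries its own sorting key, and a translation without one simply sorts by its display string.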

HadrienGardeur commented 4 years ago

I understand your point but that's really not the usage for this.

Sorting keys are mostly used by reading apps to handle the following actions in a bookshelf:

For this specific use case, having multiple sorting keys is more confusing than helpful.

When we look at EPUB files or OPDS feeds, it's already a miracle when they include both a sorting key and multiple translations. I don't think we can ever expect to get sort keys for each translation (I'm not even sure if that's doable with EPUB 3.x).

It's also worth pointing out that behind the scenes, we're actually working with JSON-LD and not JSON.

The following example would not be proper JSON-LD, since language maps in JSON-LD 1.1 can only contain literals and not objects:

"title": {
  "en": "Around the World in Eighty Days",
  "fr": {
    "name": "Le Tour du monde en quatre-vingts jours",
    "sortAs": "Tour du monde en quatre-vingts jours"
  }
}

This limitation means that, currently, we can't really support @direction either (see #33).

qnga commented 4 years ago

Indeed, there seem to be few use cases. Maybe a bilingual edition? A more JSON-LD compliant alternative would be to make sortAs a language map, as title is now. The shortcut syntax

"sortAs": "Tour du monde en quatre-vingts jours, Le"

would remain valid.
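
To make the "shortcut plus language map" idea concrete, here is a minimal Python sketch of how a parser might accept both forms. The function name `normalize_sort_as` and the use of the `"und"` (undetermined) language tag for the shortcut form are assumptions for illustration, not something the specification or any Readium toolkit defines.

```python
from typing import Dict, Union

# Assumption: a plain string is filed under the "undetermined" language tag.
UNDEFINED_LANGUAGE = "und"


def normalize_sort_as(value: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Return sortAs as a language map, whichever form it was written in."""
    if isinstance(value, str):
        # Shortcut syntax: a single sorting key with no language attached.
        return {UNDEFINED_LANGUAGE: value}
    if isinstance(value, dict):
        # Already a language map; return a copy so callers can modify it.
        return dict(value)
    raise TypeError("sortAs must be a string or a language map")
```

Under this sketch, existing manifests that use a plain string keep working, and consumers only ever deal with one internal shape.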

qnga commented 4 years ago

We could also align with the W3C Publication Manifest, which uses an array of LocalizableString objects to support text direction.

HadrienGardeur commented 4 years ago

I'm a bit wary of revisiting this right now:

chocolatkey commented 4 years ago

This comes at a perfect time. I'm currently implementing such a system [image], where it would be nice to fit that data in a webpub. Since I hadn't gotten to the point of generating them yet, I hadn't even considered the fact that sortAs is a string only in the schema. Is there a way I could fit all this data in? Internally, the data is given like this:

[
    {
        "name": "The Combat Baker and Automaton Waitress",
        "sortAs": "Combat Baker and Automaton Waitress, The",
        "language": "en"
    },
    {
        "name": "戦うパン屋と機械じかけの看板娘",
        "sortAs": "タタカウパンヤトオートマタンウェイトレス",
        "language": "ja"
    }
]

I don't know how an app would actually implement multiple sorting keys

In my use case, the sorting key of the user's publisher or client language is used
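
One way such a selection could work is sketched below in Python, using the records shown above. The function name `pick_sort_key` and the idea of passing an ordered list of preferred languages are assumptions for this example; the actual system may choose differently.

```python
from typing import Dict, List, Optional

records = [
    {"name": "The Combat Baker and Automaton Waitress",
     "sortAs": "Combat Baker and Automaton Waitress, The",
     "language": "en"},
    {"name": "戦うパン屋と機械じかけの看板娘",
     "sortAs": "タタカウパンヤトオートマタンウェイトレス",
     "language": "ja"},
]


def pick_sort_key(records: List[Dict[str, str]],
                  preferred: List[str]) -> Optional[str]:
    """Return the sorting key for the first preferred language available."""
    for language in preferred:
        for record in records:
            if record["language"] == language:
                # Use the display name when a record has no sorting key.
                return record.get("sortAs", record["name"])
    # No record matches any preferred language.
    return None
```

For instance, `pick_sort_key(records, ["ja", "en"])` selects the katakana key, while a client that prefers only English gets the English key.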

qnga commented 4 years ago

Very interesting! Could we know a little more about your use case? If the publication is monolingual, why do you wish to allow multiple languages for metadata?

HadrienGardeur commented 4 years ago

Thanks @chocolatkey for chiming in and proving me wrong regarding use cases 😉

Could you provide some additional context for this use case? It looks to me that you're trying to do the following:

If we move sortAs away from being a literal and add the same language map approach that we use for title and name, this could be represented as:

{
  "title": {
    "en": "The Combat Baker and Automaton Waitress",
    "ja": "戦うパン屋と機械じかけの看板娘"
  },
  "sortAs": {
    "en": "Combat Baker and Automaton Waitress, The",
    "ja": "タタカウパンヤトオートマタンウェイトレス"
  }
}

chocolatkey commented 4 years ago

> Very interesting! Could we know a little more about your use case? If the publication is monolingual, why do you wish to allow multiple languages for metadata?

In the system I am creating, the publishers (of translated Japanese doujin content) are going to have the ability to privately share a review copy of the publications with the original authors, and potentially have a mini-library for them. The original authors are usually not well-versed in English, so the original title needs to be present so the original and localized title can be displayed side-by-side. The katakana is included there for filtering and sorting purposes for when the titles are displayed in Japanese, both in that private frontend as well as the admin backend.

> If we move sortAs away from being a literal and add the same language map approach that we use for title and name [...]

I think this would be a good idea, because it's backwards-compatible with the existing schema. My cases tend to be edge cases (which is probably good for probing the limits of the standard), but most people will just need "sortAs": "Single String, The". The way @HadrienGardeur represented my data in the example snippet is perfect.
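
For illustration, the conversion from per-language records to the language-map representation could be sketched as follows in Python. The function name `to_language_maps` is hypothetical, and the output shape simply mirrors the example snippet discussed in this thread; it is not taken from any existing implementation.

```python
from typing import Dict, List

records = [
    {"name": "The Combat Baker and Automaton Waitress",
     "sortAs": "Combat Baker and Automaton Waitress, The",
     "language": "en"},
    {"name": "戦うパン屋と機械じかけの看板娘",
     "sortAs": "タタカウパンヤトオートマタンウェイトレス",
     "language": "ja"},
]


def to_language_maps(records: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Fold per-language records into title and sortAs language maps."""
    title: Dict[str, str] = {}
    sort_as: Dict[str, str] = {}
    for record in records:
        language = record["language"]
        title[language] = record["name"]
        # A record may omit its sorting key; skip it in that case.
        if "sortAs" in record:
            sort_as[language] = record["sortAs"]
    metadata = {"title": title}
    if sort_as:
        metadata["sortAs"] = sort_as
    return metadata


metadata = to_language_maps(records)
```

Serializing `metadata` with `json.dumps(metadata, ensure_ascii=False)` would keep the Japanese strings readable in the resulting manifest.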

HadrienGardeur commented 4 years ago

OK, then let's vote on this in this issue, using 👍 and 👎 on this message. I'll also bring it up in our weekly call.

Who's in favor of turning sortAs into a language map?

qnga commented 4 years ago

I drafted a proposal: https://github.com/qnga/webpub-manifest/blob/proposal/sortAs/proposals/001-multilingual-sortAs.md

@chocolatkey Would you add something more precise about your use case?

Here is an internal PR for suggestions and comments: https://github.com/qnga/webpub-manifest/pull/1

chocolatkey commented 4 years ago

@qnga what additional information would you like about my use case besides what I said previously? I can't think of much I didn't say

qnga commented 4 years ago

I was suggesting that you might explain it directly in the proposal. But it might not be necessary.

chocolatkey commented 4 years ago

@qnga aha now it's clear. Would you like me to fork your fork and submit a PR or comment in your internal PR?

llemeurfr commented 4 years ago

Note that the Go implementation of a Publication already has a "MultiLanguage" struct, which is currently applied to the title and subtitle properties and could easily be applied to sortAs as well. The change would therefore not hurt the Go code.

qnga commented 4 years ago

The easiest way is to add suggestion snippets as comments on the PR: https://github.com/qnga/webpub-manifest/pull/1

Thanks Laurent for the feedback about the Go implementation.