Closed kdepoel closed 1 year ago
Regarding The Three-Body Problem, searching specifically on title doesn't work because the title that is indexed is the Work title, which for this particular Work is 三体. Searching using the "All" filter does bring up this Work when searching using the query "the three body problem". Your query, specifying the title field, doesn't work because the title field in the SOLR index doesn't match. The reason it shows up when searching using "All" is that the edition titles are indexed in SOLR under the "alternative_title" field.
Regarding Krew elfów, that does seem odd. At the very least there's a data issue in the fact that there are at least two separate Works for the same work. Given that OL2577486W has "Krew elfów" as its title, it looks like it should be showing up in title results for that query.
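To make the difference between the two kinds of query concrete, here is a minimal sketch that builds both search URLs. The helper name `search_url` is made up for illustration, and the query syntax is assumed from the URLs quoted in this thread (`search.json` with a `q` parameter and a `title:` prefix), not taken from any official client.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the URLs quoted in this thread.
BASE = "https://openlibrary.org/search.json"


def search_url(query, field=None):
    """Build a search URL (hypothetical helper).

    With field=None the query goes against everything ("All").
    With field="title" the query is restricted to the title field,
    which, per the explanation above, holds the Work title only.
    """
    q = f'{field}:"{query}"' if field else query
    return f"{BASE}?{urlencode({'q': q})}"


# "All" search: can match edition titles indexed as alternative titles.
print(search_url("the three body problem"))

# Title-only search: only matches the Work title (三体 for this Work).
print(search_url("the three body problem", field="title"))
```

Comparing the results of the two URLs for the same book is a quick way to check whether a miss comes from the field restriction or from the data itself.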
Thanks for explaining, Alex. I can find "the three-body problem" when searching for all, and through the q-endpoint. I'd thought I thoroughly went over each step to be sure before posting, so either something changed, or I messed up. I'm putting my money on the latter.
It would be nice to have the title endpoint search alternative titles as well, though. I mean, the translated title is still a valid title.
relates to #6032 and #2601
I know you don't want to get into heated discussions, but the discussions you don't want to continue are ones you are continuing and are directly relevant to your issue. I'm not here to make it heated, but to help out. I feel there are misunderstandings of the original discussion, so it's important to work them out so that issues can be solved - foreign languages keep being brought up as an issue without resolution, and I want to work on one.
The issue you have is with the solr. I found your book here: https://openlibrary.org/books/OL626193M/Krew_elfów . I see there is definitely an issue here with the standardization to English. It seems that when the book's searched, only the English edition's being searched for. Additionally, the translation is being searched without the accent. I'll show why: look at what happens when you type it into the title without the accent - another book comes up: https://openlibrary.org/search?q=title%3A+"Krew+elfow"&mode=everything . Do you notice how there's a question mark in the first word? That's a problem with foreign words - they get transcribed to English with English characters in a way that doesn't make sense.
The accenting discrepancy (some places use the accent and others don't) is a major issue with foreign languages and the exact reason why I brought up about using English as a standard - not because of xenophobia or encouraging it, but because the infrastructure can't handle other languages in their non-English, accented format. This is the reason for your problem. That is why your problem isn't the issue of having a non-English title for a work vs all of them in English (which is what you're mistaken on) and why my discussion was misconstrued into something different than what it actually is.
Because English is the language of the Open Library's website and it's built in the US, which speaks English, all parts of the website are going to be using English to work from - including the developer side. So to make it something that it's not doesn't make sense either.
That's why I posted #6032. I'm not against book titles being in a foreign language, but because the website is in English and google translate does only one language at a time, and the searches work best in English (even if the searches are able to handle foreign languages, people and bots will remove accents, so results won't come up in a perfect system), that's why we need a translation/transliteration of every foreign title in English.
What's missing is a feature to let people find their title in a foreign language from the English title. So people should type the english translation/transliteration in the search bar and type in the foreign language they're looking for, so that the search engine can pop up the language that's being sought.
In the end, you do have issues with how the site is setup, because you can't access the books you need. So here's my issues/solutions to look at to solve this:
Unfortunately the Open Library doesn't have a good way of addressing foreign language titles. This shouldn't be turned into a heated discussion, blame game, or attack on the situation, but an opportunity to fix it.
that's about it for this issue.
@kdepoel 2) good to know
1) yes - the first reason is exactly why I brought up #2601 to begin with - the diacritics are problematic on the website for searching. It's nothing to do with cultural appropriation or hate - it's because it's a logistical problem/issue.
However, we can't call it a bug, because the issue doesn't stem from the search engine itself. The reason is because the search function on the Open Library can handle diacritics. The issue is how information is inputted and where it's inputted from. It's because other websites can't handle diacritics themselves or each one handles them differently and because people also handle them differently (some of it's due to their culture or they may not have capabilities on their own computer device) - that every time the diacritics are inputted, they would be in a different way.
Some may place a diacritic, some may take it off, some may use a different symbol in place of it (like a question mark or a number where only letters should exist). These are transcription issues. The title may also be translated or transliterated.
Because of such variation, by the time someone, like you, goes to search for the title, it's in so many different wordings that it's very unlikely to get to the actual book. As you said, you can end up with 0 search results at the end.
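As an illustration of the variation described above, here is a minimal sketch of diacritic folding using Python's standard `unicodedata` module. It is not how the Open Library search actually works; `fold_diacritics` is a hypothetical helper that only shows how accented and unaccented spellings can be reduced to one comparison key.

```python
import unicodedata


def fold_diacritics(text):
    """Return a comparison key with combining accents removed (hypothetical helper)."""
    # NFKD splits "ó" into "o" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() makes the key case-insensitive.
    return stripped.casefold()


variants = ["Krew elfów", "Krew elfow", "KREW ELFÓW"]
keys = {fold_diacritics(v) for v in variants}
print(keys)  # all three spellings collapse to a single key

# A transcription error such as "?" in place of a letter is NOT repaired.
print(fold_diacritics("Krew elf?w") == fold_diacritics("Krew elfów"))
```

Folding of this kind only covers accents that decompose into a base letter plus a mark. Letters such as the Polish "ł" have no such decomposition, and question-mark substitutions, translations and transliterations would still need their own handling.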
So that's why there should be a standardization to one language. I picked English, because the website is in English and it has no diacritics and that's what the other websites that we pull info from default to. It actually does the opposite of what it's accused of, because it helps readers in all languages get to the books that're needed. Also, it doesn't involve getting rid of the original title - it just allows people to see what the title looks like in their language, so they can actually read and find it. If we don't, no one can find their books in the language they're looking for, unless they know how it's written on the website - which people can't do, as they can't 'read the mind' of the website. Obviously it's not enough, which is why I added #6032, which serves a completely different purpose of placing in titles of different languages, so when they're searched, those titles are found. It's related, but I digress.
Anyway, it's not a bug, but a missing feature. We just don't have a feature right now that accounts for the discrepancies of inputs. I don't call it a bug, because there's no 'solution' to fixing the bug as long as edits and inputs of information come from different sources. If that's the case, it would be a different website, as that goes against its main purpose. It's a feature we need to add - that feature is #6032 , because you can add not only all the variations that'll appear for a foreign language in English, but also how it'll look in other languages too. This would actually solve the 'standardization to English' issue, as just translating to English won't be enough for everyone. However, I posted the idea as a start and it does have its purpose for the search engine - because then the diacritics wouldn't be an issue. This is how I'd expect it to look:
Obviously it's not going to say to type in English, but that's the best way to get a result, as I said, due to all the other websites standardizing to English themselves, even if the OL doesn't. However, bear with me. The idea is that you type in one language and tell the search engine which language version of the book you want to view.
Now, we don't have to do just English to make it work. I'm just saying this is what we can do if #6032 is not implemented - because at this moment, the entire website and search engine will be optimized for that. However, if #6032 is implemented, we can do something more like this:
As you can see, both #6032 and #2601 would need to be combined with the language choosing feature on the search bar to allow people to type in a title in any language they want to type it in and then select the language they want to see for the book/title, so that they can come in from any language and variation to go to any language that the book's available in. These combinations of features are going to solve the issue you've presented.
To me, at the very least, it makes sense that the Anglicized title versions should be looked into first, as that's what everyone uses for inputting and searching. However, it shouldn't stop there - all variations in every language should be inputted, by hand and through automation, so that anyone in the world from any culture and any mindset can type in any variation of a title, in case anything other than English exists or is typed in.
We should also make sure that when the language is selected, their TTT (transcription, translation, transliteration) will be searched through. That's the best solution I have for this problem.
I've read your texts, but you haven't convinced me. I'll refer to my previous statement before you pour another massive amount of (the same) arguments in here. I'm not the one you need to convince. You mentioned the related issues in which you argue for your case; let's keep it to those.
@kdepoel yes - we are going to keep what each other wrote. I'm not here to convince you of anything. I had to correct misinformation, and then I offered my solutions. I brought in relevant context for those too and am glad the debates are moving towards productivity. I finished my part of those I feel too.
It's up to you to go to the appropriate people who are in charge of fixing this to do so. The person who can fix this is Drini, but it's up to Mek to decide to move this forward or not. https://docs.google.com/document/d/1edU3lCTHAjFr1mXUilh8l1_rNek33pRTzFHltBU--fM/edit you can see how to contact Drini via email. It looks like https://docs.google.com/document/d/1jC76TVZ28mMUQy4_ufdawU40aXPWtcwyTQ79XgrOV4o/edit?pli=1# is a priority for Q1 2022, so it looks like it's going to be addressed really soon! It even says 'Patrons want to find books in their language'.
From the document, it says they're going through reindexing, so it's at stage V4, as this issue comes after reindexing.
Hi @kdepoel , "What the default work title is, is then irrelevant." That's exactly where we're headed :)
This is a large project but it's a priority and plan is to get the first versions out for testing this week. We're currently doing some performance testing on production (indexing all editions ~doubles the total number of records in our search engine), but there won't be any visible search result changes just yet.
The epic tracking the technical implementation is #6377 . There are a lot of issues that will be fixed by that epic (listed in the "Goals" section), but choosing to keep them separate to keep my desk a little tidy :)
Ok! The original examples all work now:
the three body problem
https://openlibrary.org/search?q=the+three-body+problem&mode=everything
title:Krew elfów
https://openlibrary.org/search?q=title%3AKrew+elf%C3%B3w&mode=everything
Krew elfów
https://openlibrary.org/search?q=Krew+elf%C3%B3w&mode=everything
In the API, you need to set two parameters: &fields=key,title,editions (i.e. you must specify editions in the fields parameter). Eg https://openlibrary.org/search.json?q=Krew+elf%C3%B3w&mode=everything&fields=key,title,editions . Note the new editions field in the results.
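For anyone calling the API from a script, here is a minimal sketch of building that same request with Python's standard library. The function names `editions_search_url` and `fetch` are made up for illustration. The parameters are the ones from the example URL above, and nothing is assumed here about the shape of the `editions` field in the response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://openlibrary.org/search.json"


def editions_search_url(query):
    """Build a search URL that asks for edition data (hypothetical helper)."""
    params = {
        "q": query,
        "mode": "everything",
        # "editions" must be listed explicitly in the fields parameter.
        "fields": "key,title,editions",
    }
    # safe="," keeps the commas in the fields list readable.
    return f"{BASE}?{urlencode(params, safe=',')}"


def fetch(url):
    """Fetch the URL and return the parsed JSON (needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


url = editions_search_url("Krew elfów")
print(url)
# results = fetch(url)  # uncomment to run the query against the live site
```

The printed URL is the same as the example given above, so it can be pasted into a browser to compare with the script's output.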
Searching a translated title is not always possible or straightforward.
(I've read some heated debates about which language the Work should be in and whether or not English should be the default language. I don't want to bring that up here. I'm fine with the current situation. I just want to identify the work/edition when searching for a translated title.)
Evidence / Screenshot (if possible)
Search for the book "The three-body problem" by Cixin Liu. The original title is in Chinese, but I'm looking for the English version. Search (website and api) does not return this specific work / book / edition. Only when searching for the author, you can drill down to the relevant translation (e.g. OL25840917M)
Search for the book "Krew elfów" by Andrzej Sapkowski, which is the original title. When searching on the title specifically, the result is: 0 records found. When searching on "all", two results are shown; neither bears the title in Polish, just in English (the json also does not show the phrase "Krew elfów" anywhere). When searching for the author, the results do show a book with this title. Drilling down, it actually shows a Work: OL2577486W. This does seem like a bug.
Relevant url?
website: https://openlibrary.org/search?q=title%3A+%22the+three-body+problem%22&mode=everything API: https://openlibrary.org/search.json?title=Krew%20elf%C3%B3w
Steps to Reproduce
Proposal & Constraints
Not sure what the proposal would be. Maybe the translated title should have come up, and this is just a data issue for a specific book. It might also be intended functionality, that you can't search for translated titles. I don't believe that though.
Please shed some light on how this is supposed to work, and whether this is to be classified as a bug, a data issue or a feature request.