projectblacklight / blacklight

Blacklight provides a discovery interface for any Solr (http://lucene.apache.org/solr) index.
http://projectblacklight.org/

Vernacular scripts (utf8) must display properly #29

Closed MrDys closed 12 years ago

MrDys commented 12 years ago

CODEBASE-2: We are concerned about the display of UTF-8 text, because Ruby is less UTF-8 friendly.

Latest Ruby is supposed to make this easier in some way ...
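For context, Ruby 1.9 and later attach an encoding to every string. The following is only a minimal sketch of what that looks like, not anything taken from the Blacklight codebase; the variable names and the sample word are made up for illustration.

```ruby
# Bytes as they might arrive from a file or socket with no encoding
# information attached. These are the UTF-8 bytes of a four-letter
# Hebrew word (illustrative sample data, not from the issue).
raw = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D".b

# String#b returns a copy tagged as binary (ASCII-8BIT).
raw_encoding = raw.encoding

# Re-tag the same bytes as UTF-8 without changing them.
text = raw.dup.force_encoding(Encoding::UTF_8)

# valid_encoding? checks that the bytes really are well-formed UTF-8.
well_formed = text.valid_encoding?

# The binary string counts bytes; the UTF-8 string counts characters.
byte_count = raw.length
char_count = text.length
```

Checking `valid_encoding?` before rendering is one way to catch mis-tagged data early instead of showing garbled characters.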

We know Bob got the searching working (yay bob!) with solrmarc.

It's a deal breaker for us if we don't get the non-latin stuff displaying well. Our librarians have already tested our vufind searchworks for at least the following:

Chinese, Hebrew, Arabic, Russian, Cyrillic, all sorts of diacritics

right to left issues for hebrew arabic (and whatever else)

There is already some example data in the solrmarc svn project. It's currently in the 2.0 branch, under test/data

unicornWHoldings, hebrew, diacriticTests, non-latin (I didn't put that last one there, so I don't know what's in it).

If we need more examples, I can certainly get them. I'm only confident there's already hebrew and chinese in there, and lots of diacritics. I was trying to write tests for the diacritics, but I couldn't get them to pass and fail at the right times.

MrDys commented 12 years ago

Original reporter: ndushay

MrDys commented 12 years ago

ndushay: Bess is also going to send Stanford some specific examples of vernacular scripts (chinese at least) displaying in a blacklight UI.

MrDys commented 12 years ago

ndushay: careful with "composed" vs "decomposed" chars (Bob will know - I probably have terminology wrong)
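To illustrate the composed/decomposed distinction: the same visible character can be stored as one code point or as a base letter plus a combining mark, and the two forms do not compare equal until they are normalized. This is a minimal sketch using Ruby's built-in `String#unicode_normalize` (available since Ruby 2.2, so later than this discussion); the variable names are illustrative.

```ruby
# "é" as a single precomposed code point (NFC form).
composed = "\u00E9"

# "é" as "e" followed by a combining acute accent (NFD form).
decomposed = "e\u0301"

# The two strings render the same but differ code point by code point.
equal_raw = (composed == decomposed)

# Normalizing both to the same form makes them comparable.
equal_nfc = (composed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc))

# Lengths differ before normalization: 1 code point vs. 2.
lengths = [composed.length, decomposed.length]
```

If index-time and query-time text are in different normalization forms, a search or test can fail even though the strings look identical on screen, which may be one reason diacritic tests are hard to get passing and failing at the right times.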

MrDys commented 12 years ago

bess: Can you give us some specific records that appear not to display correctly right now, via either SearchWorks or Blacklight?

MrDys commented 12 years ago

ndushay: This should probably be split into a couple of issues:

A. Sorting of non-Latin scripts, both within some languages (Chinese? see below) and interfiled with Latin scripts.

B. Right-to-Left vernacular content display.
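For issue B, one possible approach on the display side is to mark each field's base direction in the HTML so the browser's bidi handling can lay it out. The sketch below is a rough heuristic under stated assumptions, not Blacklight's actual behavior: `base_direction` and `wrap_with_direction` are hypothetical helpers, and checking script membership is only an approximation of the Unicode Bidirectional Algorithm's "first strong character" rule.

```ruby
require "cgi"

# Hypothetical helper: guess a base direction from the first character
# that belongs to a right-to-left or left-to-right script.
# Digits, spaces, and punctuation are skipped as direction-neutral.
def base_direction(text)
  text.each_char do |ch|
    return "rtl" if ch.match?(/[\p{Hebrew}\p{Arabic}]/)
    return "ltr" if ch.match?(/[\p{Latin}\p{Cyrillic}\p{Han}]/)
  end
  "ltr" # default when no directional character is found
end

# Hypothetical helper: wrap an escaped field value in a span that
# declares its direction explicitly.
def wrap_with_direction(text)
  %(<span dir="#{base_direction(text)}">#{CGI.escapeHTML(text)}</span>)
end

hebrew_dir = base_direction("\u05E9\u05DC\u05D5\u05DD")
latin_dir  = base_direction("Blacklight")
mixed_dir  = base_direction("123 \u0643\u062A\u0627\u0628")
markup     = wrap_with_direction("a < b")
```

Wrapping each field separately keeps a Hebrew or Arabic title from reordering the Latin-script text next to it.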

On Mar 13, 2009, at 11:54 AM, Tom Cramer wrote:

Naomi,

I'm not sure what you're citing with "broken UTF-8 display", but wrt supporting vernacular scripts in Unicode in the BL display, there are three separate classes of issues:

  1. do the scripts display properly in search results and detailed record pages, especially for entries in right-to-left scripts? It is tricky to display fields that have entries in both roman script and Hebrew/Arabic in an intelligible fashion. Socrates does this now, though it took some work.

This may be an indexing issue. The index needs to have the appropriate fields as "display" fields. In our vufind implementation, I relied on java's BIDI and the ordering of MARC field concatenation. I got "mostly" there, as I recall. Bob Haschart may be able to improve this.

  2. Indexing: is the vernacular script data indexed, and accessible for vernacular script search arguments? (I think this is already in place for BL, no?)

UVa needs to include fielded searching (as a pulldown: "everything", "title", "author", "subject", "ISBN" ... in the plugin). Naomi needs to index vernacular fields properly and set up the fielded searches properly. Bob already did the work for solrmarc and it's wonderful.

  3. Sorting: do the results of UTF-8 searches sort in the appropriate alphanumeric order? (Whatever that may mean on a language-by-language basis; in Chinese, e.g., I believe sort order is determined by number of strokes in the first pictographic character, not by the alphabet.)

I don't know who should tackle this. Bob? We have a title_sort field ... that doesn't sort correctly. We have a way to map Latin diacritics to plain chars (borrowed from your code), but I'm not sure that's working ... and not sure what to do about non-latin chars.
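The idea of mapping Latin diacritics to plain characters for a sort field can be sketched as follows. This is not the solrmarc or vufind code referred to above; `sort_key` is a hypothetical helper and the titles are made-up sample data. It decomposes each string, strips the combining marks, and lowercases the result.

```ruby
# Hypothetical helper: build a diacritic-insensitive sort key.
def sort_key(title)
  title
    .unicode_normalize(:nfd) # split letters from their combining marks
    .gsub(/\p{Mn}/, "")      # drop the combining (nonspacing) marks
    .downcase                # ignore case when ordering
end

# Illustrative titles: Zebra, Émile, apple, Ångström.
titles = ["Zebra", "\u00C9mile", "apple", "\u00C5ngstr\u00F6m"]

# Order by the folded key while keeping the original display strings.
sorted = titles.sort_by { |t| sort_key(t) }
keys   = sorted.map { |t| sort_key(t) }
```

Folding like this only helps with Latin-script text. It does nothing for the non-Latin cases raised above (Chinese stroke order, for example), which would need language-specific collation rules.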

Having just converted Symphony to Unicode and partnered with Sirsi to get iLink to display and sort results appropriately, we have a solid set of test cases and internal expertise to help get BL up to snuff on these. Lauren would be the key resource in coordinating this testing and feedback from the appropriate language experts.

There is at least one Arabic record with vernacular script in the blacklight demo data. Basically, use the language facet to find Hebrew and Arabic. If you don't have someone to look at Hebrew or Arabic results, we might be able to help.

MrDys commented 12 years ago

ndushay: This has been split into two separate issues, so this one is now closed.

MrDys commented 12 years ago

bess: I won't really be able to test this until we can do some more complex indexing, which will be enabled by using solrmarc.

MrDys commented 12 years ago

bess: The problems expressed here are covered separately in other tickets.
