Closed michael-e closed 13 years ago
Thanks mate. I must admit I haven't tested with characters outside of English so I'm not surprised it doesn't work with multibyte chars. Do you have any other examples I can test with? I'll take a look into this next week - I know a few places that are troublesome.
I will try and select another example. But I am rather sure that the döner
will reveal nearly everything! :-)
Do you know what a Döner is? Well, it's a doner. I like the word in German, it sounds so funny!
Michael, search for kebap and everything will be fine ;)
In English it's a boring "donna" but I'm guessing German is more "durna"?
Ah, no, the "ö" sounds more funny here. :-) Like in the word "schön".
Click on the loudspeaker symbol.
Pretty sure I've fixed these in: https://github.com/nickdunn/search_index/commit/68d2b394539f93c7be0d011d749cef262e4bc7a6
mb_
string functions when they are available (didn't fix any of the bugs you discovered, but good practice nonetheless)ö
character. Oops! Replaced with a better punctuation-stripping function that is used elsewhereutf8_encode
function which doesn't like encoding strings that are already UTF8. So I only now run strings through this when they're not already UTF8All three of your bugs are fixed by the above commit.
To re-test you will need to:
sym_search_index_keywords
That looks very good now, Nick.
Well, one thing: The excerpt-length setting does not work anymore.
I'm pretty sure this didn't work before either :-/
Well, at least the excerpts were a lot shorter than they are now. I haven't counted characters. I should have done that. :-)
Will raise it as another ticket as it doesn't seem related to the changes I made today. It's more down to the massive chunks of code that I lifted from Drupal 5. I'll see how Drupal 7/8 did it ;-)
Thanks, Nick.
(I really think that this extension is a big step forward for Symphony as a whole
Cheers! Any other feedback or ideas you've got, please let me know.
This was caused by the mb string changes. There is a simple fix I have committed locally to rectify the excerpt length bug.
Glad to hear this. You will push it soon, right?
Can you explain why entites in the excerpt are encoded (as posted here)? Is this an issue or a feature?
Glad to hear this. You will push it soon, right?
Pushed.
Can you explain why entites in the excerpt are encoded... Is this an issue or a feature?
Both ;-) The excerpt is run through General::sanitize
to HTML-encode things like ampersands that break XML, hence everything is encoded. However this happens after the <strong>
tags are wrapped around the highlighted keywords, so these tags also get encoded. The net result is that you have to use an xsl:value-of
with disable-output-escaping
so that the entities are rendered properly. It works, but is a bug ugly.
I have just changed the implementation so that the string is HTML-encoded and then the <strong>
tags are added. This means that you now have to use an xsl:copy-of
to get the excerpt (a value-of
will work but will just get the plain text, not the strong elements too).
<xsl:copy-of select="excerpt/text() | excerpt/*" />
That itself is a bit messy, so I decided to wrap the excerpt in a <p>
tag, so the XSLT becomes:
<xsl:copy-of select="excerpt/*" />
Nice and simple, just like a Markdown-formatted textarea.
Thanks a lot for the fix! At first I was a little bit amazed at the the length difference from one excerpt to another (in the same search). But then I realized that there is some "intelligent logic" behind this, so the length of an excerpt depends on the position of a match in the string and the number of matches alike.
So: Well Dunn! :-)
I like the new implementation as well. It is clear and straightforward. And in case you don't like the excerpt to be wrapped in the <p>
Element, you can still use
<xsl:copy-of select="excerpt/*/*|excerpt/*/text()"/>
so the length of an excerpt depends on the position of a match in the string and the number of matches alike
Yep. I did very little beyond modifying Drupal's search_excerpt function which does the hard work. It means, however, that the excerpt doesn't match exactly the length set in the config — it's just a rough length. But I don't ever see a need for it to be completely precise.
The extension does not handle multibyte UTF-8 characters properly.
<strong/>
tag.döner
will not show any suggestions even if your field values containdöner
orsuperdöner
.superd
and you have fields containingsuperdöner
, the suggestionsuperdöner
will actually besuperdöner
(which is a typically wrong multibyte representation).So I guess the extension needs some utf-encoding/-decoding functions in the right places so that PHP can work with the strings correctly. Unfortunately I don't know enough about the inner workings to fix this myself, but I will do testing! (I really think that this extension is a big step forward for Symphony as a whole.)