"Accent-sensitive" behavior expected in sqlean/unicode LIKE function

carlosalbertolajr commented 9 months ago

I’m working on an application that allows textual search of records through a query input.

My language is Brazilian Portuguese, so the words are full of accents not covered by the ASCII standard. Because of this, I was looking for an extension that would provide Unicode support and ended up finding this wonderful project.

I'm using the sqlean/unicode LIKE function in my implementation, however I expected the case-insensitive assessment to disregard only the case, not the accents:

--- Expected --- sqlite> select like('CARNÊ', 'carnê'); 1 sqlite> select like('carne', 'carnê'); 0

--- Obtained --- sqlite> select like('CARNÊ', 'carnê'); 1 sqlite> select like('carne', 'carnê'); 1

The issue is that many words in Portuguese have totally unrelated meaning depending on the accent, for example: "carne" / "carnê", "ai" / "aí", "é" / "e", "maiô" / "maio".

Is there any way to get the desired behavior using any sqlean capabilities?

nalgeon commented 9 months ago

If you are only using like to check for case-insensitive equality, you can change it to = with lower:

.load /opt/.sqlpkg/nalgeon/sqlean/unicode

select lower('CARNÊ') = lower('carnê');
-- 1

select lower('carne') = lower('carnê');
-- 0

But if you need to check for patterns — I don't think unicode can do that with respect to accents.

carlosalbertolajr commented 9 months ago

I considered this solution, but I'm afraid that for my use case, namely textual search, using the like with patterns would be more suitable.

In any case, thank you for the answer!

nalgeon commented 8 months ago

Sorry I wasn't much help.

nalgeon / sqlean

"Accent-sensitive" behavior expected in sqlean/unicode LIKE function #105