zonkyio / embedded-postgres

Java embedded PostgreSQL component for testing
Apache License 2.0
341 stars 41 forks source link

ORDER BY not working with capital case #132

Open rayaanrizwan1234 opened 3 months ago

rayaanrizwan1234 commented 3 months ago

When doing an ORDER BY ASC it seems to get back the ones with capital case first. For example, if we had a column and a record with 'Z' and another with 'a', doing an ORDER BY ASC on that column will bring the record with 'Z' first then the other. Any way to fix this?

org.opentest4j.AssertionFailedError: expected: <[SimpleLookup(code=null - null, description=Desc, active=true), SimpleLookup(code=null - RG, description=Desc, active=true)]> but was: <[SimpleLookup(code=null - RG, description=Desc, active=true), SimpleLookup(code=null - null, description=Desc, active=true)]> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177) at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1145)

lesiak commented 3 months ago

This is not a bug - it is a consequence how letters are represented.

For example in Unicode: List of Unicode characters - Latin Script

Thus Z goes before a.

If you want to compare ignoring the case, convert both sides to same case using UPPER(columnA) or LOWER(columnA).

If you want more details: You may be using a different character set for strings: 24.3.1. Supported Character Sets However, the code points for Latin Script are shared across many character sets, and were introduced by ASCII encoding

tomix26 commented 3 months ago

I'm not an expert in this area, but I think it should also be possible to use ICU collation settings:

rayaanrizwan1234 commented 3 months ago

Thanks for getting back on this. What character set does the embedded Postgres use? and is there a way to specify which one it uses?

tomix26 commented 1 week ago

Sorry for the late reply, it should be UTF8: https://github.com/zonkyio/embedded-postgres/blob/master/src/main/java/io/zonky/test/db/postgres/embedded/PreparedDbProvider.java#L243