Open Francois-BellegardeOSF opened 1 year ago
@Francois-BellegardeOSF what collation is your database using?
SELECT DB_NAME() As DB, DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS Collation
The schema we use assumes you are using a _UTF8
database collation. The default setup uses Latin1_General_100_BIN2_UTF8
, which should be able to handle non-ASCII characters fine in varchar
columns.
Support for UTF-8 collation was added in SQL Server 2019: https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-utf-8-support-for-sql-server/ba-p/734928.
When I create a DB, it still defaults to SQL_Latin1_General_CP1_CI_AS. And apparently, this is also what IT did in production.
I tested reimporting the DB after editing model.xml in the .bacpac and it solved this issue. It will be unpleasant and maybe impractical for large databases, but in my case it's manageable.
Thank you for clearing this up. Could this not have been made to explicitly use a UTF8 collation on the columns? If people edit their schema to use a different collation, would it cause an issue? I would think that as long as every column are changed, at least there would not be internal issues. I wonder how future proof this would be.
@cgillum I just realized this is a case sensitive collation. This makes it a huge breaking change for existing databases. Would you say this would be safe to use with a case insensitive equivalent?
@Francois-BellegardeOSF yes, this particular collation was chosen for performance reasons. I also like to use it at development time to make sure we maintain compatibility with case-sensitive databases. But you can still use our schema with non-case sensitive databases. The UTF-8 issue should be the only compatibility concern.
@cgillum Thank you for this prompt and clear response. This is helping me immensely.
@cgillum , we can't change the collation for our production running DB. Another the DTF tables are in the same DB with other business tables. To change collation will break our other logic. Do we have other solution to resolve this issue?
The choice of using varchar(max) instead of nvarchar(max) for payloads causes issues when payloads contain characters not supported by varchar.
The workaround would be to encode every message, but that is a solution more adapted for new apps, and even then, it's a bit awkward.
Migrating existing apps could really be problematic.
For example, I added an emoji to an activity output. Note the '??' that have replaced it. When using Azure Storage, there is no issue with charset.