microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework
MIT License
87 stars 31 forks source link

Payloads as varchar and charset of payloads #164

Open Francois-BellegardeOSF opened 1 year ago

Francois-BellegardeOSF commented 1 year ago

The choice of using varchar(max) instead of nvarchar(max) for payloads causes issues when payloads contain characters not supported by varchar.

The workaround would be to encode every message, but that is a solution more adapted for new apps, and even then, it's a bit awkward.

Migrating existing apps could really be problematic.

For example, I added an emoji to an activity output. Note the '??' that have replaced it. When using Azure Storage, there is no issue with charset.

image

cgillum commented 1 year ago

@Francois-BellegardeOSF what collation is your database using?

SELECT DB_NAME() As DB, DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS Collation
cgillum commented 1 year ago

The schema we use assumes you are using a _UTF8 database collation. The default setup uses Latin1_General_100_BIN2_UTF8, which should be able to handle non-ASCII characters fine in varchar columns.

Support for UTF-8 collation was added in SQL Server 2019: https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-utf-8-support-for-sql-server/ba-p/734928.

Francois-BellegardeOSF commented 1 year ago

When I create a DB, it still defaults to SQL_Latin1_General_CP1_CI_AS. And apparently, this is also what IT did in production.

I tested reimporting the DB after editing model.xml in the .bacpac and it solved this issue. It will be unpleasant and maybe impractical for large databases, but in my case it's manageable.

Thank you for clearing this up. Could this not have been made to explicitly use a UTF8 collation on the columns? If people edit their schema to use a different collation, would it cause an issue? I would think that as long as every column are changed, at least there would not be internal issues. I wonder how future proof this would be.

Francois-BellegardeOSF commented 1 year ago

@cgillum I just realized this is a case sensitive collation. This makes it a huge breaking change for existing databases. Would you say this would be safe to use with a case insensitive equivalent?

cgillum commented 1 year ago

@Francois-BellegardeOSF yes, this particular collation was chosen for performance reasons. I also like to use it at development time to make sure we maintain compatibility with case-sensitive databases. But you can still use our schema with non-case sensitive databases. The UTF-8 issue should be the only compatibility concern.

Francois-BellegardeOSF commented 1 year ago

@cgillum Thank you for this prompt and clear response. This is helping me immensely.

jianjunwang2 commented 1 year ago

@cgillum , we can't change the collation for our production running DB. Another the DTF tables are in the same DB with other business tables. To change collation will break our other logic. Do we have other solution to resolve this issue?