ispyb / ispyb-database-modeling


Change the collation for proposal title and protein name to utf8mb4_unicode_ci #23

Open delageniere opened 6 years ago

delageniere commented 6 years ago

Recently we had several problems ingesting data from the User Portal because of special characters. These would be fixed by changing the collation of these columns: Proposal.title and Protein.name.

KarlLevik commented 6 years ago

I agree, this makes sense. We actually already have the utf8mb4_unicode_ci collation on our Proposal.title at Diamond, but not on Protein.name.

I think these SQL statements will do the job:

ALTER TABLE Proposal MODIFY title varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;
ALTER TABLE Protein MODIFY name varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;
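
To see which of these columns still need the change at a given site, the current character set and collation can be read from information_schema. This is only a sketch, assuming MySQL or MariaDB and that the ISPyB schema is the currently selected database:

```sql
-- Assumption: the ISPyB schema is the current database (run USE <schema> first).
-- Lists the type, nullability, character set and collation of the two columns.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE,
       CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND ((TABLE_NAME = 'Proposal' AND COLUMN_NAME = 'title')
    OR (TABLE_NAME = 'Protein'  AND COLUMN_NAME = 'name'));
```

Because MODIFY redefines the whole column, it is worth comparing COLUMN_TYPE and IS_NULLABLE from this query with the definitions in the ALTER statements before running them.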

delageniere commented 6 years ago

Done with script: 2018_03_13_updateCollations.sql

delageniere commented 5 years ago

We face the same type of problem with Macromolecule.name, .acronym and .comments, so we will change the collation accordingly.

stufisher commented 5 years ago

I cannot agree with acronym. This field is piped to the filesystem where we really do not want special characters.

stufisher commented 5 years ago

Sorry, to clarify: macromolecule.acronym or protein.acronym? (Why are these different tables? Separate discussion.) If macromolecule, I assume this is a SAXS issue?

delageniere commented 5 years ago

Yes, it is a SAXS issue, and in fact it is not necessary to change Macromolecule.acronym, as it is filtered beforehand.