Closed kaihendry closed 5 years ago
I think it's actually more to do with the procedure setup than the tables now.
SHOW CREATE PROCEDURE
I'm assuming that this is for the unee_t_enterprise
database
This should be fixed now. @kaihendry can you check and confirm before we can close this issue.
mefe_failedUnitCreations
does not belong to the BZ database, it was a f... up I made in the DEV staging (thanks for pointing that out).
I've fixed the issue by deleting the table mefe_failedUnitCreations
from the BZ database.
This is bugzilla, not enterprise.
Database Collation seems to be mixed between utf8_general_ci & utf8mb4_unicode_520_ci
@kaihendry AFAICT this is a RDS configuration issue ---> not related to the database schema ---> there is nothing I can do here
@kaihendry see these links for more context:
make sure to use the correct values when you correct the RDS configuration though: we use
utf8mb4
utf8mb4_unicode_520_ci
Dev and prod are configured the same way. You can confirm this yourself with the dbcheck describe API in Postman. I don't want to try force things with RDS parameters and much prefer to leave them by defaults.
Dev appears correct. Production not. Again see the Unicode API in dbcheck in Postman.
Furthermore I am bit concerned about going with utf8mb4_unicode_520_ci since best practice appears to be utf8mb4_general_ci or is it utf8mb4_unicode_ci. I am utterly confused.
I would be very careful to change this though, since I have been struggling for hours to work out howto switch my collation_connection from the default utf8mb4_generalci to _utf8mb4_unicode_520ci in cli. And if it's not aligned, mysql just gives up.
Dev and prod are configured the same way. You can confirm this yourself with the dbcheck describe API in Postman
This is obviously not true since PROD is working fine and DEV/Staging is not. There must some obscure configuration parameter that you're missing somewhere...
Dev appears correct. Production not.
This depends on your defintion of "correct". For me:
Is it possible the that Postman check is misleading us?
best practice appears to be utf8mb4_general_ci or is it utf8mb4_unicode_ci.
Just to make sure all information is collated in this issue: We use
utf8mb4
utf8mb4_unicode_520_ci
because dylan recommended that we use those to make sure we have emoji compatibility
See https://github.com/unee-t/bz-database/issues/110#issuecomment-461679326Unicode_520
stands for v5.20 of the norm (the most recent one). https://stackoverflow.com/questions/18904883/what-is-the-difference-between-utf8-unicode-ci-and-utf8-unicode-520-ci
utf8mb4
should be the collation used everywhere: see https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434
Also @kaihendry see https://github.com/unee-t/bz-database/blob/101677db8b7cc86e7c17a7515e478fc3f6cce6a0/db%20upgrade/upgrade_unee-t_v3.30_to_v4.31.sql for detailed explanation of the upgrade we made to the BZ db (all environments) back in Feb 2019
I think it's the top level database configuration issue now.
[hendry@t480s db]$ echo 'SELECT @@character_set_database,@@collation_database;' | ./connect.sh
Connecting to DEV auroradb.dev.unee-t.com
mysql: [Warning] Using a password on the command line interface can be insecure.
utf8mb4 utf8mb4_unicode_ci
[hendry@t480s db]$ echo 'SELECT @@character_set_database,@@collation_database;' | ./connect.sh -p
PRODUCTION
Connecting to PROD auroradb.unee-t.com
mysql: [Warning] Using a password on the command line interface can be insecure.
utf8 utf8_general_ci
I think both values IIUC #110 correctly, should be utf8mb4_unicode_520_ci
When this is set, this becomes the default for everything unless the collation is specified otherwise in the table/schemas etc, which is should not be.
@kaihendry can you alter you golang script to make sure that all the variables sent to the BZ database are using the correct collation and character set
In mySQL the syntax to make sure that the string 'foobar' is using the correct character set and collation is
SELECT (_utf8mb4'foobar' COLLATE utf8mb4_unicode_520_ci);
---> this should make sure that when you sent the string 'foobar' down to the BZ database it will be understood as a string using Character Set utf8_mb4
and collation utf8mb4_unicode_520_ci
See https://dev.mysql.com/doc/refman/8.0/en/charset-literal.html
This might be the root cause why create unit is slow with lambdas: See https://lowleveldesign.org/2013/07/19/diagnosing-collation-issue-mysql-stored-procedure/ for more context
When index key values have different collations than parameters in the where condition, MySql is not able to use index and needs to perform a full scan.
@nbiton and @hendry I just had a brainwave...
The performance issue we are witnessing in the link between MEFE and the BZ database looks more an more like related to collation issues.
One reason why collation matters is when you do search on strings / varchar, the collation must be the same everywhere in order for MySQL to be able to use indexes.
Today the MEFE is passing the following information to the BZ database:
mefe_unit_id
. This is a VARCHAR ---> collation mattersmefe_invitation_id
. This is a VARCHAR ---> collation mattersIF instead of a VARCHAR/String the MEFE was sending an INT instead THEN collation would NOT matter at all
@nbiton, @kaihendry how complex will it be to do this change:
1- send a unique INT instead of a VARCHAR for the mefe_unit_id
2- send a unique INT instead of a VARCHAR for the mefe_invitation_id
?
We probably need to double check a few things but that might solve all our collation/performance issues...
I explored the above idea further. Here are my findings so far (for Unit creation)
ut_data_to_create_units
. See the code here.ut_data_to_create_units
, calls the procedure unit_create_with_dummy_users
to start the actual unit creation process in the SQL database.Once the procedure unit_create_with_dummy_users
is called with the correct variables
@mefe_unit_id
@environment
then the unit creation process starts
As of today mefe_unit_id
is a String/VARCHAR.
Depending on many parameters that are apparently beyond our control this String/VARCHAR is passed with a charset/collation that might be different from the default charset/collation that MySQL is expecting (MySQL is expecting utf8_mb4
and utf8_mb4_unicode_520_ci
).
When this happens (the charset/collation are different) then MySQL is not able to use its indexes to optimize the search and needs to perform a full scan of the table. ---> this will get worse and worse as the number of units we have created in Unee-T grows
The first step in the unit creation process is to find the actual details of the unit that we need to create.
These details are stored in the table ut_data_to_create_units
as explained above.
The main source of our headaches IMO is the following line in the procedure unit_create_with_dummy_users
:
# What is the record that we need to use to create the objects in BZ?
SET @unit_reference_for_import = (SELECT `id_unit_to_create` FROM `ut_data_to_create_units` WHERE `mefe_unit_id` = @mefe_unit_id);
After the unit is created in the BZ db, we make another call to the BZ database to get the newly created BZ product ID for the unit. MEFE needs that id to be able to find bugs and other stuff.
Here again we use the MEFE Unit Id, a string/VARCHAR. The query is here
SELECT product_id FROM ut_data_to_create_units WHERE mefe_unit_id=?", MefeUnitID
We have 2 options at this stage:
In that scenario we alter the Payload from the MEFE so that MEFE sends BOTH:
The queries will become
# What is the record that we need to use to create the objects in BZ?
SET @unit_reference_for_import = (SELECT `id_unit_to_create` FROM `ut_data_to_create_units` WHERE `mefe_unit_id_int_value` = @mefe_unit_id_int_value);
and
SELECT product_id FROM ut_data_to_create_units WHERE mefe_unit_id_int_value=?", MefeUnitIdIntValue
and we can forget all our worries about charset/collation issues 🎉
@kaihendry and @nbiton if anyone has an idea I'm listening...
@kaihendry and @nbiton any reason why we can NOT implement Option 1 described above?
Must run DB schema v4.34.0 or above. See https://github.com/unee-t/bz-database/pull/129
We need to run a version of the invite scripts which
MEFE needs to ba able to send a value mefeInvitationIdIntValue
(an INT - must be unique) as part of the payload
Must run DB schema v4.34.0 or above. See https://github.com/unee-t/bz-database/pull/129
We need to run a version of the unit creation scripts which
MEFE needs to be able to send a value mefeUnitIdIntValue
(an INT - must be unique) as part of the payload
We need to make sure that all the tests are passed before deploying in the PROD.
Latest version of DB schema v4.34 has been implemented in the DEV/Staging and in the DEMO environment.
Implemented in all envos, no errors, speed for:
I needed to add &collation=utf8mb4_unicode_520_ci
to Go code's collation SQL connection string to avoid:
Error 1267: Illegal mix of collations (utf8mb4_unicode_520_ci,IMPLICIT) and (utf8mb4_general_ci,IMPLICIT) for operation '='"}
I am a bit surprised I need to do this. In my tests I didn't ExecSQL so perhaps this is where collation needs to be explicit. It's a bit annoying that I have not found a way to test the collation connections from the database are correct.
https://stackoverflow.com/questions/55998671/connecting-with-wrong-collation-to-a-mysql-server
I didn't realize were using the newly introduced int as a counter. :scream_cat: As mentioned in slack, a counter for co-coordinating on a micro service architecture such as ours is far too error prone. And if you didn't use the int as a counter, it's still far too easy to clash compared to a string. Sorry, I should have spotted what was happening here.
A string as id should be straightforward to index and lookup. We should investigate why that wasn't fast and fix that instead of using this integer in my opinion. There are other issues that muddied the waters on this bug too, like collation settings when exec-ing SQL being problematic. They appear fixed.
Perhaps we should create a new issue for making unique string IDs fast to look up, assuming that is the problem RE https://github.com/unee-t/bz-database/issues/128#issuecomment-488227233
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_unicode_520_ci,IMPLICIT) and (utf8mb4_general_ci,IMPLICIT) for operation '='
On dev there is a utf8_general_ci on mefe_failedUnitCreations which I think is wrong.
related to https://github.com/unee-t/bz-database/issues/110