Open tkeffer opened 2 years ago
i just tried an alter on the description
field to make it utf8. need to do more research to see what cause this behavior to start, but i suspect an upgrade to mysql would have done it.
so far i do not see more messages like this in error.log, but we'll see...
here is the table before and after, and the alter command.
alter table `stations` change `description` `description` varchar(255) character set utf8mb4 default NULL;
mysql> show create table stations \G
*************************** 1. row ***************************
Table: stations
Create Table: CREATE TABLE `stations` (
`station_url` varchar(255) NOT NULL,
`description` varchar(255) DEFAULT NULL,
`latitude` float DEFAULT NULL,
`longitude` float DEFAULT NULL,
`station_type` varchar(64) DEFAULT NULL,
`station_model` varchar(128) DEFAULT NULL,
`weewx_info` varchar(64) DEFAULT NULL,
`python_info` varchar(64) DEFAULT NULL,
`platform_info` varchar(128) DEFAULT NULL,
`last_addr` varchar(16) DEFAULT NULL,
`last_seen` int DEFAULT NULL,
UNIQUE KEY `index_url` (`station_url`),
KEY `index_addr` (`last_addr`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
mysql> show create table stations \G
*************************** 1. row ***************************
Table: stations
Create Table: CREATE TABLE `stations` (
`station_url` varchar(255) NOT NULL,
`description` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`latitude` float DEFAULT NULL,
`longitude` float DEFAULT NULL,
`station_type` varchar(64) DEFAULT NULL,
`station_model` varchar(128) DEFAULT NULL,
`weewx_info` varchar(64) DEFAULT NULL,
`python_info` varchar(64) DEFAULT NULL,
`platform_info` varchar(128) DEFAULT NULL,
`last_addr` varchar(16) DEFAULT NULL,
`last_seen` int DEFAULT NULL,
UNIQUE KEY `index_url` (`station_url`),
KEY `index_addr` (`last_addr`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
actually, that latin1
is kind of obvious. when did that appear? pretty sure it was not part of the original table creation. so perhaps a mysql upgrade added that to make it explicit, thus breaking any inserts that contain utf8?
sorry, but character encodings are one of my many weaknesses
the latin1 is there apparently because i did not specify a characterset when i originally created the tables eons ago, so it defaulted to latin1. apparently mysql 8 (running on weewx.com as of this writing nov 2022) defaults to utf8 (not sure whether that is 3-byte or 4-byte).
i'm not crazy about altering everything in the database. description should obviously accept non-latin characters, but should any other fields?
Interesting.
Internally, I'm sure MySQL stores it as a byte string. However, it appears that the database engine is also enforcing character type safety. That is, if you declare the field as Latin-1, then try to store something that uses non-Latin-1 characters, it will declare an error.
When it comes time to print the field out, the MySQL client will use the encoding to figure out what code point to use.
I believe that UTF-8 domain names are valid these days, although most DNS servers don't support them. To future proof things, I would use utf-8 encoding for station_url
as well.
The other fields should be fine.
It's an ancient article, but I recommend Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets. It set me straight now 20 years ago!
In
/var/log/nginx/error.log
, I am seeing a lot of errors of the sortWhat they all seem to have in common is the use of UTF-8 in the field
description
.Perl is a mystery to me, so I can't tell if it's rejecting UTF-8 characters.