Open thejeshgn opened 6 years ago
Thanks for the report.
Here's a little background.
It doesn't matter what the title is in the input OPML file, we don't even store it. What matters is what we get back when we read the feed. For that we use the node.js package feedParser. I believe it is unicode-aware.
I verified that the question marks are in the database (though seeing it in the feed list is basically verification of that, it's where the data comes from).
I'm going to have to do some more investigating. I have a standalone app that just calls feedParser for debugging situations like this.
I suspect it is an encoding issue that came up in River5 recently, which uses the same feed reading package.
OK, I think I have what may be a fix in feedBase.
Here's the test.
http://feedbase.io/readfeed?feedurl=http://kannadascience.blogspot.com/feeds/posts/default
Is that the correct title?
Dave
Yes. It's correct.
Excellent. Over the next few hours feedBase will re-read the feeds, and with any luck the names on your page should become correct. If you can, let me know either way, if it works or doesn't.
Thanks for your help debugging this. I was getting a lot of feedback on this problem. ;-)
I can still see question marks on my user page
Okay -- I've dug in some more and found that I only solved part of the problem, when I added the conversion of foreign charsets, which was quite tricky. Now some more trickiness is to come.
I've subscribed to the feed myself.
http://rashmidiaries.blogspot.com/feeds/posts/default
Then I hooked up the /ping call in feedBase to the exact mechanism that maintains the database, that reads each feed in turn, and updates the title, html link, description.
http://feedbase.io/ping?feedurl=http://rashmidiaries.blogspot.com/feeds/posts/default
As we saw earlier, it's correctly reading the feed over the net. However when I add it to the database, this is what I see.
There are the question marks.
I guess I assumed that all text in MySQL would be UTF-8. But apparently not so.
I'm going to link to this from Scripting News, asking for advice.
As far as I know (From my wordpress experience) while creating database you can set the characterset and collate
CREATE DATABASE feedbase_db CHARACTER SET utf8 COLLATE utf8_general_ci;
and then while creating table you can use
CREATE TABLE feeds (
) ENGINE=MyISAM DEFAULT CHARSET=utf8
This should solve both storing and sorting of Varchars.
In our case we can alter the table
ALTER TABLE feeds MODIFY title VARCHAR(250) CHARACTER SET utf8 COLLATE utf8_general_ci;
"If you use ALTER TABLE to convert a column from one character set to another, MySQL attempts to map the data values, but if the character sets are incompatible, there may be data loss."
In another thread I'm asking for advice on how to backup the database. I don't want to do anything radical that could lose data w/o a good backup.
On the other hand I think mainly we would lose data that we don't actually have yet.
So my next thing will be to get a good backup.
What he, and they on the other thread, said :)
=> Sequel Pro (or any graphical client) will really show you the structure of the database (the encoding used etc…), will make it easy to import/export data, browse data etc… Very useful when you’re new to MySQL.
JY
Le 3 avr. 2018 à 19:53, Dave Winer notifications@github.com a écrit :
In another thread I'm asking for advice on how to backup the database. I don't want to do anything radical that could lose data w/o a good backup.
On the other hand I think mainly we would lose data that we don't actually have yet.
So my next thing will be to get a good backup.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
FWIW, I like Sequel Pro for a lot of things, but not their export feature. It always makes a new INSERT
statement for every row, which can take forever to import if you have a lot of data. mysqldump
does it a lot more intelligently, with INSERT
statements that insert thousands of rows.
I'm working on a Node utility that backs up the database to two JSON files, one for feeds and the other for subscriptions.
The feeds file is done, it's 10.7MB. I figure that's not a problem for a Node app even if it's 10 times that size. Much more than that and I guess it might be a problem. That's the way I'm going for now.
Backing it up to a Dropbox folder and a chronologic folder structure, so I will have redundant snapshots.
On Wed, Apr 4, 2018 at 11:33 AM Tim Habersack notifications@github.com wrote:
FWIW, I like Sequel Pro for a lot of things, but not their export feature. It always makes a new INSERT statement for every row, which can take forever to import if you have a lot of data. mysqldump does it a lot more intelligently, with INSERT statements that insert thousands of rows.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/scripting/feedBase/issues/22#issuecomment-378643674, or mute the thread https://github.com/notifications/unsubscribe-auth/ABm9O5dwQIwshtoRhzgsjIuHPYxMB5Wpks5tlOfBgaJpZM4S_7gn .
I have backup and restore working, so I have the same installation on my Mac desktop as I have on the server. So I can do the experiments here.
Here's the source for the backup and restore apps.
https://github.com/scripting/feedBase/blob/master/backup/
Dave
BTW, one of the fallouts of what I'm doing right now, is that it will be possible, without any changes to the source, to run your own feedBase. All you'll have to do is change values in config.json.
I just did the ALTER command on the feeds table and tried reading in
http://rashmidiaries.blogspot.com/feeds/posts/default
But got an error when I tried to update the table.
The Mac version of MySQL works somewhat differently than the Unix version. Here I get an error message, on Unix it just fails without a message.
This is the SQL command it runs:
replace into feeds (feedUrl, title, htmlUrl, description, whenUpdated, countSubs, ctSecs, code, ctErrors, ctConsecutiveErrors, ctChecks, whenLastError) values ('http://rashmidiaries.blogspot.com/feeds/posts/default', 'ರಶ್ಮಿ ಅಭಯ ಸಿಂಹ', 'http://rashmidiaries.blogspot.com/', '', '2018-04-05 15:51:44', 2, 0.407, 200, 0, 0, 13, NULL);
And the message:
ER_TRUNCATED_WRONG_VALUE_FOR_FIELD: Incorrect string value: '\xE0\xB2\xB0\xE0\xB2\xB6...' for column 'title' at row 1
The Alter command is --
ALTER TABLE feeds MODIFY title VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;
I may just try recreating the database, since I can load this database entirely from the JSON.
Where are you running/typing these commands?
Le jeu. 5 avr. 2018 à 21:57, Dave Winer notifications@github.com a écrit :
I just did the ALTER command on the feeds table and tried reading in
http://rashmidiaries.blogspot.com/feeds/posts/default
But got an error when I tried to update the table.
The Mac version of MySQL works somewhat differently than the Unix version. Here I get an error message, on Unix it just fails without a message.
This is the SQL command it runs:
replace into feeds (feedUrl, title, htmlUrl, description, whenUpdated, countSubs, ctSecs, code, ctErrors, ctConsecutiveErrors, ctChecks, whenLastError) values (' http://rashmidiaries.blogspot.com/feeds/posts/default', 'ರಶ್ಮಿ ಅಭಯ ಸಿಂಹ', 'http://rashmidiaries.blogspot.com/', '', '2018-04-05 15:51:44', 2, 0.407, 200, 0, 0, 13, NULL);
And the message:
ER_TRUNCATED_WRONG_VALUE_FOR_FIELD: Incorrect string value: '\xE0\xB2\xB0\xE0\xB2\xB6...' for column 'title' at row 1
The Alter command is --
ALTER TABLE feeds MODIFY title VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;
I may just try recreating the database, since I can load this database entirely from the JSON.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/scripting/feedBase/issues/22#issuecomment-379057935, or mute the thread https://github.com/notifications/unsubscribe-auth/AAIFBAWnihz3VpM5TDz8Tjl5bR0-_-Y1ks5tlnc4gaJpZM4S_7gn .
It's all happening on my Mac desktop running the official MySQL.
The ALTER command is running at the mysql command line.
The "replace into feeds..." command is being run by feedBase running on the Mac in response to a /ping command.
For 3, can you check that the charset option for the connection is correct?
https://github.com/mysqljs/mysql/blob/master/Readme.md#connection-options
JY
Le jeu. 5 avr. 2018 à 22:06, Dave Winer notifications@github.com a écrit :
1.
It's all happening on my Mac desktop running the official MySQL. 2.
The ALTER command is running at the mysql command line. 3.
The "replace into feeds..." command is being run by feedBase running on the Mac in response to a /ping command.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/scripting/feedBase/issues/22#issuecomment-379060306, or mute the thread https://github.com/notifications/unsubscribe-auth/AAIFBBJo-18Ye3tssaNXAJHdzpYDUtvZks5tlnlFgaJpZM4S_7gn .
I did this --
"database": { "host": "localhost", "port": 3306, "user": "root", "password": "xxx", "charset": "utf8mb4", "connectionLimit": 10, "database": "feedbase", "raise_on_warnings": true }
Did the ping, same problem as before.
I have to take a break now, wil be back in about an hour.
I also have MySQL Workbench running on this machine, btw. I don't know how to use it, but I fumble around trying stuff out. Eventually I'll figure it out. ;-)
If you used utf8 for the alter, use utf8 for the connection charset too.
Or use utf8mb4 everywhere. (In the alter and the connection )
JY
Le jeu. 5 avr. 2018 à 22:23, Dave Winer notifications@github.com a écrit :
I did this --
"database": { "host": "localhost", "port": 3306, "user": "root", "password": "xxx", "charset": "utf8mb4", "connectionLimit": 10, "database": "feedbase", "raise_on_warnings": true }
Did the ping, same problem as before.
I have to take a break now, wil be back in about an hour.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/scripting/feedBase/issues/22#issuecomment-379064682, or mute the thread https://github.com/notifications/unsubscribe-auth/AAIFBDPqJ-v-UmV3oV0ZIbTC69bI9A0Yks5tln0ngaJpZM4S_7gn .
do use utf8mb4 by the way. You don’t want a bug because of Poo. :)
in MySQL workbench, on the lower left you see « Schemas ». Click on the « sys » database and you’ll be able to list the tables. Click on the little « i » buttons next to table names. That will open a new tab where you can see the table structure. (Columns and their encodings. )
JY
Le jeu. 5 avr. 2018 à 22:24, Dave Winer notifications@github.com a écrit :
I also have MySQL Workbench running on this machine, btw. I don't know how to use it, but I fumble around trying stuff out. Eventually I'll figure it out. ;-)
[image: image] https://user-images.githubusercontent.com/1686843/38389905-c3046946-38ed-11e8-9451-284bd5a0f337.png
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/scripting/feedBase/issues/22#issuecomment-379065015, or mute the thread https://github.com/notifications/unsubscribe-auth/AAIFBNbydrYbjco5ckT6VDInxqZZYwvoks5tln1rgaJpZM4S_7gn .
Use also utf8mb4_general_ci for the collation. (Note the mb4 part)
CHARACTER SET utf8mb4 COLLATE utf8mb4 _general_ci
JY
Let's review the commands I use to create the database and the tables. I have them documented. Let's get them really dialed in. Then I'll re-create the database, on my Mac, re-import the data, do some pings. Once it works, I'll do the same on the main server, and hopefully we'll have something we can build on for years to come.
Here are the docs for the tables.
https://github.com/scripting/feedBase/blob/master/docs/database.md#sql-commands-to-create-the-tables
And here are the docs for config.json:
https://github.com/scripting/feedBase/blob/master/docs/config.md
I already know that I have to add a charset value in the database sub-object.
I'm back at work this morning.
I did a search on encodings and came up with this article.
https://mathiasbynens.be/notes/mysql-utf8mb4
It seems to have it covered for updating an existing database. I'm still looking for an article that covers starting a new database with new tables with utf8mb4 encoding.
BTW, no question utf8mb4 is the proper encoding to use.
You can use a default charset and collation for a table:
CREATE TABLE t1
(
[... columns definitions...]
) DEFAULT CHARACTER SET utf8mb' COLLATE utf8mb4_general_ci;
https://dev.mysql.com/doc/refman/5.7/en/charset-examples.html
Not sure about the right collation to use, because people turn to utf8_unicode_ci or utf8_general_ci (when using the utf8 charset) :
https://stackoverflow.com/a/367725
Someone mentions:
Update: For newer versions, recommend utf8mb4 and utf8mb4_unicode_520_ci. These give you the rest of Chinese, plus improved collation
See the example 4 in the previous link on MySQL.com, you can also set a default charset/collation for the database (instead of repeating it at the tables level)
CREATE DATABASE d1
DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci;
USE d1;
CREATE TABLE t1
(
c1 CHAR(10)
);
But i prefer to be explicit at the table level, as it's also what mysqldump does when it dumps the CREATE statements.
More details here:
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html
Thank you JY! ;-)
But I'm confused. What is latin2? I'm using utf8mb4, right?
Yes yes, sorry. i copy pasted their example for the DEFAULT CHARACTER SET syntax, but do use utf8mb4 when seeing another charset like latin1 etc...
BTW, another question.
When I recreate the tables, I want to use TEXT type instead of VARCHAR.
I don't like having to specify the size limit.
It seems I can't use TEXT for a key field.
They have limits too, see here (there is also mediumtext etc..) :
https://stackoverflow.com/a/6766854
Note the part about utf8mb4 using a max of 4 bytes per character.
I've run these three commands. The first worked, the second two got errors.
ALTER DATABASE feedbase CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci; ALTER TABLE feeds CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE subscriptions CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
This is the error (same for both):
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
The "longest" text column seems to be "description" but even for this one, TEXT may seem overkill. Or not :-)
Difference between VARCHAR and TEXT:
Here are the commands used to create the tables.
create table subscriptions (username varchar (255), listname varchar (255), feedUrl varchar (512), whenupdated datetime, PRIMARY KEY (feedUrl, username));
create table feeds (feedUrl varchar (512), title varchar (255), htmlUrl varchar (512), description varchar (512), whenUpdated datetime, countSubs int, ctSecs float, code int, ctErrors INT DEFAULT 0, ctConsecutiveErrors INT DEFAULT 0, ctChecks INT DEFAULT 0, whenLastError DATETIME, PRIMARY KEY (feedUrl));
I don't have a good guess as to what happened. I thought it wouldn't change the size of any of the VARCHARs. But it seems that must be happening. Last I checked 512 < 767.
Can you first check if the database is using InnoDB? (in MySQL Workbench)
Where to look? Here's what the front panel says.
"Schemas", "sys" at the bottom left.
then "i" for info.
More context on the error, i had to deal with that recently:
I don't know what "I" for info means.
just next to the "sys" database, hover with the mouse, you should see an "i" button
same at the Tables level.
yes, that's the first "Info" tab, you should see other tabs in this new window. Look at the "Tables" tab.
Bing
mmm, actually the sys schema is used by MySQL
https://dev.mysql.com/doc/refman/5.7/en/sys-schema.html
so MySQL workbench is not currently browsing your own database.
Okay I read the context piece, and am utterly confused, which is understandable. There's a lot of history here to catch up on. I know how this works, I created Frontier. We had history too (though not as much).
The key in the feeds table is feedurl, which as it name implies is the URL of the feed.
How short can we make it? That's the question.
Any chance you could get on Skype? This might go a lot faster...
My subscription list is here http://feedbase.io/?username=thej
It has couple of feeds with title in Kannada. Feedbase prints ???
Example
when imported into feedbase has title ???
where as in a standard html it would print ok.
HTML
RENDERING
ವರ್ತಮಾನ - Vartamaana
This feedlist opml is an export from liferea.