xenocrat / chyrp-lite

An ultra-lightweight blogging engine, written in PHP.
https://chyrplite.net/
BSD 3-Clause "New" or "Revised" License

Failed to rename tag which contains Asian characters #257

Closed: cuixiping closed this issue 6 months ago

cuixiping commented 6 months ago

ENV:

Steps

  1. Create a new post with a tag that contains Asian characters, for example 文字.
  2. Go to the tag manager and rename the tag.
  3. Back in the tag list, the tag is unchanged.

My fix

I solved it by doubling the backslashes in "tags.php". I am not sure whether this is correct.

            $results = SQL::current()->select(
                tables:"post_attributes",
                fields:"post_id",
                conds:array(
                    "name" => "tags",
---                 "value LIKE" => $this->tags_name_match($_POST['original'])
+++                 "value LIKE" => str_replace("\\","\\\\",$this->tags_name_match($_POST['original']))
                )
            )->fetchAll();
xenocrat commented 6 months ago

Thanks for the report. I see the problem: the tag names are stored as JSON, which by default encodes any non-ASCII character as a Unicode code point using the hex notation "\uXXXX". We need to tell the RDBMS we are searching for a literal backslash there.

Your solution is fine for a quick fix, but I've added a commit to do this in a more correct way. It's still not working right on PostgreSQL - I need to investigate that further.
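
To illustrate the backslash problem, here is a minimal sketch; it is not the committed fix. It assumes PHP's default json_encode() behaviour and a database where backslash is the default LIKE escape character (as in MySQL). The variable names are made up for the example.

```php
<?php
// Default json_encode() escapes non-ASCII characters as \uXXXX sequences.
$tag = "文字";
$encoded = json_encode($tag);

// Prints "\u6587\u5b57" including the double quotes: two literal backslashes.
echo $encoded, "\n";

// Used directly as a LIKE pattern, each backslash would be read as an
// escape character, so the pattern would look for a plain "u" instead of "\u".
$pattern = "%" . $encoded . "%";

// The quick fix from the report: double every backslash so that the
// database sees an escaped (literal) backslash.
$doubled = str_replace("\\", "\\\\", $pattern);

// Prints %"\\u6587\\u5b57"%
echo $doubled, "\n";
```

Doubling only helps where backslash really is the escape character, which is one reason an explicit ESCAPE clause is more portable.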

xenocrat commented 6 months ago

Better fix committed just now. :-) The problem is specifically related to LIKE... ESCAPE. Thanks for reporting this!

xenocrat commented 6 months ago

I've decided to specify | as the default escape char for LIKE statements. QueryBuilder.php and tags.php have both been updated accordingly.
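
For readers following along, the idea can be sketched roughly as below. like_escape() is a hypothetical helper written for this example, not the code in QueryBuilder.php or tags.php, and the SQL string only shows where the ESCAPE clause would go.

```php
<?php
// Hypothetical helper: escape LIKE wildcards using "|" instead of backslash.
function like_escape(string $value, string $escape = "|"): string {
    // str_replace() applies the pairs in order, so the escape character
    // itself is handled first, then the two LIKE wildcards.
    return str_replace(
        array($escape, "%", "_"),
        array($escape . $escape, $escape . "%", $escape . "_"),
        $value
    );
}

// Backslashes in the JSON-encoded tag are left alone, because with
// ESCAPE '|' they are ordinary characters to the LIKE operator.
$pattern = "%" . like_escape(json_encode("文字")) . "%";

// Assumed shape of the condition; :pattern would be bound to $pattern.
$sql = "SELECT post_id FROM post_attributes"
     . " WHERE name = 'tags' AND value LIKE :pattern ESCAPE '|'";

echo $pattern, "\n";
```

With a non-backslash escape character, the "\u" sequences no longer need special handling in the pattern itself.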

cuixiping commented 6 months ago

Great! Thanks for the quick fix.

Also, I don't think it's really necessary to escape Unicode in the JSON.

json_encode($a, JSON_UNESCAPED_UNICODE)
xenocrat commented 6 months ago

Yes indeed, it isn't needed now that Chyrp Lite enforces the utf8mb4 character set in MySQL. But it's better not to change it, or it will cause a problem with discovering existing tags. With this escaping issue resolved, all is well. :-)
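
A small sketch of why mixing the two encodings would break tag discovery. The stored value here is a made-up stand-in for an existing row, not the real Chyrp Lite storage format.

```php
<?php
$tag = "文字";

// Stand-in for a value written earlier with the default (escaped) encoding.
$stored = json_encode(array($tag));

// What a search term would look like if new code switched to unescaped output.
$needle = json_encode($tag, JSON_UNESCAPED_UNICODE);

// The two encodings decode to the same string...
var_dump(json_decode(json_encode($tag)) === json_decode($needle)); // bool(true)

// ...but a textual match against the old row fails.
var_dump(strpos($stored, $needle) !== false); // bool(false)
```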

cuixiping commented 6 months ago

We could upgrade all existing tags to the unescaped form. With unescaped tags, the SQL in the program is simpler, and the database content is more readable for people who use database tools.
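
A rough sketch of what such an upgrade step could look like for a single stored value. reencode_unescaped() is a hypothetical function, and the sample JSON is invented for illustration; a real migration would also need to read and write the rows.

```php
<?php
// Hypothetical helper: re-encode one stored JSON value without \uXXXX escapes.
function reencode_unescaped(string $json): string {
    $decoded = json_decode($json, true);

    // Leave anything that is not valid JSON untouched.
    if (json_last_error() !== JSON_ERROR_NONE) {
        return $json;
    }

    // Caveat: with associative decoding, an empty object "{}" comes back as "[]".
    return json_encode($decoded, JSON_UNESCAPED_UNICODE);
}

// Invented sample value.
echo reencode_unescaped('{"\u6587\u5b57":"wen-zi"}'), "\n";
// Prints {"文字":"wen-zi"}
```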