toddsundsted / ktistec

Single user ActivityPub (https://www.w3.org/TR/activitypub/) server.
GNU Affero General Public License v3.0

charset/encoding issue #105

Closed felixkrohn closed 4 months ago

felixkrohn commented 4 months ago

Sorry, it's me again :-/

Since updating (from v2.0.0-8 I believe) to v2.0.0-11 I see broken encoding on posts of other people containing "special" characters (öéüèäà and so on), which are displayed as ü, ä, ö and so on.

What I checked and tried so far without success:

Am I the only one with that issue?

toddsundsted commented 4 months ago

a few questions. is it all posts with special characters, or just some of them? can you tell if they are from a specific server? also, what version of sqlite are you running?

toddsundsted commented 4 months ago

i was able to search for and fetch that post into a locally running instance and the characters looked okay (macos and firefox). and into epiktistes.com (linux and chrome). looking back through posts i see a decent number in German, and at least those look reasonable.

one possibility is to see if you can fetch posts (via Search) that have special characters. that would at least narrow down the problem to the inbox handling pipeline (vs. the outbox or just raw fetching)

toddsundsted commented 4 months ago

the thing i'm momentarily hung up on is why your own posts aren't affected...

felixkrohn commented 4 months ago

SQLite3 version 3.45.3. As far as I can see, all such characters since updating are deformed; at least I haven't found any correctly displayed special characters in the "Everything" stream so far.

toddsundsted commented 4 months ago

all such characters since updating are deformed

except your own posts, correct?

toddsundsted commented 4 months ago

and what happens if you find a post that your instance hasn't received and you search for it (which fetches it and adds it to your database)? alternatively, can you pick a hashtag and follow that hashtag?

what i'm interested in understanding is, is it only posts that another server pushes to your instance via ActivityPub that are affected, or is it every post regardless of how it is added (direct retrieval).

felixkrohn commented 4 months ago

except your own posts, correct?

yup

what i'm interested in understanding is, is it only posts that another server pushes to your instance via ActivityPub that are affected, or is it every post regardless of how it is added (direct retrieval).

Sorry, now I understand. Yes, I pulled in a few posts not yet in the received set, and they show the same behaviour.

toddsundsted commented 4 months ago

@felixkrohn any chance this could be the issue? https://github.com/crystal-lang/crystal/issues/14803

it would explain why it happens to content arriving in ActivityPub format (json) but not your own content
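One way to picture the kind of failure suggested here: if a JSON reader treats every byte of a UTF-8 payload as its own character, each non-ASCII character turns into two or more wrong ones, while pure-ASCII text passes through unchanged. The Python sketch below only illustrates that general effect. It is an assumption about the symptom, not a reproduction of the linked Crystal issue, and `decode_bytewise` is a hypothetical helper.

```python
import json


def decode_bytewise(raw: bytes) -> str:
    """Hypothetical faulty reader: maps every byte to exactly one character."""
    return raw.decode("latin-1")


# A tiny ActivityPub-like payload, serialized as UTF-8 the way a remote
# server would send it. "Tschüss" contains one two-byte character.
payload = json.dumps({"content": "Tschüss"}, ensure_ascii=False).encode("utf-8")

correct = json.loads(payload.decode("utf-8"))["content"]
broken = json.loads(decode_bytewise(payload))["content"]

# ascii() makes the individual code points visible regardless of terminal encoding.
print(ascii(correct))  # 'Tsch\xfcss'
print(ascii(broken))   # 'Tsch\xc3\xbcss'
```

Locally authored content never goes through that byte-to-text step for incoming JSON, which would be consistent with own posts looking fine.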

felixkrohn commented 4 months ago

Hey @toddsundsted, that was spot-on. I just rebuilt ktistec v2.0.0-11 using docker.io/crystallang/crystal:1.12-alpine instead of :latest, and the issue seems gone for new posts.
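Since the rebuild only helps new posts, text stored while the bug was active presumably stays garbled. The thread does not say how, or whether, those posts were repaired. As a rough sketch, assuming the damage is the byte-per-character misread described above, a string can be converted back as shown below. `repair_mojibake` is a hypothetical helper, not part of ktistec, and it does not touch the database.

```python
def repair_mojibake(text: str) -> str:
    """Undo a byte-per-character misread, if the text looks like one.

    Returns the input unchanged when it cannot be such a misread:
    either it contains characters above U+00FF, or its bytes are not
    valid UTF-8 once mapped back.
    """
    try:
        # Map each character back to the single byte it came from,
        # then decode those bytes as the UTF-8 they originally were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(ascii(repair_mojibake("Tsch\u00c3\u00bcss")))  # 'Tsch\xfcss'
print(ascii(repair_mojibake("Tsch\u00fcss")))        # 'Tsch\xfcss' (already fine, left alone)
```

The round trip is a heuristic: text that legitimately contains a sequence such as "Ã¼" would be altered too, so it is safer to apply it only to posts received while the affected build was running.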

felixkrohn commented 4 months ago

you want a PR for the Dockerfile, or do you think that will be resolved upstream soon?

felixkrohn commented 4 months ago

I can now confirm that using the newest crystal alpine image v1.13.1 fixes the bug:

-FROM crystallang/crystal:latest-alpine AS builder
+#FROM docker.io/crystallang/crystal:latest-alpine AS builder
+FROM docker.io/crystallang/crystal:1.13.1-alpine AS builder

(It has in the meantime also been tagged as latest, so it's not necessary to change the Dockerfile anymore as long as your build process makes sure to not use a different cached version.)

toddsundsted commented 4 months ago

I can now confirm that using the newest crystal alpine image v1.13.1 fixes the bug

great news! thanks for confirming this!