savonet / liquidsoap

Liquidsoap is a statically typed, general-purpose scripting language with dedicated operators and backends for all things media: streaming, file generation, automation, HTTP backends and more.
http://liquidsoap.info
GNU General Public License v2.0

Random characters in place of apostrophe and other symbols in artist/title #3500

Closed KevanGP closed 5 months ago

KevanGP commented 10 months ago

Describe the bug

Random characters often show up in the artist and title of tracks when the metadata contains apostrophes or characters with acute accents. For example, media players (stream clients) show "&#8217;" instead of "'" (apostrophe).

To Reproduce

It seems to happen on random tracks. Editing the artist/title to replace the apostrophe with one typed on the keyboard, and replacing accented characters with the plain letter (A instead of Á), fixes it.

Install method

Opam, latest version.
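For reference, both characters from the report are outside ASCII. `&#8217;` is an HTML decimal character reference for code point 8217 (U+2019, RIGHT SINGLE QUOTATION MARK, i.e. the typographic apostrophe), and Á is U+00C1. The OCaml sketch below uses hypothetical helper names for illustration only. It decodes the reference and prints the UTF-8 bytes each character becomes, which is why the charset used for metadata matters.

```ocaml
(* Parse a decimal character reference of the form "&#NNNN;".
   Returns None when the input does not have that shape or the text
   between "&#" and ";" is not an integer literal. *)
let code_point_of_reference (s : string) : int option =
  let n = String.length s in
  if n > 3 && String.sub s 0 2 = "&#" && s.[n - 1] = ';' then
    int_of_string_opt (String.sub s 2 (n - 3))
  else None

(* Encode a code point as UTF-8. Uchar.of_int raises Invalid_argument
   if [cp] is not a valid Unicode scalar value. *)
let utf8_of_code_point (cp : int) : string =
  let b = Buffer.create 4 in
  Buffer.add_utf_8_uchar b (Uchar.of_int cp);
  Buffer.contents b

(* Render bytes as space-separated hex, e.g. "E2 80 99". *)
let hex_bytes (s : string) : string =
  String.to_seq s
  |> Seq.map (fun c -> Printf.sprintf "%02X" (Char.code c))
  |> List.of_seq
  |> String.concat " "

let () =
  (match code_point_of_reference "&#8217;" with
   | Some cp ->
       (* 8217 = 0x2019: the typographic apostrophe *)
       Printf.printf "U+%04X -> %s\n" cp (hex_bytes (utf8_of_code_point cp))
   | None -> print_endline "not a numeric reference");
  (* Á is U+00C1 *)
  Printf.printf "U+%04X -> %s\n" 0xC1 (hex_bytes (utf8_of_code_point 0xC1))
```

This should print `U+2019 -> E2 80 99` and `U+00C1 -> C3 81`: a single visible character becomes two or three bytes in UTF-8.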

toots commented 10 months ago

Hi @KevanGP and thanks for reporting.

Would you be able to send a faulty file to toots@rastageeks.org? Thanks!

Allavaz commented 10 months ago

Are you streaming to Icecast? Maybe you need to set the mount charset to UTF-8 in your Icecast config. I had the same problem a while ago.

Basically adding this:

<mount type="default">
    <charset>UTF-8</charset>
</mount>

to the icecast.xml config file did the trick.

danbo commented 5 months ago

I came across this issue and the solution above fixed it. However, since Liquidsoap manages the mount points dynamically, I would suggest that it also send the Icecast charset field as UTF-8 for a given mount point whenever the Liquidsoap Icecast `encoding` parameter is set to UTF-8. Without this, UTF-8 encoded strings end up on mount points that are not configured for UTF-8, unless the Icecast server config is modified for all mounts at a global level as described above.

icecast mount configuration

toots commented 5 months ago

Thanks for pointing this out. We are in fact sending the charset with our metadata update:

                Cry.update_metadata
                  ~charset:(Charset.to_string out_enc)
                  connection icy_meta

There could still be reasons for this to fail, such as an invalid encoding name. Do you have more details about the issue so we can try to reproduce it? What is the mount point configuration, and what is the corresponding Liquidsoap code?
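To illustrate what "sending the charset with the metadata update" means on the wire, here is a hedged OCaml sketch. It is NOT how Cry builds its request. It assumes Icecast's admin metadata endpoint (`/admin/metadata` with `mode=updinfo` and `song`) plus the `charset=` parameter that Icecast documents for metadata updates; `metadata_update_path` is a hypothetical helper.

```ocaml
(* Percent-encode every byte except RFC 3986 unreserved characters,
   so multi-byte UTF-8 sequences become %XX escapes. *)
let url_encode (s : string) : string =
  let b = Buffer.create (3 * String.length s) in
  String.iter
    (fun c ->
      match c with
      | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '_' | '.' | '~' ->
          Buffer.add_char b c
      | _ -> Buffer.add_string b (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents b

(* Hypothetical helper (assumption, not Cry's API): build the request
   path for a metadata update that declares the charset of [song]. *)
let metadata_update_path ~(mount : string) ~(song : string)
    ~(charset : string) : string =
  Printf.sprintf "/admin/metadata?mount=%s&mode=updinfo&song=%s&charset=%s"
    (url_encode mount) (url_encode song) (url_encode charset)

let () =
  (* "It’s Alright", with U+2019 written as its UTF-8 bytes *)
  print_endline
    (metadata_update_path ~mount:"/stream" ~song:"It\xE2\x80\x99s Alright"
       ~charset:"UTF-8")
```

This prints `/admin/metadata?mount=%2Fstream&mode=updinfo&song=It%E2%80%99s%20Alright&charset=UTF-8`. The `charset` value only tells the server how to interpret the `song` bytes; whether that overrides a mount's configured charset is server behavior this sketch does not model.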

danbo commented 5 months ago

I did notice the code that sends the encoding as part of the metadata update. What I have in mind, though, is setting the encoding in the mount point configuration, or in this case at dynamic mount point creation (which I don't even know is possible), because I was able to reproduce this again.

If Liquidsoap can't set it when initiating a connection or creating a mount point, then perhaps we can update the documentation so users know they may have to update their Icecast configuration separately.

--

If the charset is not set on the mount point (i.e. through the default, global mount in the XML config), I don't get the correct UTF-8 characters in the metadata output on multiple clients. This is also noted by LibreTime. (1) (2)

This is the charset setting I was looking at in the documentation (but of course changing it to UTF-8 in our case):

<mount type="normal">
    <mount-name>/example-complex.ogg</mount-name>
    <username>othersource</username>
    <password>hackmemore</password>
    ...
    <charset>ISO8859-1</charset>
    ...
</mount>

charset

For non-Ogg streams like MP3, the metadata that is inserted into the stream often has no defined character set. We have traditionally assumed UTF8 as it allows for multiple language sets on the web pages and stream directory, however many source clients for MP3 type streams have assumed Latin1 (ISO 8859-1) or leave it to whatever character set is in use on the source client system.

This character mismatch has been known to cause a problem as the stats engine and stream directory servers want UTF8 so now we assume Latin1 for non-Ogg streams (to handle the common case) but you can specify an alternative character set with this option.

The source clients can also specify a charset= parameter to the metadata update URL if they so wish. ...
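To make the mismatch described in the quoted documentation concrete: if a server assumes Latin-1 for metadata that was actually sent as UTF-8, each byte of a multi-byte character is read as a separate Latin-1 character. The OCaml sketch below is an illustration, not Icecast's code, and `latin1_to_utf8` is a hypothetical helper. It models a server that decodes incoming bytes as Latin-1 and re-encodes them as UTF-8, which double-encodes anything that was already UTF-8.

```ocaml
(* Treat every byte of [s] as a Latin-1 (ISO 8859-1) character and
   re-encode it as UTF-8. In Latin-1, byte value N is code point
   U+00NN, so this is a direct byte-to-code-point mapping. *)
let latin1_to_utf8 (s : string) : string =
  let b = Buffer.create (2 * String.length s) in
  String.iter (fun c -> Buffer.add_utf_8_uchar b (Uchar.of_char c)) s;
  Buffer.contents b

(* Render bytes as space-separated hex for inspection. *)
let hex_bytes (s : string) : string =
  String.to_seq s
  |> Seq.map (fun c -> Printf.sprintf "%02X" (Char.code c))
  |> List.of_seq
  |> String.concat " "

let () =
  (* "’" (U+2019) and "Á" (U+00C1) as the UTF-8 bytes a source client sends *)
  let sent = "\xE2\x80\x99 \xC3\x81" in
  let stored = latin1_to_utf8 sent in
  Printf.printf "sent:   %s\nstored: %s\n" (hex_bytes sent) (hex_bytes stored)
```

The apostrophe's three bytes turn into "â" followed by two invisible C1 control characters, and "Á" turns into "Ã" plus a control character. Plain ASCII passes through unchanged, which matches the report that only titles with apostrophes or accents were affected.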

However, the current solution:

<mount type="default">
    <charset>UTF-8</charset>
</mount>

uses this feature (a global mount configuration):

type

The type of the mount point (default: “normal”). A mount of type “default” can be used to specify common values for multiple mountpoints.

I'm testing with this Icecast instance, which simply substitutes some values from environment variables into a vanilla config. If I extract its icecast.xml, add that default charset config, then rebuild and rerun the image, the UTF-8 characters are shown as expected.

I'm outputting to Icecast like this (setting `encoding` or not makes no difference):

output.icecast(
    id="icecast-test",
    %fdkaac(channels=2, samplerate=wavSampleRate, bitrate=icecastBitrate, afterburner=true, aot=icecastAacProfile, transmux="adts", sbr_mode=true),
    send_icy_metadata=true,
    #encoding="UTF-8",
    host=icecastHost,
    port=icecastPort,
    password=icecastPass,
    mount=icecastMount,
    name="test",
    public=false,
    description="test",
    testSource()
)

Testing with:

toots commented 5 months ago

I was hoping Icecast would do the character conversion when a metadata update is sent in a different encoding than the one the mount point is set to.

Anyhow, I'm not sure if there is much we can do:

Given these, I don't know whether we could issue a log warning or do anything better.
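As a hedged sketch of what such a log warning could check (hypothetical code, not part of Liquidsoap): pure ASCII metadata reads the same in UTF-8 and Latin-1, so only non-ASCII titles are at risk. Even a correct UTF-8 to Latin-1 conversion cannot represent everything: "Á" (U+00C1) fits, but the typographic apostrophe U+2019 from this report does not exist in ISO 8859-1. The example needs OCaml 4.14 or later for `String.get_utf_8_uchar`.

```ocaml
(* How exposed a UTF-8 metadata string is to a charset mismatch. *)
type risk =
  | Ascii_only  (* identical in UTF-8 and Latin-1 *)
  | Latin1_ok   (* non-ASCII, but every character fits in Latin-1 *)
  | Not_latin1  (* invalid UTF-8, or a character above U+00FF *)

(* Convert UTF-8 to Latin-1. Returns None if the input is not valid
   UTF-8 or contains a character that Latin-1 cannot represent. *)
let utf8_to_latin1 (s : string) : string option =
  let len = String.length s in
  let b = Buffer.create len in
  let rec go i =
    if i >= len then Some (Buffer.contents b)
    else
      let d = String.get_utf_8_uchar s i in
      let cp = Uchar.to_int (Uchar.utf_decode_uchar d) in
      if (not (Uchar.utf_decode_is_valid d)) || cp > 0xFF then None
      else begin
        Buffer.add_char b (Char.chr cp);
        go (i + Uchar.utf_decode_length d)
      end
  in
  go 0

(* True if any byte is >= 0x80, i.e. the string is not pure ASCII. *)
let has_non_ascii (s : string) : bool =
  String.exists (fun c -> Char.code c >= 0x80) s

let classify (s : string) : risk =
  if not (has_non_ascii s) then Ascii_only
  else
    match utf8_to_latin1 s with
    | Some _ -> Latin1_ok
    | None -> Not_latin1

let () =
  List.iter
    (fun title ->
      match classify title with
      | Ascii_only -> ()
      | Latin1_ok -> Printf.printf "warning: non-ASCII metadata: %s\n" title
      | Not_latin1 ->
          Printf.printf "warning: not representable in Latin-1: %s\n" title)
    [ "Artist - Title"; "\xC3\x81lbum"; "It\xE2\x80\x99s" ]
```

A warning like this would only flag the risk; it cannot tell which charset the mount actually uses, since that lives in the server's icecast.xml.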

toots commented 5 months ago

I'm going to close this for now. Feel free to reopen or follow up if needed.