savonet / liquidsoap

Liquidsoap is a statically typed, general-purpose scripting language with dedicated operators and backends for all things media: streaming, file generation, automation, HTTP backends and more.
GNU General Public License v2.0

Live harbor - Bad song title encoding only harbor #411

Closed desavil closed 6 years ago

desavil commented 7 years ago

Original (broadcast by SHOUTcast DSP): start ę€ół.śążźćń end (Polish characters and the Euro symbol)

Sent to SHOUTcast 2 via input.harbor/output.shoutcast: start ê€ó³.œ¹¿Ÿæñ end

Filename: start ę€ół.śążźćń end.mp3

Liquidsoap log: 2017/03/23 14:37:56 [input(dot)harbor_5820:3] New metadata chunk ? -- start ê€ó³.œ¹¿Ÿæñ end.

If I load this file from a liquidsoap playlist, there is no encoding problem; it only occurs when I broadcast through the harbor.


toots commented 7 years ago

You should look at this parameter in input.harbor:

 * icy_metadata_charset : string (default: "")
     ICY (shoutcast) metadata charset. Guessed if empty. Default for
     shoutcast is ISO-8859-1. Set to that value if all your clients send
     metadata using this charset and automatic detection is not working for
     you.

Try setting it to "latin1" or "utf8"

desavil commented 7 years ago

I've checked this before.

"latin1" - start ê€ó³.œ¹¿Ÿæñ end

2017/03/23 16:58:29 [camomile:3] Failed to convert "start \234\128\243\179.\156\185\191\159\230\241 end": unknown input encoding latin1
2017/03/23 16:58:29 [camomile:3] Failed to convert "song": unknown input encoding latin1
2017/03/23 16:58:29 [camomile:3] Failed to convert "sehes": unknown input encoding latin1
2017/03/23 16:58:29 [camomile:3] Failed to convert "pass": unknown input encoding latin1
2017/03/23 16:58:29 [input(dot)harbor_5820:3] New metadata chunk ? -- start ▒▒.▒▒▒▒▒▒ end.

"utf8" - start ê€ó³.œ¹¿Ÿæñ end

2017/03/23 16:52:51 [camomile:3] Failed to convert "start \234\128\243\179.\156\185\191\159\230\241 end": unknown input encoding utf8
2017/03/23 16:52:51 [camomile:3] Failed to convert "song": unknown input encoding utf8
2017/03/23 16:52:51 [camomile:3] Failed to convert "sehes": unknown input encoding utf8
2017/03/23 16:52:51 [camomile:3] Failed to convert "pass": unknown input encoding utf8
2017/03/23 16:52:51 [input(dot)harbor_5820:3] New metadata chunk ? -- start ▒▒.▒▒▒▒▒▒ end.
2017/03/23 16:52:53 [map_metadata_5822:3] Inserting missing metadata.

desavil commented 6 years ago

Is there any solution? I am using the newest (compiled) liquidsoap and the problem is still there.

Additionally, I sometimes see something like this in the logs (it is not related to the harbor):

2018/08/18 14:20:00 [95551(dot)m3u:3] Prepared "/home/liquid/mp3/1/Andy Black - We Dont Have To Dance.mp3" (RID 5).
2018/08/18 14:20:00 [camomile:3] Failed to convert "We Don\226\128\153t Have To Dance" from auto(UTF-8,ISO-8859-1) to Latin-1 (CamomileLibrary__UChar.Out_of_range)!
2018/08/18 14:20:00 [camomile:3] Failed to convert "Andy Black - We Don\226\128\153t Have To Dance" from auto(UTF-8,ISO-8859-1) to Latin-1 (CamomileLibrary__UChar.Out_of_range)!

toots commented 6 years ago

For Polish, it looks like you should try ISO-8859-2. Let me know if that works; I might add it to the default.

desavil commented 6 years ago

Not working - :(

2018/08/18 21:46:57 [input(dot)harbor_6446:3] New metadata chunk ? -- start ę�ół.�šż�ćń end.
2018/08/18 21:47:05 [camomile:3] Failed to convert "start \196\153\194\128\195\179\197\130.\194\156\197\161\197\188\194\159\196\135\197\132 end" from auto(UTF-8,ISO-8859-1) to Latin-1 (CamomileLibrary__UChar.Out_of_range)!
2018/08/18 21:47:05 [camomile:3] Failed to convert "start \196\153\194\128\195\179\197\130.\194\156\197\161\197\188\194\159\196\135\197\132 end" from auto(UTF-8,ISO-8859-1) to Latin-1 (CamomileLibrary__UChar.Out_of_range)!

Is the encoding being converted properly here?

admin.cgi?pass=xxx&mode=updinfo&song=start ę€ół.śążźćń end

toots commented 6 years ago

Who's the base client? If that client encodes metadata with the wrong encoding before sending it to liquidsoap, then there isn't much liquidsoap can do. Also, your logs still show automatic encoding detection from UTF-8 and ISO-8859-1.
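One way to check what the client actually sends is to compare the escaped bytes in the camomile log lines above with candidate encodings. The following is a minimal Python sketch, not liquidsoap code. The code pages tried here (Windows-1250, Windows-1252, ISO-8859-2) are assumptions inferred from the logged byte values; nothing in this thread confirms what the SHOUTcast DSP uses.

```python
# Sketch: test the hypothesis that the client sent Windows-1250 bytes.
# The code pages below are assumptions inferred from the logged byte values.

original = "start ę€ół.śążźćń end"
wire = original.encode("cp1250")  # what a Windows-1250 client would send

# Decimal escapes copied from the first camomile log line.
logged = bytes([234, 128, 243, 179, 46, 156, 185, 191, 159, 230, 241])
matches_log = wire[len(b"start "):-len(b" end")] == logged

# Read as Windows-1252, the same bytes give the garbled title.
as_cp1252 = wire.decode("cp1252")

# Read as ISO-8859-2, bytes 0x80, 0x9C and 0x9F become control characters
# and 0xB9 becomes "š", so that charset does not fit either.
as_latin2 = wire.decode("iso8859_2")

print(matches_log)
print(as_cp1252)
print(repr(as_latin2))
```

If the hypothesis holds, the title only survives when the receiving side decodes the bytes with the charset the client really used, which would explain why neither the automatic detection nor ISO-8859-2 restored it.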

desavil commented 6 years ago

SHOUTcast Source DSP v2.3.5 (latest). I've checked now and it's probably a problem with the base client. So the matter is resolved. Sorry!

And what about this?:

2018/08/18 14:20:00 [95551(dot)m3u:3] Prepared "/home/liquid/mp3/1/Andy Black - We Dont Have To Dance.mp3" (RID 5).
2018/08/18 14:20:00 [camomile:3] Failed to convert "We Don\226\128\153t Have To Dance" from auto(UTF-8,ISO-8859-1) to Latin-1 (CamomileLibrary__UChar.Out_of_range)!
2018/08/18 14:20:00 [camomile:3] Failed to convert "Andy Black - We Don\226\128\153t Have To Dance" from auto(UTF-8,ISO-8859-1) to Latin-1 (CamomileLibrary__UChar.Out_of_range)!

toots commented 6 years ago

Hmm. Okay, what is your configuration now?

desavil commented 6 years ago

def update_songtitle(m) =
if m["title"] == "" then
radio = mksafe(audio_to_stereo(playlist(mode="normal","/home/liquid/playlist.m3u")))

radio = map_metadata(update_songtitle,radio)
output.shoutcast(%fdkaac(bitrate=96,samplerate=44100,channels=2),name="Name",genre="Genre",url="http://website.tld",public=true,host="localhost",port=8000,password="pass",on_error=(fun (_) -> 10.),radio)

toots commented 6 years ago

Try adding:


and possibly more of the ISO-8859 ones.

desavil commented 6 years ago

Unfortunately, none of these worked. I tried everything from ISO-8859-1 to ISO-8859-10. I even have an English song with no unusual characters in its name, and it has the same encoding problem. Should I send you these MP3 files?

toots commented 6 years ago

OK, sorry, now I realize what's going on. Historically, shoutcast metadata is encoded using latin1. However, latin1 has no representation for many characters, such as the Polish ones you are trying to send, so the conversion fails.

The only option is to force output.shoutcast to send metadata using the utf8 encoding. However, some of your listeners' clients may not know about that and still expect latin1 strings, which would cause metadata to display incorrectly. I'm afraid there isn't much more that can be done here, except perhaps cleaning the metadata by mapping each character to its nearest latin1 equivalent, for instance ę to e.
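The "nearest latin1 character" idea can be sketched outside liquidsoap, for instance as a preprocessing step on titles. This is a rough Python illustration under stated assumptions: the names `FALLBACKS` and `to_latin1_safe` are hypothetical, the replacement table is deliberately incomplete, and none of this is part of liquidsoap itself.

```python
import unicodedata

# Characters without a Unicode decomposition need explicit replacements.
# This table is illustrative only and far from complete.
FALLBACKS = str.maketrans({
    "ł": "l", "Ł": "L",
    "\u2019": "'",  # right single quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2014": "-",  # em dash
    "\u2013": "-",  # en dash
    "€": "EUR",
})

def to_latin1_safe(title):
    """Return a version of title that can be encoded as Latin-1."""
    out = []
    for ch in title.translate(FALLBACKS):
        try:
            ch.encode("latin-1")
            out.append(ch)  # already representable, keep as is
            continue
        except UnicodeEncodeError:
            pass
        # Decompose (e.g. ę -> e + combining ogonek) and drop the marks.
        base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                       if not unicodedata.combining(c))
        try:
            base.encode("latin-1")
            out.append(base)
        except UnicodeEncodeError:
            out.append("?")  # nothing sensible left, use a placeholder
    return "".join(out)

print(to_latin1_safe("We Don\u2019t Have To Dance"))
print(to_latin1_safe("start ę€ół.śążźćń end"))
```

Characters that latin1 already contains, such as ó, are kept unchanged, so the result loses only what latin1 cannot carry.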

I've just added a change that makes it possible to change output.shoutcast's metadata encoding. I think that this is the best I can do for you here. Feel free to test, either with the latest code or by adding this at the top of your script:

def output.shoutcast(
  ~public=true,~icy_id=1, ~format="",~dj={""},
  ~dumpfile="", ~icy_metadata="guess",
  ~on_connect={()}, ~on_disconnect={()},
  ~on_error=fun(_)->3., e,s) =

  icy_reset = if icy_reset then "1" else "0" end

  headers = [("icy-aim",aim),("icy-irc",irc),

  def map(m) =
    dj = dj()
    if dj != "" then
  s = map_metadata(map,s)

    e, format=format, icy_id=icy_id,
    id=id, headers=headers,
    on_connect=on_connect, on_disconnect=on_disconnect,
    host=host, port=port, user=user, password=password,
    genre=genre, url=url, description="UNUSED",
    public=public, dumpfile=dumpfile,encoding=encoding,
    name=name, protocol="icy",on_error=on_error,

And then:

output.shoutcast(%fdkaac(bitrate=96,samplerate=44100,channels=2),name="Name",genre="Genre",url="http://website.tld",public=true,host="localhost",port=8000,password="pass",encoding="UTF-8",on_error=(fun (_) -> 10.),radio)

desavil commented 6 years ago

There is no error now. It looks like it works. Thanks!

Generally, I have not noticed before that some characters were displaying badly. Currently, they are displayed in the same way, only there is no error in the logs.

This error appeared even with files whose names contain only English characters, for example "Bruno Mars - Thats What I Like.mp3". I think it may have something to do with ID3 tags, although there are no special characters there either.

EDIT: In the ID3 title of "Bruno Mars - Thats What I Like.mp3" I see this character: ’ (full title in ID3: That’s What I Like)

toots commented 6 years ago

Glad to hear! UTF-8 characters can pop up in surprising ways: here it was the ’ character, but you also get non-breaking spaces, long dashes, etc. :-)
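To spot such characters in a tag before they reach the logs, a small check along these lines may help. It is a Python sketch, not liquidsoap code; the function name `non_latin1_chars` is hypothetical, and it relies only on the fact that Latin-1 covers exactly the code points U+0000 to U+00FF.

```python
import unicodedata

def non_latin1_chars(text):
    """List (code point, Unicode name) for characters outside Latin-1."""
    found = []
    for ch in text:
        if ord(ch) > 0xFF:  # Latin-1 covers exactly U+0000..U+00FF
            found.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return found

print(non_latin1_chars("That\u2019s What I Like"))
# [('U+2019', 'RIGHT SINGLE QUOTATION MARK')]
```

A non-breaking space (U+00A0) is itself part of Latin-1, so this check would not report it, whereas an em dash (U+2014) would be reported.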