savonet / liquidsoap

Liquidsoap is a statically typed scripting general-purpose language with dedicated operators and backend for all thing media, streaming, file generation, automation, HTTP backend and more.
http://liquidsoap.info
GNU General Public License v2.0
1.4k stars 128 forks source link

`json.stringify` can produce invalid `json` if used with `mp3` cover. #1872

Closed vitoyucepi closed 3 years ago

vitoyucepi commented 3 years ago

Describe the bug If I try to use metadata.json.stringify or json.stringify and an mp3 file with cover. Then result string can be an incorrect json.

To Reproduce

  1. Generate cover.png with imagemagick.
      convert -size 128x128 gradient:blue cover.png
  2. Create sample.mp3 with cover.png
      ffmpeg \
        -i sample_input.mp3 -i cover.png \
        -map 0:0 -map 1:0 \
        -c copy \
        -id3v2_version 3 \
        -metadata title="sample" \
        -metadata:s:v title="Album cover" \
        -metadata:s:v comment="Cover (front)" \
        "sample.mp3"
  3. Add metadata handling in script
    
    source = single("sample.mp3")
    def updateMetadata(newMetadata)
    log(metadata.json.stringify(newMetadata))
    end

source.on_track(updateMetadata)



**Expected behavior**
1. `JSON.stringify` should properly escape unicode sequences.
1. `metadata.cover` should be used in `metadata.json.stringify` function.
1. Cover should be in `dataURI` format.

**Version details**
 - OS: `ubuntu:20.04` in docker
 - Version: `2.0.0-rc1`

**Install method**
Deb package from liquidsoap releases at github

**Common issues**
N/A
vitoyucepi commented 3 years ago

@toots I've rechecked unicode escaping and it doesn't work properly. I can still produce incorrect json out of binary file.

vitoyucepi commented 3 years ago

Also why do one want to specify mime parameter? I think that metadata mime is always predefined because of image format. And probably there's no way to embed image with unknown mime.

If the format of data is unknown, then mime should be application/octet-stream.

toots commented 3 years ago

The mime parameter is specific to the coverart deprecated format, which is usually jpeg, hence the default value. I've clarified this: https://github.com/savonet/liquidsoap/commit/da256b0bb5580b17c80a184d55291034f1cd900b

Can you specify how you are producing invalid json output?

vitoyucepi commented 3 years ago
  1. Generate cover.png with imagemagick.
      convert -size 128x128 gradient:blue cover.png
  2. Create sample.mp3 with cover.png
      ffmpeg \
        -i sample_input.mp3 -i cover.png \
        -map 0:0 -map 1:0 \
        -c copy \
        -id3v2_version 3 \
        -metadata title="sample" \
        -metadata:s:v title="Album cover" \
        -metadata:s:v comment="Cover (front)" \
        "sample.mp3"
  3. Add metadata handling in script
      source = single("sample.mp3")
      def metadata.toJSON(localMetadata)
            resultJson = json()
            list.iter((fun (v) -> resultJson.add(fst(v), (snd(v):string))), localMetadata)
            print(json.stringify(compact = true, resultJson))
      end
      def updateMetadata(newMetadata)
            metadata.toJSON(newMetadata)
      end
      source.on_track(updateMetadata)

It's just a sample, because in my current setup

  1. metadata.toJSON will produce json
  2. The json will be assigned as a part of a record.
      resultRecord = {
        metadata = metadata.toJSON(!currentMetadata)
      }
  3. The record in order will be json.stringify into a http.response.
      contentType="application/json; charset=UTF-8"
      response = json.stringify(compact = true, resultRecord)
      http.response(content_type = contentType, data = response ^ "\n")

I'm aware of new metadata.cover. But the current state of the issue is more about json escaping.

toots commented 3 years ago

I'm aware of new metadata.cover. But the current state of the issue is more about json escaping.

Absolutely. Thanks for these details, let me try it locally.

toots commented 3 years ago

Here's what I see locally with v2.0.0:

{"on_air_timestamp":"1631193291.00","encoder":"Lavf58.76.100","status":"playing","kind":"{audio=pcm(stereo),video=none,midi=none}","decoder":"FFMPEG","apic":"\u0000image/png\u0000\u0003Album cover\u0000\uFFFDPNG\r\n\u001A\n\u0000\u0000\u0000\rIHDR\u0000\u0000\u0000\uFFFD\u0000\u0000\u0000\uFFFD\u0010\u0002\u0000\u0000\u0000\u001C�*�\u0000\u0000\u0004gAMA\u0000\u0000\u7FFFFFFFFFFFFC4F\x0b\u61140000 cHRM\u0000\u0000z&\u0000\u0000\u7FFFFFFFFFFFF004\u0000\u0000\u2000000�u0\u0000\u0000�`\u0000:\uFFFD\u0000\u0000\u0017p\u7FFFFFFFFFFFF73AQ<\u0000\u0000\u0000\u0006bKGD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\tX\u1DC000\u0000\x07tIME\x07� \r\t7\uFFFD�\u0000\u0000\u0001\uFFFDIDATx��ݹ\u0011\u0002A\u0010\u0004��8\uFFFD\u001D��\uFFFD\u0015*ӂQ*Z\uFFFD۶}>\u7FFFFFFFFFFFF0E4k����g�\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD\u18B871\u23390C2\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFDvm��\uFFFDx\uFFFD,\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\uFFFD\u0000|\uFFFD'�\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@\uFFFD\u0000H\u0013\u0000i\u0002 M\u0000\uFFFD\t\uFFFD4\u0001\uFFFD&\u0000�@�m����3\u0000\uEEFADC\u32461FC�\u0000\u0000%tEXtdate:create\u00002021-09-09T13:09:55+00:00ބ\u0016\uFFFD\u0000\u0000\u0000%tEXtdate:modify\u00002021-09-09T13:09:55+00:00\uFFFDٮ<\u0000\u0000\u0000\u0000IEND\uFFFDB`\uFFFD","title":"sample","on_air":"2021/09/09 08:14:51","encoding":"Lavf58.76.100","filename":"/tmp/cover/sample.mp3","rid":"0","temporary":"false","tsse":"Lavf58.76.100","initial_uri":"/tmp/cover/sample.mp3"}

This looks okay to me. Invalid utf8 characters have been replace by the conventional rep symbol �

vitoyucepi commented 3 years ago

You can save it to file and then do file -i 1.json and it'll show you application/octet-stream; charset=binary instead of text/plain; charset=utf-8. Or simply feed the output to jq or any online json beautifier.

vitoyucepi commented 3 years ago

I was wrong. Valid json can be application/octet-stream; charset=binary. But I just tested it with js

fetch('http://127.0.0.1:8080/api/v1/info/radio')
    .then(response => response.text())
    .then(text => JSON.parse(text))

And it says that there's an error. I've dumped the result to file and read it with xxd.

00000170: df00 5c75 3030 3030 5c75 3030 3030 5c75  ..\u0000\u0000\u

There's 00 sequence at position 0x171

jq thinks that

Official reference: Json strings

toots commented 3 years ago

Yep, came to the same conclusion about \xnn, I assumed JSON strings were any valid javascript string.. Wrong!

After fixing that, it looked like our UTF8 parsing wasn't correct. I picked up the one from sedlex and it worked much better.

The way to properly test is to dump the json via file.write and read it in node using require("fs").readFile[Sync] and parsing it with JSON.parse. This works:

% node

Welcome to Node.js v14.16.1.
Type ".help" for more information.
> x = fs.readFileSync("/tmp/bla.json")
<Buffer 7b 22 6f 6e 5f 61 69 72 5f 74 69 6d 65 73 74 61 6d 70 22 3a 22 31 36 33 31 32 30 38 39 35 34 2e 30 30 22 2c 22 65 6e 63 6f 64 65 72 22 3a 22 4c 61 76 ... 2522 more bytes>
> JSON.parse(x);;
{
  on_air_timestamp: '1631208954.00',
  encoder: 'Lavf58.76.100',
  status: 'playing',
  kind: '{audio=pcm(stereo),video=none,midi=none}',
  decoder: 'FFMPEG',
  apic: '\x00image/png\x00\x03Album cover\x00�PNG\r\n' +
    '\x1A\n' +
    "\x00\x00\x00\rIHDR\x00\x00\x00�\x00\x00\x00�\x10\x02\x00\x00\x00\x1C��\x00\x00\x04gAMA\x00\x00��\x0B�a\x05\x00\x00\x00 cHRM\x00\x00z&\x00\x00��\x00\x00�\x00\x00\x00��u0\x00\x00�\x00:�\x00\x00\x17p��Q<\x00\x00\x00\x06bKGD������\tX�\x00\x07tIME\x07�\r\t7��\x00\x00\x01�IDATx�ݹ\x11\x02A\x10\x04��8�\x1D��\x15*ӂQ*Z�۶}>��k��g�\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01���\f8��&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�vm��x�,\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i�\x00|�'��&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@�\x00H\x13\x00i\x02 M\x00�\t�4\x01�&\x00�@���3\x00��/�\tƇ|�\x00\x00%tEXtdate:create\x002021-09-09T13:09:55+00:00ބ\x16�\x00\x00\x00%tEXtdate:modify\x002021-09-09T13:09:55+00:00�ٮ<\x00\x00\x00\x00IEND�B`�",
  title: 'sample',
  on_air: '2021/09/09 12:35:54',
  encoding: 'Lavf58.76.100',
  filename: '/tmp/cover/sample.mp3',
  rid: '0',
  temporary: 'false',
  tsse: 'Lavf58.76.100',
  initial_uri: '/tmp/cover/sample.mp3'
}

I have also confirmed that we can do file.contents on our end and json.parse. (not to self: improve json.parse in 2.1.0). Also, bonus point, the apic string can still be parsed. PNG data is invalid, though, which is not a surprise.

j = file.contents("/tmp/bla.json")
j = json.parse(default=[("","")], j)
cover = string.apic.parse(j["apic"])
cover : string.{description : string, picture_type : int, mime : string} = "<data>".{description = "Album cover", picture_type = 3, mime = "image/png"}

Lastly, I wanted to make sure that our default string.quote always returns a JSON-compatible string so I have added \/ to our supported escaping patterns.

Thanks for following-up!