mpv-player / mpv

🎥 Command line video player
https://mpv.io
Other
28.25k stars 2.9k forks source link

Lua: utils.format_json throws an error instead of returning nil, error #10220

Open CogentRedTester opened 2 years ago

CogentRedTester commented 2 years ago

Important Information

mpv version:

mpv 0.34.0-300-g9022b1b51d Copyright © 2000-2022 mpv/MPlayer/mplayer2 projects
 built on Sun May 22 12:11:51 2022
FFmpeg library versions:
   libavutil       57.24.101
   libavcodec      59.28.100
   libavformat     59.24.100
   libswscale      6.6.100
   libavfilter     8.38.100
   libswresample   4.6.100
FFmpeg version: git-2022-05-21-27cffd16

Windows 10 Shinchiro Build

Reproduction steps

Run mpv with the following script:

local utils = require "mp.utils"
print(utils.format_json({ [true] = 1 }))

A boolean is not a valid JSON key, so this will throw an error.

Expected behavior

According to the mpv manual:

Format the given Lua table (or value) as a JSON string and return it. On error, returns nil, error. (Errors usually only happen on value types incompatible with JSON.)

Which means something along the lines of this should be printed: nil "key must be a string, but got boolean"

Actual behavior

mpv throws an error and the script outright crashes:

image

I can't be sure of this, but I've used format_json in the past and I don't remember this happening then. It's possible I just never gave it an invalid json string, but I find that surprising, so this may have broken sometime in the last couple of years.

Log file

log.txt

avih commented 2 years ago

Well, the issue is not specific to format_json but rather to the conversion of a lua table into mpv internal representation (mpv node), which also happens for instance with this:

mp.command_native({[true] = 1})

or with set_property_native etc, because all those take an arbitrary native lua value (including a table) and convert it into an mpv structure - which has this conversion issue (because the internal mpv structure has similar limits as json, where a key can only be a string).

Specifically the error is generated here https://github.com/mpv-player/mpv/blob/master/player/lua.c#L763

So one approach woud be to "fix" the docs (states that converting a non-standard lua table into an mpv value can throw an error).

To handle it programatically (the case where makenode throws) is not impossible, but the additional complexity (and effort) might not be worth the value. This could involve makenode returning a special value or string which means "could not convert", and use that in every function which makes an mpv node from a lua value.

Another approach might be to wrap these functions which take a lua table inside try/catch, and on error return it as nil,error (eventhough there could also be errors unrelated to the conversion, e.g. on OOM or others).

Another approach is maybe to try and convert the key to a string (without erroring out). Not sure whether or not that would be desirable.

No decision for now other than acknowledging the report.

CogentRedTester commented 2 years ago

Hmm, I definitely think that this should either be fixed or properly documented. I do think that format_json specifically should not throw errors like this given how likely it is for an invalid table to be entered, unlike with set_property_native for example which generally have expected input formats, but I understand that it may add more complexity than you're comfortable with.

For the time being I have added a wrapper function on the Lua side to catch the errors:

function format_json_safe(t)
    local success, result, err = pcall(utils.format_json, t)
    if success then return result, err
    else return nil, result end
end
avih commented 2 years ago

For the time being I have added a wrapper function on the Lua side to catch the errors:

Yes, that's the wrapper approach I suggested, but keep in mind that theoretically there could be also errors other than "not string key".

format_json specifically should not throw errors like this given how likely it is for an invalid table to be entered, unlike with set_property_native for example which generally have expected input formats

How's that? JSON has an expoected input format too. What makes it more likely to try to convert to JSON a table which can't convert to JSON than to use invalid table in command_native or set_property_native? What's your use case for format_json?

Also, some APIs do throw when it's an explicit user error, like when using an incorrect type with mp.ovserve_property and others. So it's not like we have a "never throw" policy. Nevertheless, specifically format_json documents that it doesn't throw in case of invalid table, so at the very least the docs are incorrect in this case.

CogentRedTester commented 2 years ago

Yes, that's the wrapper approach I suggested, but keep in mind that theoretically there could be also errors other than "not string key".

Yes, I'm working on a format_mpv_node function that will recursively check if a table is correctly formatted to avoid the need of the protected function call. A little inefficient, but I won't be calling format_json often, and it means any other errors that are meant to throw will go through.

How's that? JSON has an expoected input format too. What makes it more likely to try to convert to JSON a table which can't convert to JSON than to use invalid table in command_native or set_property_native? What's your use case for format_json?

This may be a matter of personal opinion, but there are a number of Lua table structures where it is not immediately obvious that it is invalid. For example, one thing that is sometimes done in both Lua and Javascript is for additional information about an array to be stored in some key value. Javascript does this with the length value, and Lua does this when using table.pack. On Javascript using JSON.stringify() just ignores these additional values, but utils.format_json() throws an error. I myself often store additional array metadata in keys like that. Of course it makes sense that this input is invalid given that, unlike Javascript, Lua uses the same data structure for both arrays and maps, but as you said the docs imply that this sort of mistake won't cause a throw.

Another issue is holes in an array; if any entry in the middle of a Lua array is set to nil then mpv will assume that the table is a map and then immediately crash because the table contains numeric keys. If an array is being filled programmatically then it is possible that this can happen at unexpected times.

Again, I'm not saying that these inputs shouldn't be considered invalid, just that, as you said, the current documentation implies that these sorts of mistakes can't cause the script to crash, which gives the authors a false sense of security.

What's your use case for format_json?

I don't intend to imply that my use case is normal, but my most common use of 'format_json is to transfer arbitrary lua tables between scripts over script messages. For example my script file-browser will send the contents of a directory to another script that requests it by formatting the table as json data. In this case, since the purpose of the table isn't just to pass to an mpv option or to save a compliant JSON file, it is much more likely that invalid formatting can sneak in. This is probably why I've never come across this issue with any of the other functions despite using them significantly more often than format_json.

avih commented 2 years ago

For example, one thing that is sometimes done in both Lua and Javascript is for additional information about an array to be stored in some key value. Javascript does this with the length

That's standard in JS - an array has a length property, which can be longer than the number of elements set in it, and an array can have holes which would translate correctly into an mpv node. While you can't see how it translates (because we didn't add utils.format_json in JS), it does take the holes correctly e.g. in set_property_native.

On Javascript using JSON.stringify() just ignores these additional values

That's standardized behavior of how to convert a JS value to JSON (and it's performed by the JS VM - not by mpv).

If you think there's some issue here then do tell please.

In lua, howevr, the definition is much more vague, and many times code which operates on "arrays" actually works on the sequence 1..n where element n+1 is missing/nil.

There's also no standard definition of how a lua value should be converted to JSON.

Another issue is holes in an array; if any entry in the middle of a Lua array is set to nil then mpv will assume that the table is a map and then immediately crash because the table contains numeric keys

Yup. Not crash but intentionaly throw an error that a map key is not a strring. The mpv algorithm here is that if there are more elements at the table than "array indices" (1.. some n), then consider it a map. This works fine if it's exactly-array or exactly-map, but then if it did already have array indices (numeric) then it's guaranteed to be a non-string key and it errors. Arguably not an amazing algorithm.

In general, not all lua tables have a 1:1 translation to an mpv node (or JSON for this matter), and sometimes there is but it's ambiguous (empty table into emty array or empty map?).

For mpv needs, mpv node is enough, and mpv doesn't have a way to store or use more complex structures which a lua table can hold, etc.

Currently something will go wrong if a lua table is not exactly either a lua array (sequence of numeric keys 1..n and nothing more), or a map with only string keys. This means that some mpv-nodes can't be created (arrays with holes and few more).

Also, if mpv returns an empty lua table then it adds a tag member which indicates whether it's an array or map (which is being considered if it later needs to convert this table into an mpv node).

So the question here is what to do about these, and the 4 suggestions i posted earlier still stand. Do you have a preference between those? or maybe something else?

my most common use of 'format_json is to transfer arbitrary lua tables between scripts over script messages

Right, it is a good use case, but obviously JSON can't encode losslessly arbitrary lua (or JS) tables, for many reasons even beyond the mixed array/map one and non-string keys (elements which refer to functions and other internal types, cycles, etc).

So regardless if it throws or returns nil,error (or converts in some lossy way), it just can't convert perfectly.

There can be lossless custom encodings to some lua tables which JSON still can't hold (mixed array/map tables), but such tables are probably not very portable (js can't hold them, and neither can mpv-node). FWIW, there are also mpv-node which lua/js can't hold (map with duplicate keys, but luckily mpv tries to avoid such internal nodes).

So basically there are two subjects here:

  1. format_json doesn't behave as documented.
  2. can we generally or specifically improve the cases where a translation either way between mpv node and a lua table is lossy (behavior, docs, etc).

I think that improving the docs could be enough here, because if there's no 1:1 translation then ultimately there's no perfect behavior, and we need to decide between several suboptimal ones.