Open fab13n opened 10 years ago
I'd say that __tostring
results should be put into quotes — they are, indeed, strings.
Maybe __serialize
would help? (Didn't look into it closely.)
@agladysh Adding this produces the same result:
token_mt.__serialize=token_mt.__tostring
print(s.serialize(stream,{ }))
Besides, it could be debatable whether this is a bug for __tostring
, but the same issue exists with custom printers.
@fab13n, I agree, it may make sense to handle __serialize
differently. I think for consistency, __tostring
should still be quoted.
Is it be sufficient for your purposes if __serialize
outputs its result as is?
I think tostring
results shouldn't be quoted by Serpent itself. Otherwise, we can't tell the difference between a generated output and the corresponding literal string. In the example above I couldn't tell the difference between the outputs corresponding to the two following lines:
stream1 = { token("Keyword", "local"), token("Id", "x") }
stream2 = { [[token("Keyword", "local")]], [[token("Id", "x")]] }
assert(s.line(stream1)==s.line(stream2))
Sure, the version with mandatory quotes is valid Lua, but a valid Lua that's semantically different from the original one. It's arguably more treacherous, rather than safer. If users want to add quotes, they can add them in their __tostring()
implementation. But the point of a pretty-printer is, well, to print something pretty, or at least readable. Compilability is mandatory for serialization, not for display.
As for using serialization instead of pretty-printing when one intends to pretty-print in a prettier way, I find this rather confusing :-) Besides, I don't even know whether I can use readability-improving options, such as indentation, with s.serialize()
.
Yes, I do see this point and has struggled with the same considerations in the beginning. The main goal was to make the output consistent between serialization and pretty-printing as it's convenient in many situations take the "printed" output and paste it into a script. For example, when you use a console in the IDE to execute commands, is the result serialization or pretty-printing? It's probably mostly the latter, but I've seen on several occasions people taking the output from the console and pasting it into their code.
stream1 = { token("Keyword", "local"), token("Id", "x") }
stream2 = { [[token("Keyword", "local")]], [[token("Id", "x")]] }
assert(s.line(stream1)==s.line(stream2))
Right, but these are different values and I wanted them to look different. Similar to that, there is ambiguity with nil
vs "nil"
, true
vs. "true"
and so on (which doesn't exist with the current output).
I can add a parameter to disable quoting of string values and would prefer keep the current option as the default.
On Mon, Dec 30, 2013 at 7:01 PM, Paul Kulchenko notifications@github.comwrote:
assert(s.line(stream1)==s.line(stream2))
Right, but these are different values and I wanted them to look different.
My point is, they are identical with the current implementation, and I want them to look different!
My point is, they are identical with the current implementation, and I want them to look different!
I understand, but they only look the same because your token()
call returns its own representation.
There is still a difference between:
function token_mt :__tostring() return "true" end
-- and
function token_mt :__tostring() return true end
I pushed a change (in rawstring
branch) that adds rawstring
parameter:
print(s.block(stream, {rawstring = true}))
-- generates
{
token(Keyword, local),
token(Id, x)
} --[[table: 00C8C248]]
You can make it a default for all methods, so that your users don't need to specify it.
Will this work?
I'm not sure if this is a correct approach. I think this should be configurable on per-object level. It makes little sense to me to configure a thing like this once per serialization call.
I thought so as well; for example, adding something like __raw
attribute to the metatable would signal that the result string should not be quoted (and we'd not need the serialization parameter), but I'd prefer not to clash with (possible) Lua values, like __mode
. Maybe _raw
or _rawstring
?
For example, when you use a console in the IDE to execute commands, is the result serialization or pretty-printing?
There is yet another use-case for pretty-printing while serializing — human-readable config files. This is a primary reason why in Lua Núcleo we have tpretty
as well as tserialize
(along with debug visualization of complex tables, of course).
So, we have at least three reasons to convert a Lua object to string:
Human-readable serialization and visualization should be highly configurable to conform to usage modes (single-line logging, multi-line config files) and to coding guidelines (indentation rules, line width etc.)
Maybe I'm missing something (didn't study serpent design too close yet, sorry), but I'm not sure why __tostring
and __serialize
are supported by default at all.
I'd start with __serpent
metamethod, which will have (read-only) access to the configuration and will allow object author emit custom serialized data — to be pasted literally to the resulting string.
Then I'd consider enabling __tostring
support for human-readable visualization by default — but only there.
To me __serialize
metamethod name looks too generic — there is no well-established protocol for it, as far as I'm aware. But maybe it will make sense to support it as a generic serialization fallback for serialization (and only there), accepting no parameters and returning a string to, again, be pasted literally to the output.
Furthermore, all this metamethod handling is a sugar that will slow down the basic implementation. Why not leave it to user to sort out in a custom serialization callback function? (Well, OK, and maybe provide some useful common implementations in an auxiliary module...)
Maybe
_raw
or_rawstring
?
Please don't use such generic names to control module-specific behaviour.
If you really want a flag, use __serpent_raw
or something.
BTW, while we're considering __serpent
(or whatever) custom API, there is a common usage pattern — at least in our code.
When I'm serializing to an output stream (say, stdout
), I prefer to write there directly to be easier on GC. Compare Núcleo's tstr
and tstr_cat
.
It makes little sense to me to configure a thing like this once per serialization call. vs. This is a primary reason why in Lua Núcleo we have
tpretty
as well astserialize
...
This is the same thing that serpent does with different methods and options it provides.
Using per-object configuration allows one to mix objects with different configurations in one serialization call, although I'm not sure how useful it is.
Human-readable serialization and visualization should be highly configurable to conform to usage modes (single-line logging, multi-line config files) and to coding guidelines (indentation rules, line width etc.)
Yes, serpent also supports all these combinations except limiting line width.
When I'm serializing to an output stream (say, stdout), I prefer to write there directly to be easier on GC. Compare Núcleo's tstr and tstr_cat.
This is a good point; it should be possible to write a wrapper around serpent that will allow "pulling" of the next token, allowing to do whatever desired with it and limiting memory use to the token being returned.
@fab13n, if you prefer not to add the rawstring
option to the serialization call, then I can go back to the original proposal and to use __serialize
call to provide literal serialization. In the current logic __serialize
metamethod (if present) takes precedence over __tostring
. I don't see a problem with rawstring
option as it can be made a default, but @agladysh is right that in this case it will be applied to all __tostring/__serialize
results.
Yes, serpent also supports all these combinations except limiting line width.
Is this hard to implement? tpretty
even places short (sub-)entries on a single line, and uses multi-line serialization for longer ones (the implementation is rather messy though).
Is this hard to implement?
No, but it hasn't been a popular feature so far. Also, the original goal for Serpent was to provide "good enough" implementation (in the sense of "good" correctness and "enough" features for majority of users), while still staying small. It's not so small anymore, but I'm hesitant to add features "just in case". I usually look for specific use cases or user requests.
BTW, there is yet another use-case "axis" for human-readable part — fast and good-enough vs. slow and really nice-looking.
When I'm writing data to a log file, I need it to be human-readable, but I need to write it really fast, even at expense of certain features (tstr
).
When I'm generating code (like a configuration file), I need to do it prettily and I do not care for the speed (as much): tpretty
.
Interestingly enough, tserialize
is actually almost useless — loading back Lua code is too slow when compared with other machine-readable solutions (like, say, luatexts
or msgpack
).
I usually look for specific use cases or user requests.
I'm considering dropping tpretty
in favor of serpent
someday. Lack of line limit is a blocker, since we need to generate code that is compliant to our coding guidelines. So, consider this a +1 for that feature. :)
Lack of line limit is a blocker, since we need to generate code that is compliant to our coding guidelines. So, consider this a +1 for that feature. :)
That's good as a feature request ;).
Good point on fast vs. pretty, although it's difficult to say ahead of time what features can be skipped to make things fast(er).
Using per-object configuration allows one to mix objects with different configurations in one serialization call, although I'm not sure how useful it is.
Well, to me __tostring
in serialization/visualization context seems to be useful only as a clutch for a quick-and-dirty visualization. Given that, I would say that the per-call configuration is meaningless.
I see several major use-cases (examples in pseudocode, I've actually needed everything listed here at one time or another):
Mutate object on serialization.
__serpent = function(self)
-- Rest of the fields are ephemeral
return { x = self.x, y = self.y }
end
Provide a constructor function for serialization.
__serpent = function(self, serpent)
local data = tclone(self)
data.tag, data.id = nil, nil
return self.tag, " ", serpent.short_string(self.id), serpent.newline, data
end
--- For self = { tag = "foo:bar", id = "baz", quo = 42 }:
---
--> foo:bar "baz"
--> {
--> quo = 42;
--> }
Control formatting for human-readable serialization.
__serpent = serpent:format_fields { type = serpent.long_string }
--- To be put on `data` above to get:
--> foo:bar "baz"
--> {
--> type = [[mystring]];
--> quo = 42;
--> }
Provide custom visualization.
__serpent = function(self, serpent)
if not self.tag then
return self
end
return "`", self.tag, " ", serpent:comma_list(self)
end
Not a single use-case that I see now requires non-raw pasting.
What am I missing?
Furthermore, __tostring
is a potential source of bugs here — it is so generic. If I'm using serpent
for serialization, I must add __serialize
to every object that might have __tostring
in its metatable — even on a object from foreign module. And there is no saying what metatamethods and for what reason foreign module objects would have.
(I also updated the post above a bit.)
Item 3 above also may include custom field order for an object, BTW, — useful for DSLs, where field order may be determined by the purpose of the field (say, first common fields, then specialized for this specific construct), thus be dependent on tag
field value...
(Edited that big post with use-cases again.)
...And here is a problem with using a metamethod instead of a callback in serialization. Generally, you want to save the object in such way that, when loaded back, it will be able to save itself again.
This means that this doesn't cut the mustard:
__serpent = function(self)
return { x = self.x, y = self.y }
end
You always need a constructor that will set a metatable:
__serpent = function(self)
return "make_foo", " ", { x = self.x, y = self.y }
end
This is not a problem for visualization — you don't need to load anything back. But for serialization you're better off using a callback function. This way you would not need to set any metatables on load, and can just emit good old table literals.
Well, to me __tostring in serialization/visualization context seems to be useful only as a clutch for a quick-and-dirty visualization. Given that, I would say that the per-call configuration is meaningless.
It's actually something that many users find useful/helpful. Some use it for their own values that already provide __tostring (pkulchenko/ZeroBraneStudio#77) and some use it to serialize and print their userdata (for example, Marmalade Quick).
I see several major use-cases (examples in pseudocode, I've actually needed everything listed here at one time or another):
That's a lot to review and digest; thank you for the summary. What's __serpent
in these examples?
__serpent
is a fancier __serialize
metamethod — I used a different name here so we'll not be bound by an existing contract while discussing the matter.
It's actually something that many users find useful/helpful. Some use it for their own values that already provide __tostring (pkulchenko/ZeroBraneStudio#77) and some use it to serialize and print their userdata (for example, Marmalade Quick).
The usage of the __tostring
in general is not useful for serialization. Look at luasocket
, for example. You can't load that back.
True, in certain closed ecosystems it is possible to institute a stricter contract. But in general, IMO, a serialization code should not look at __tostring
at all — unless user explicitly configures it to do that.
Visualization is debatable.
But, if I were writing a debugger, I would, first of all, provide my own callback, so users could specialize their code. Besides, what if __tostring
for a given object has side-effects, for example? It is not forbidden by the general __tostring
contract. So, if mobdebug uses __tostring
for value visualization, IMHO, this is a dangerous misfeature that can cost a careless and unlucky user many hours of pain.
serpent is a fancier serialize metamethod — I used a different name here so we'll not be bound by an existing contract.
That's what I suspected, but I thought you were arguing against adding metamethods to all the objects: "If I'm using serpent for serialization, I must add serialize to every object that might have tostring in its metatable — even on a object from foreign module."
It seems like there are three main options for implementing that:
__tostring
; this should be good enough for 90% of the users, maybe more. The current issue (at least for Fabien) is that the output of __tostring
is not literal.__serpent
(or existing __serialize
). This should cover all the special cases (and should use literal output), but needs to be installed for every object that requires that treatment.custom
option, but it's only triggered for "plain" tables.(BTW, please don't let my ramblings to distract you from giving Fabien a working solution for his immediate problem :-) We can discuss theory for ever, but it is practice that is important.)
But, if I were writing a debugger, I would, first of all, provide my own callback, so users could specialize their code. > Besides, what if tostring for a given object has side-effects, for example? It is not forbidden by the general tostring contract. So, if mobdebug uses __tostring for value visualization, IMHO, this is a dangerous misfeature that can cost a careless and unlucky user many hours of pain.
In what way? This is not too different from using print(foo)
.
That's what I suspected, but I thought you were arguing against adding metamethods to all the objects: "If I'm using serpent for serialization, I must add serialize to every object that might have tostring in its metatable — even on a object from foreign module."
This was an argument against __tostring
on serialization.
BTW, please don't let my ramblings to distract you from giving Fabien a working solution for his immediate problem :-) We can discuss theory for ever, but it is practice that is important.
;). I'll give Fabien a chance to respond to all the suggestions we discussed...
If we're talking about serpent-specific callbacks, having a 'custom' option which doesn't add quotes around my stuff if I didn't ask for them is good enough to me.
I still believe that forcing quotes around tostring results is wrong, but I can live with other peole being wrong :) On Dec 30, 2013 11:31 PM, "Paul Kulchenko" notifications@github.com wrote:
serpent is a fancier serialize metamethod — I used a different name here so we'll not be bound by an existing contract.
That's what I suspected, but I thought you were arguing against adding metamethods to all the objects: "If I'm using serpent for serialization, I must add serialize to every object that might have tostring in its metatable — even on a object from foreign module."
It seems like there are three main options for implementing that:
- Using tostring; this should be good enough for 90% of the users, maybe more. The current issue (at least for Fabien) is that the output of tostring is not literal.
- Using special metamethod, like serpent (or existing serialize). This should cover all the special cases (and should use literal output), but needs to be installed for every object that requires that treatment.
- Using special callback, probably based on a type. This allows the callback to be specified only once (instead of attaching the metamethod) and can be hidden as one of the default options (in Fabien's case). This is similar to the current custom option, but it's only triggered for "plain" tables.
— Reply to this email directly or view it on GitHubhttps://github.com/pkulchenko/serpent/issues/10#issuecomment-31373122 .
If we're talking about serpent-specific callbacks, having a 'custom' option which doesn't add quotes around my stuff if I didn't ask for them is good enough to me.
So, what I have in the rawstring branch would then work for you?
print(s.block(stream, {rawstring = true}))
-- generates
{
token(Keyword, local),
token(Id, x)
} --[[table: 00C8C248]]
But for serialization you're better off using a callback function. This way you would not need to set any metatables on load, and can just emit good old table literals.
Come to think about it, there is a catch here. Downside of a callback function is that it is centralized — and forces tight code coupling. If your objects come from different modules, you have to be aware of each one of them in the callback.
Now you do dispatch by metamethods, which is nice when you have a metatable. And you often do, actually, when serializing a complex object, — despite what I said earlier. So, it is probably better to keep supporting metamethod dispatch.
The argument against metamethods comes from the use-case when I don't have a "natural" metatable on the object and don't want to add it — DSL-ization of the data (see use-cases 1-3 above).
I do my dispatch by the tag
field, and the serialization code should do that too.
For that use-case a centralized callback function is OK, I can just skip metamethods altogether.
However, in general my object factories do not use metatables too, unless it is absolutely necessarily, — in plain Lua 5.1 it is usually faster to manually fill a table with methods than to call setmetatable
.
So, for example, what if I want to put my serialization code to the :save()
method instead of __serialize
metamethod? And what if I want to change the dispatch behavior for "my" objects only?
This is getting a bit contrived though...
So, if mobdebug uses
__tostring
for value visualization, IMHO, this is a dangerous misfeature that can cost a careless and unlucky user many hours of pain.In what way? This is not too different from using
print(foo)
.
Don't know about others, but I wouldn't intentionally call print()
on a variable that I would suspect that it holds a non-trivial value.
Anyway, a good debugger, IMHO, does not have any right to affect the value that is being inspected in any way without explicit user command.
What if I'm debugging __tostring
itself?
__tostring = function() _G.table = nil end;
__tostring = function() do_segmentation_fault() end;
__tostring = function() return 0/0 end; -- Assuming MSVC and bad coprocessor flags.
__tostring = function() eat_all_memory() end;
__tostring = function() error("oops!") end;
To be even more realistic, what if __tostring
has an internal cache that you would populate or invalidate by your call?
A debugger does not have a right to assume that __tostring
does not have any side effects. Usually it does not. But it is not enough.
My use-case is handled adequately by custom
. I still can't follow your logic behind not making "rawstring" the default behavior, but that's not critical.
One design consideration, though: __tostring
is a pretty-much standard metamethod, since it's used by tostring()
which is a standard function. But adding more meta-method, such as __serialize
and even more so __serpent
, should be done with some restraint: it adds some coupling between serpent and libraries which could have been left independent. I'd even say it encourages monkey-patching of metatables.
@fab13n sure, __serpent is tight coupling. But I do not see how it can be avoided — unless you happen to invent a generic API somehow.
Serialization of complex business-logic object is a complicated and messy business.
If you want it to be performance-effective, you must use advanced features, which almost always are specific to the serialization library.
Look at how many use cases we listed above. Most need their own API.
Add to the mix low level data formats (i.e. JSON instead of Lua), format versioning and support for older versions and all other stuff — and you'll see why I say that __serialize
is a bad name, too generic for for a library that does a specific and limited kind of serialization.
And __tostring
, as I elaborated above, IMO, is not suitable for serialization at all.
I'm evaluating the feasibility of using Serpent as Metalua's pretty-printer. I'm stuck with the fact that the result of __tostring metamethods are put into quotes. There's a similar issue with custom printers. For instance: