Open remicollet opened 6 years ago
lexertl has serialization support (https://github.com/BenHanson/lexertl14/blob/master/lexertl/serialise.hpp), but it depends on Boost and likely uses its own specific format. For the parser there is currently no such code, as far as I can see. I think parsertl could support serialization with Boost, too. That approach would also require using Boost to serialize the PHP internals, which might not be suitable for PHP users.
I'm currently working on the documentation, and we'll see how well the current PHP API holds up. I would therefore rather defer further internal integration until the API is stable. I think JSON serialization could become possible once the PHP API is established and at least in beta. Perhaps we should mark both as not serializable for now and keep the issue open. Of course, having things serialized is a worthwhile goal.
Thanks.
If I can get a C++20/constexpr version of the libraries going, then a parser to read the serialisation format can be built at compile time, which would then justify loading the saved data instead of just building it again.
@BenHanson I think the matter needs some clarification. PHP uses an ASCII-based serialization format. The point here is of course about PHP's own items (variables, objects, constants, etc.) versus the lexertl/parsertl objects that Parle\{Parser\Lexer} instances carry inside.
I guess the question with regard to lexertl/parsertl is not about raising the C++ version requirement, but whether serialization can be a feature of these libraries themselves. Perhaps it would be possible to serialize/unserialize using some non-binary format? Or perhaps there is some other option? A portable binary format could work, too. PHP's own serialization format is not binary, which makes it portable, so that's a good point as well.
With serialization supported, the opportunities to save, share and exchange parsers and lexers would of course improve. This would seem an advantage for lexertl/parsertl as well as for any consumers. On the PHP side, even with a binary-serialized parser, embedding, say, a base64-encoded blob into the PHP serialization format is conceivable. Without having to define any C++/PHP code, one would just unserialize and be able to operate on the result.
Thanks
@weltling In fact I could just output the table numbers as ASCII and use C++ streams to stream them back in etc.
In the past it always seemed kind of pointless to me, but it's not hard to do, so why not, if people want it?
Yep, that could be an approach. The security aspect should also be kept in mind, as the serialized string can potentially be manipulated.
Thanks
I have written the C++ for serialisation and will publish it soon. If security is a concern it may be better to store the serialised text directly in the application rather than as an external file?
The idea of serialization is exactly to have the data saved outside the app. For example:
    $f = "/path/to/dump.txt";
    if (file_exists($f)) {
        $parser = Parser::from(file_get_contents($f));
    } else {
        $parser = new Parser;
        .........
    }
    file_put_contents($f, serialize($parser));
There's a big red block of text warning about untrusted data here:
https://www.php.net/manual/en/function.unserialize.php
but there's no real way to control where the data travels. The same data could be passed over the network, saved in a DB, whatever. I'm just mentioning it because it's usual practice to make sure that data read in is actually valid and won't, say, crash the app. That would concern both the C++ lib and the PHP side.
Thanks
As the state machine is just a bunch of numbers, JSON won't really help here.
As mentioned in the link you provided, adding a hash may help. I can probably use a hash function from the standard library, although that will lock a serialisation to a particular compiler.
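For illustration, a fixed, self-implemented hash would avoid the compiler lock-in, since `std::hash` results are implementation-defined. Below is a minimal sketch using 32-bit FNV-1a over the serialised text. The choice of FNV-1a and the `fnv1a` name are assumptions made for this example, not something the libraries provide, and a plain checksum like this only catches accidental corruption, not deliberate tampering.

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a over the bytes of the serialised text.
// The constants are the published FNV offset basis and prime, so the
// result does not depend on the compiler or standard library in use.
std::uint32_t fnv1a(const std::string& data)
{
    std::uint32_t h = 2166136261u; // FNV offset basis
    for (unsigned char c : data)
    {
        h ^= c;          // mix in the next byte
        h *= 16777619u;  // multiply by the FNV prime (wraps mod 2^32)
    }
    return h;
}
```

The hash could be written as one extra number alongside the table and recomputed on load.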
Yep, JSON is of no use, but I guess a hash wouldn't help much either. Once the list of numbers is outside the app, the hash can be manipulated the same way the numbers can. The number list itself is a good enough approach in the first place, IMO; it allows plain operations on the underlying data.
Perhaps some sanity check would do the job of ensuring the numbers are valid? At this point I'm not familiar enough with the parsertl and lexertl internals to suggest something more concrete :/ For example, checking whether an imported ID refers to an existing item, etc.?
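As a rough sketch of what such a check might look like, assume (purely hypothetically) a row-major transition table where each cell holds the index of the next state. The `dfa` struct and `is_sane` function are invented for this example and do not reflect the actual lexertl/parsertl internals.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout (an assumption, not the real library format):
// a row-major table with one row per state and one column per symbol.
struct dfa
{
    std::size_t states;
    std::size_t symbols;
    std::vector<std::size_t> next; // expected to hold states * symbols cells
};

// Returns true only if the dimensions match the data and every
// imported state index refers to an existing state.
bool is_sane(const dfa& d)
{
    if (d.states == 0 || d.symbols == 0)
        return false;

    // Compare via division to avoid overflow in states * symbols.
    if (d.next.size() % d.symbols != 0 ||
        d.next.size() / d.symbols != d.states)
        return false;

    for (std::size_t target : d.next)
        if (target >= d.states)
            return false;

    return true;
}
```

Running a check like this right after loading would reject a manipulated dump before any lookup indexes past the end of the table.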
Thanks
For now
Results in
PHP Fatal error: Uncaught Parle\LexerException: Lexer state machine is not ready in /tmp/foo.php:41
Indeed, serialization results in something obviously wrong:
"O:11:"Parle\Lexer":0:{}"
If this is too much work to support, perhaps it is better to declare them as not serializable? Perhaps "saveToJson" and "loadFromJson" methods would be better (you know... serialization...)
Perhaps, just a bad idea ;)