inkle / ink

inkle's open source scripting language for writing interactive narrative.
http://www.inklestudios.com/ink
MIT License

Compiled ink format requirements and goals #51

Open jamezor opened 8 years ago

jamezor commented 8 years ago

I am currently noodling around with an alternative to the JSON compiled ink format. My current goals are to have small file sizes, faster loading, and no third-party dependencies. However, I'd be keen to know what the original requirements and goals were for the current format so that I can see if my approach could cover those too. For example, the current format is obviously JSON as opposed to a binary format; so the files are easy to parse in alternative languages, are arguably human readable, etc. But if these are not actually requirements of the compiled format then more improvements are possible.

In any case I am still at the experimentation stage so nothing may come of this, but the above info would be handy to document regardless.

joethephish commented 8 years ago

Cool, okay, well here's a summary. We used the JSON format because:

So, broadly speaking, you already got the idea. We too have wondered about a binary representation that's similar to the existing format but more like bytecode. Many of the objects, such as control commands, native functions etc could be represented with single byte codes rather than strings.
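As a rough illustration of the single-byte idea, here is a minimal Python sketch. The command names and byte values are hypothetical placeholders chosen for the example; they are not ink's actual control commands, and this is not the format of inkc or any real encoder.

```python
import json

# Hypothetical command names and byte values, for illustration only.
# They are NOT taken from ink's compiled format.
OPCODES = {
    "cmd_eval_start": 0x01,
    "cmd_eval_end": 0x02,
    "cmd_output": 0x03,
    "cmd_pop": 0x04,
}
NAMES = {code: name for name, code in OPCODES.items()}


def encode_commands(commands):
    """Encode a list of command names as one byte per command."""
    return bytes(OPCODES[name] for name in commands)


def decode_commands(data):
    """Decode bytes produced by encode_commands back into command names."""
    return [NAMES[code] for code in data]


commands = ["cmd_eval_start", "cmd_output", "cmd_pop", "cmd_eval_end"]

# The same list stored as compact JSON strings versus one byte each.
as_json = json.dumps(commands, separators=(",", ":")).encode("utf-8")
as_bytes = encode_commands(commands)
print(len(as_json), len(as_bytes))
```

How much this saves on a real story depends on how much of the file is commands rather than text, which is why measuring the existing JSON first is worthwhile.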

We haven't seriously considered it yet because we don't particularly think that the size / readability tradeoff is worth it. We don't really think it's a real problem that needs to be solved. I'd also recommend that you look closely at the existing JSON representation to see what's using up all the space, and ensure that the binary version really would improve significantly on it.

One final important point: Ideally, it would be good if there was just one intermediate format, rather than fragmenting the space. If there's a standard, it has a much better hope of being useful on various runtime implementations (C++, JS, etc). Making ink useable in lots of different environments and platforms is a more important goal than refining the compactness of the compiled story format.

Having said that, if you're really keen, I wouldn't want to stop you! I've had similar thoughts to yours, and I'd still be interested to see the outcome :-)

jamezor commented 8 years ago

Sorry it has taken a while to get to this point, but I've finally got my take on a compiled ink format over at https://github.com/jamezor/ink/commit/121dbd4ea5ea49eea8a05d1925fff31c094daee3 if you're interested.

I see you've started on this too in inkc so it may be interesting to compare our approaches, and presumably some of my tests can be reused if nothing else.

In terms of perf vs JSON, using The Intercept as test data:

76.5552ms to convert to JSON
36.8213ms to convert to cink
JSON 158082 chars, cink 109404 bytes
16.0886ms to read from cink
37.5339ms to read from JSON

So it is faster and smaller than JSON in this case, but I don't have any larger test data (and I don't trust the stress generator to generate representative data) so I don't know if these improvements hold for larger stories.

joethephish commented 8 years ago

Cool, nice one! I haven't timed my inkc implementation, but the size saving is roughly the same as what you're finding with cink. (Wow, we're both such inventive namers ;)

Looks like you've done a very thorough job, what with the test cases and so on! Mine was a lot more hacked together, done in the better part of a weekend.

The reason I put mine on hold was that I needed to implement the save state format too, which was slightly more complicated. Ideally, any compiled format would become the standard format, and we'd make JSON integration an optional add-on. It would need to be able to serialise a playthrough's state as well in order to remove the dependency on JSON entirely. Were you thinking of doing that at all?

The only other thing I would say: I decided with inkc not to go all out with a full binary format, since I thought that there probably wasn't a huge saving to be made compared with the slightly-human-readable yet still super terse pseudo bytecode. What I mean by "pseudo bytecode" is that I use single ASCII characters (so, 1 byte each in UTF-8) to denote the start of different Runtime.Objects. For integers and floats, it just prints them out human readable, and terminates them with a space. I just tried compiling The Intercept, and it's 113749 bytes, so only about 4% bigger than cink.

When viewed in a text editor, it mostly looks like garbage, but if you understand a bit of the format (or look at it side by side with a reference) you can see/debug the objects that are being saved...
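The "pseudo bytecode" idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag characters `i` and `f` are hypothetical, only integers and floats are handled (as in the description), and it is not the actual inkc format.

```python
def encode(values):
    """Encode ints and floats as <tag char><readable number><space>."""
    parts = []
    for value in values:
        if isinstance(value, bool):
            # bool is a subclass of int in Python; reject it explicitly.
            raise TypeError("bool is not supported in this sketch")
        if isinstance(value, int):
            parts.append(f"i{value} ")      # hypothetical tag 'i' for integers
        elif isinstance(value, float):
            parts.append(f"f{value!r} ")    # hypothetical tag 'f' for floats
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    return "".join(parts)


def decode(text):
    """Decode a string produced by encode back into a list of numbers."""
    values = []
    pos = 0
    while pos < len(text):
        tag = text[pos]
        end = text.index(" ", pos + 1)      # the number runs up to the next space
        body = text[pos + 1:end]
        if tag == "i":
            values.append(int(body))
        elif tag == "f":
            values.append(float(body))
        else:
            raise ValueError(f"unknown tag {tag!r} at offset {pos}")
        pos = end + 1
    return values


encoded = encode([42, -7, 3.5])
print(encoded)
print(decode(encoded))
```

Because every character used here is ASCII, the encoded text takes exactly one byte per character in UTF-8, and the numbers remain readable in a text editor.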

micabytes commented 8 years ago

Interesting. Not sure I get the core use-case though - is the concern file size or reading/writing speed?

My natural inclination when faced with JSON files that are exploding in size (and also what I've basically done for savegame files in Android) is to keep the file readable - which is after all the primary reason to use JSON - and just gzip it. This results in files that are 10x+ smaller than the "raw" JSON files, and, at least for the file sizes I've worked with (up to a few megabytes), the performance penalty is negligible for local files - i.e., it matters little to users whether a file loads in 100ms or 200ms (for cloud files, performance is actually significantly improved, since most of the loading cost is download time). It also means that it is still easy to debug files, since they can be gunzipped to JSON (assuming they've not been corrupted).
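For reference, the gzip approach looks roughly like this in Python, using only the standard library. The `save_data` structure is a made-up stand-in for a large, repetitive save file, not a real ink or game format, and the compression ratio will vary with the data.

```python
import gzip
import json

# Hypothetical stand-in for a large save file; not a real ink or game format.
save_data = {
    "entries": [{"id": i, "name": f"item-{i}", "seen": False} for i in range(1000)]
}

# Serialise to JSON text, then compress the UTF-8 bytes.
raw = json.dumps(save_data).encode("utf-8")
compressed = gzip.compress(raw)

# Decompressing gives back the exact JSON text, so the file stays debuggable.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))

print(len(raw), len(compressed))
```

Note that this keeps the JSON dependency in place; it addresses file size only, not the dependency question raised elsewhere in the thread.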

joethephish commented 8 years ago

@micabytes Yeah, the obvious reason is optimisation, but that wouldn't be enough on its own since I'd got the JSON size down quite far. The kicker for us recently is the issue being discussed in other threads about Newtonsoft.Json.dll causing problems in Unity. Given that our JSON format had become so terse already, it had lost most of its readability anyway. Also, the number of lines of code and complexity to serialise to/from JSON objects isn't so different to a simple custom text based format. Finally, by ditching JSON, we lose our only dependency, which is very attractive. I don't have a problem with keeping JSON around as an option though.

jamezor commented 8 years ago

@joethephish From a quick look, the save state is rather tightly coupled to JSON. If I were to take a stab at replacing this, I'd attempt to add an interface or three to separate the internals of the runtime from all the JSON business. I'm keen to give it a go, though I notice your inkc branch has diverged somewhat from master (e.g. Branch replaced by Divert, etc). In an effort to avoid a messy merge at the end, would I be best to work off inkc or off master?
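One way to picture the "interface or three" idea is a small serializer interface that the runtime talks to, with JSON as just one implementation. The sketch below is in Python for brevity (the ink runtime itself is not written in Python), and every name in it (`StateSerializer`, `JsonStateSerializer`, `LineStateSerializer`, `save_and_restore`) is hypothetical rather than part of ink's API.

```python
import json
from typing import Protocol


class StateSerializer(Protocol):
    """Hypothetical interface: the runtime sees only this, not a concrete format."""

    def dumps(self, state: dict) -> bytes: ...

    def loads(self, data: bytes) -> dict: ...


class JsonStateSerializer:
    """JSON-backed implementation."""

    def dumps(self, state: dict) -> bytes:
        return json.dumps(state, sort_keys=True).encode("utf-8")

    def loads(self, data: bytes) -> dict:
        return json.loads(data.decode("utf-8"))


class LineStateSerializer:
    """Toy text format, one 'key=int' per line, standing in for a custom format.

    Assumes keys contain no '=' or newline and that all values are ints.
    """

    def dumps(self, state: dict) -> bytes:
        lines = [f"{key}={value}" for key, value in sorted(state.items())]
        return "\n".join(lines).encode("utf-8")

    def loads(self, data: bytes) -> dict:
        text = data.decode("utf-8")
        result = {}
        for line in text.split("\n") if text else []:
            key, _, value = line.partition("=")
            result[key] = int(value)
        return result


def save_and_restore(state: dict, serializer: StateSerializer) -> dict:
    """Runtime-side code: round-trips state without knowing the format."""
    return serializer.loads(serializer.dumps(state))


state = {"turn": 12, "visits": 3}
for serializer in (JsonStateSerializer(), LineStateSerializer()):
    print(type(serializer).__name__, save_and_restore(state, serializer))
```

With a split like this, JSON could become an optional add-on while the runtime code stays unchanged.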

joethephish commented 8 years ago

Hrmph, yeah, my bad. I should really have made that change in development or master, apologies. We should probably try to cherry pick that commit into development and ignore the inkc changes. They should be fairly independent. Feel free to attempt that if you like - the actual ink engine changes are fairly simple. Otherwise, I can do it when I have a moment.

Agreed that the save state is tightly coupled to JSON... it probably requires implementing some JSON-like features like dictionaries and arrays, which is a bit awkward, so not sure the best way to go about it. (And probably best to keep the dictionaries as dictionaries to keep the file format as flexible/compatible as possible between versions rather than relying on orders, offsets, etc.)
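The compatibility argument for keeping dictionaries can be shown with a small Python sketch. The field names (`turn`, `visits`, `flags`) are invented for the example and do not come from ink's save format.

```python
import json


def load_state(text):
    """Read only the keys this version knows about, with defaults for missing ones."""
    data = json.loads(text)
    return {
        "turn": data.get("turn", 0),
        "visits": data.get("visits", 0),
    }


# Hypothetical save written by a newer version that added a "flags" field.
newer_save = json.dumps({"turn": 5, "flags": ["met_guard"], "visits": 2})

# Hypothetical save written by an older version that lacked "visits".
older_save = json.dumps({"turn": 1})

print(load_state(newer_save))
print(load_state(older_save))
```

A reader that looks fields up by key ignores extras and can default missing ones. A reader that relies on order or fixed offsets would misread the newer save as soon as a field was inserted before the ones it expects.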

micabytes commented 8 years ago

Ah, makes sense. I haven't yet implemented the JSON format in jInk, partly because of your mentions about this not being the final form, but also because I couldn't really see the major benefits to it compared to the text version (other than for saving states - but I have my own system for that) - in part because it seems like a weird hybrid that is neither readable nor really compact.

FWIW, it makes sense to me to go for a full-on binary format, in order to maximize the size benefits. Though hopefully one which is not too tightly coupled to the underlying code, as tight coupling would limit usability in other implementations such as jInk.

joethephish commented 8 years ago

One of the primary benefits of the JSON format is (or was meant to be!) that you only have to implement the runtime engine in other languages, not the entire parser. The runtime engine is intended to be much simpler, operating with basic primitives, which should, in theory, allow you to take advantage of improvements in the original parser too. Most of the "hard problems" are solved at parse/compile time.

I realise that it's been a bit of a moving target though, what with the compiled format being one of the features to be most heavily developed since we went open source. Having said that, the core runtime engine itself hasn't changed so much, and the runtime formats have been very similar to each other, all being simply serialised representations of the runtime objects.

micabytes commented 8 years ago

Makes sense for implementations that don't need to add new features / special behavior to the code. For those of us that do, forking (or in my case - reimplementing) the parser would seem necessary. Unless you implement this plugin-style interface that I saw you hint at in one post, though I wonder whether that is really practical, given the myriad of ways that the engine can be extended. In jInk I already have three different types of extensions: the new sequence construct (discussed in another post), annotations (basically game presentation instructions hidden in comments), and external objects with dot notation.

Horatio-Blackwood commented 8 years ago

Initially, reading this over, the only benefit I saw to changing formats was drop-kicking the dependency. Size/speed don't seem to be real problems, at least for me, but my project is small, so perhaps I'm not the best judge of that for now.

A defined binary format seems to be the most popular choice here at the moment - as long as it is very well documented with its own markdown page in the repository, it would still satisfy the multi-language support requirement @joethephish mentioned above by providing implementers a nice specification to work from.

lhughes41 commented 8 years ago

Speaking as someone thinking of integrating from an "alternative language" -- namely Python -- I would be very unhappy to see Ink not use JSON. A standard format like JSON is very convenient and enables supporting libraries. Right now it seems to me you want to broaden your base by making integration easy rather than pursuing questionable efficiencies. If you go the binary route I would want to look elsewhere :(. Most unfortunate because I am excited about using Ink. I would much rather see development focus on the IDE to support writers. I speak from the perspective of a commercial game effort. Anyway my two cents :)

joethephish commented 8 years ago

I'm firmly on the side of not using a binary format.

And I do have an instinct that it's good to use JSON, but I find it hard to argue for it over a custom text-based format. What advantage is JSON bringing? There's no good reason to parse ink's JSON format outside of the ink engine right now, and it's not intended to be particularly human readable anyway (although it is debuggable if you know what you're looking at, as any text-based format can be).

lhughes41 commented 8 years ago

Good chance I'm confused, but as one of the non-Unity folk (Python, in my case), don't I have reason to parse the JSON, or should I think of it as all hidden from me behind the Ink API? I'm just acculturating to Ink here, so please educate me. I just had the impression that, for example, back in the day Banner Saga interacted with the JSON. Is that now obsolete thinking?

joethephish commented 8 years ago

Yeah, exactly, you have no reason to need to look at the JSON - it's hidden behind an API.

(The Banner Saga actually used inklewriter rather than ink, and despite the fact that they both use a JSON-based format, they're entirely different!)

lhughes41 commented 8 years ago

Cool! Is there any reason to go the Inklewriter direction anymore? Thanks for the question answering :)

lhughes41 commented 8 years ago

Circling back to the need (or not) for an external user (like me) to access the JSON files directly as opposed to just using the API: I was reading the thread on non-Unity use (recall I am on Python) and I saw this: "Load the JSON as a text file. e.g. var inkJson = File.ReadAllText("AwesomeGame/yourFile.ink..." This presumably is C#. Now, I have JSON libraries in Python to do something similar. But if you drop JSON for some binary format, then as a non-C# user I would no longer have a means to do this. So here, it seems, is a need to preserve JSON if you wish users coming at this from non-C# backgrounds to participate. Could one compromise be to have a flag that still allows a JSON version instead of binary? Or maybe I am yet again missing something :). Please advise.
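For comparison, a Python analogue of the quoted "load the JSON as a text file" step might look like the sketch below. The file name `story.json` and the placeholder content are hypothetical; a real compiled story would come from the ink compiler, and actually playing it would still need an ink runtime for Python, which this sketch does not provide.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical placeholder content, NOT the real compiled ink structure.
placeholder = {"example": True}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "story.json"  # hypothetical file name
    path.write_text(json.dumps(placeholder), encoding="utf-8")

    # Read the whole file as text, much as the quoted File.ReadAllText call does.
    ink_json = path.read_text(encoding="utf-8")

    # Parsing is only needed if you want to inspect the data yourself.
    parsed = json.loads(ink_json)

print(type(ink_json).__name__, parsed)
```

Reading the file as text works for any text-based format; it is only the parsing step that relies on the content being JSON.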