Closed Vladimir37 closed 1 year ago
Yes, I absolutely agree that reverse conversion (often called serialization) would be a huge boon. Unfortunately the conversion from the data file to an object is lossy, meaning that given certain objects, it's ambiguous what the correct serialization should be.
Given:
{
"foo": [1, 2]
}
is the correct serialization:
foo=1
foo=2
or
foo = { 1 2 }
or even
foo = { 1.000 2.0 }
jomini currently doesn't have a strong enough vocabulary to roundtrip deserialize and then serialize without ambiguity.
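To make the ambiguity concrete, here is a minimal sketch (not part of jomini; the three helper names are made up for illustration) that writes the same parsed value in each of the three forms listed above. All three outputs would parse back to the same object, so the object alone cannot tell a writer which one to pick.

```javascript
// Hypothetical helpers: three equally valid ways to write one key/array pair.
const asRepeatedKeys = (key, values) =>
  values.map((v) => `${key}=${v}`).join("\n");

const asBracedList = (key, values) => `${key} = { ${values.join(" ")} }`;

// `digits` is an assumption: the parsed number no longer records how many
// decimals the original text used.
const asBracedFloats = (key, values, digits = 1) =>
  `${key} = { ${values.map((v) => v.toFixed(digits)).join(" ")} }`;

const parsed = { foo: [1, 2] };
const candidates = [
  asRepeatedKeys("foo", parsed.foo),
  asBracedList("foo", parsed.foo),
  asBracedFloats("foo", parsed.foo),
];

console.log(candidates.join("\n---\n"));
```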
It seems to me that this problem could be solved if Jomini represented the parsed data in a format like this:
{
"type": <type>,
"body": <body>
}
In this way,
{
"foo": {
"type": "int_array",
"body": [1, 2]
}
}
will be serialized to
foo = { 1 2 }
If the type field is float_array:
foo = { 1.0 2.0 }
If the type field is chain:
foo=1
foo=2
Using the "type" field would help eliminate ambiguity, as it seems to me.
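A rough sketch of how a writer could consume such type/body wrappers. The function names are hypothetical and the three type tags are only the ones mentioned in this comment; nothing here is an existing jomini API.

```javascript
// Hypothetical serializer for the proposed { type, body } wrapper.
function serializeTagged(key, { type, body }) {
  switch (type) {
    case "int_array":
      return `${key} = { ${body.join(" ")} }`;
    case "float_array":
      // One decimal place is an assumption; the wrapper as proposed does not
      // record the original precision (1.000 vs 1.0).
      return `${key} = { ${body.map((n) => n.toFixed(1)).join(" ")} }`;
    case "chain":
      // The key is repeated once per element.
      return body.map((v) => `${key}=${v}`).join("\n");
    default:
      throw new Error(`unknown type: ${type}`);
  }
}

function serializeDocument(doc) {
  return Object.entries(doc)
    .map(([key, tagged]) => serializeTagged(key, tagged))
    .join("\n");
}

console.log(serializeDocument({ foo: { type: "chain", body: [1, 2] } }));
```

Note that a float_array tag alone would still not distinguish 1.000 from 1.0, so a complete scheme would probably need to carry precision as well.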
You're absolutely right that there are ways to disambiguate the types (and your idea is a good one). The one downside is that instead of accessing a value like foo.bar, one would have to do foo.body.bar.body
This problem can be solved if Jomini has two methods. For example:
jomini.parse - The currently existing method. The data is easy to read, but it cannot be serialized.
jomini.deserialization - Deserialization using body/type objects.
Thus, the data for easy viewing and the data for use with subsequent serialization would be kept separate.
Ideally there'd only be one method to ease differences in parsing. There may be a way to create a class where we keep all properties on the class + a jomini_type() method that is used in a save() method to disambiguate.
But this may be wishful thinking and creating two methods may be more practical in the short term.
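One way the single-class idea might look, as a sketch only: the class name, the Symbol-keyed storage, and the type tags are all assumptions. Only jomini_type() and save() are names taken from the comment above.

```javascript
// Hidden storage for type hints. Symbol keys are skipped by Object.keys
// and JSON.stringify, so normal property access stays clean.
const TYPES = Symbol("jominiTypes");

class JominiObject {
  constructor(props, types = {}) {
    Object.assign(this, props);
    Object.defineProperty(this, TYPES, { value: types, enumerable: false });
  }

  // Returns the recorded hint for a key, or "plain" when none was stored.
  jomini_type(key) {
    return this[TYPES][key] ?? "plain";
  }

  // Writes top-level fields only; nesting is left out to keep the sketch short.
  save() {
    return Object.keys(this)
      .map((key) => {
        const value = this[key];
        switch (this.jomini_type(key)) {
          case "chain":
            return value.map((v) => `${key}=${v}`).join("\n");
          case "int_array":
            return `${key} = { ${value.join(" ")} }`;
          default:
            return `${key}=${value}`;
        }
      })
      .join("\n");
  }
}

const obj = new JominiObject({ foo: [1, 2], bar: 3 }, { foo: "chain" });
console.log(obj.foo[0]); // plain access, no .body indirection
console.log(obj.save());
```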
You could add all the information to disambiguate types into a meta object at the top level. This way the clean access is still given, but allows you to parse that or define it if needed.
Right it should be possible to hide the disambiguation away from the user (but still keep it available for serialization) 🤔
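A sketch of the top-level meta object idea, assuming the hints are keyed by a slash-separated path. The function name, the path format, and the "chain" tag are illustrative choices, not anything jomini provides.

```javascript
// Hypothetical writer: `data` stays a plain object, `meta` maps paths to hints.
function saveWithMeta(obj, meta, path = "") {
  const lines = [];
  for (const [key, value] of Object.entries(obj)) {
    const here = `${path}/${key}`;
    if (Array.isArray(value) && meta[here] === "chain") {
      // Hinted arrays are written as a repeated key.
      for (const item of value) lines.push(`${key}=${item}`);
    } else if (Array.isArray(value)) {
      lines.push(`${key}={ ${value.join(" ")} }`);
    } else if (typeof value === "object" && value !== null) {
      lines.push(`${key}={ ${saveWithMeta(value, meta, here).join(" ")} }`);
    } else {
      lines.push(`${key}=${value}`);
    }
  }
  return lines;
}

const data = { country: { tag: "ENG", foo: [1, 2] }, bar: [3, 4] };
const meta = { "/country/foo": "chain" };

console.log(data.country.foo[1]); // clean access is unchanged
console.log(saveWithMeta(data, meta).join("\n"));
```

The trade-off in this layout is that the meta object has to be kept in sync by hand whenever the data is edited.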
Saying that, I don't have any plans for continued development as the current method of parsing (using jison) exhausts heap space, so a rewrite would be necessary to make it viable to parse large files.
Do you have any current thoughts for what you want to rewrite it in/to use?
So this can still be written in js -- it'd just need to be some sort of hand-written recursive descent parser (basically any JSON parser can be used for inspiration). I've written paradox parsers in C#, js, F#, and most recently (but not open sourced) rust. Each language has its own tradeoffs, so I don't think there'd be one solution that could rule all.
Why a recursive descent parser? I've written a parser in C++ that achieves this with regular expressions, which proves that the language is regular. What files did you look at when deciding it's a context free grammar?
I've begun a hand-rewrite of the parser, so that the output is optionally instrumented in a way that allows unambiguous serialization. As a start, I've ported my C++ parser to JS. https://github.com/soryy708/jomini/tree/parser There's still some work to be done on the parser and tokenizer, so that the tests will pass.
What files did you look at when deciding it's a context free grammar?
I'm not too privy to computer science terminology, but I believe it is not a regular language as the format allows arbitrary embedding of delimiters (objects can contain array of objects repeatedly). It's the same reason why JSON is not regular.
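To illustrate the nesting point, here is a minimal recursive descent sketch. It is not jomini's parser and it covers only a simplified subset of the format (no operators other than "=", no comments, scalars kept as strings). The tokens themselves are matched with a regular expression, but matching each closing brace to its opener at arbitrary depth is done by recursion, which is the part a regular language cannot express.

```javascript
// Tokens: quoted strings, the three punctuation marks, or bare words.
function tokenize(text) {
  return text.match(/"[^"]*"|[{}=]|[^\s{}="]+/g) ?? [];
}

function parse(text) {
  const tokens = tokenize(text);
  let pos = 0;

  function parseValue() {
    const tok = tokens[pos++];
    if (tok !== "{") return tok; // scalar
    // Inside braces, `key = ...` means an object; otherwise it is a list.
    const isObject = tokens[pos + 1] === "=";
    const result = isObject ? parseEntries("}") : parseList();
    pos++; // consume the matching "}"
    return result;
  }

  function parseList() {
    const items = [];
    while (pos < tokens.length && tokens[pos] !== "}") items.push(parseValue());
    return items;
  }

  function parseEntries(end) {
    const obj = {};
    const counts = new Map();
    while (pos < tokens.length && tokens[pos] !== end) {
      const key = tokens[pos++];
      if (tokens[pos++] !== "=") throw new SyntaxError(`expected "=" after ${key}`);
      const value = parseValue();
      // Repeated keys collapse into an array; this is where the lossiness
      // discussed earlier in the thread comes from.
      const seen = counts.get(key) ?? 0;
      if (seen === 0) obj[key] = value;
      else if (seen === 1) obj[key] = [obj[key], value];
      else obj[key].push(value);
      counts.set(key, seen + 1);
    }
    return obj;
  }

  return parseEntries(undefined);
}

console.log(JSON.stringify(parse("a = { b = { c = { 1 2 } } } d = x d = y")));
```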
I've begun a hand-rewrite of the parser, so that the output is optionally instrumented in a way that allows unambiguous serialization. As a start, I've ported my C++ parser to JS.
Excellent. I'm more than happy to see what you're thinking.
Apparently you've also made some hand-rolling progress a while ago: https://github.com/nickbabcock/jomini/tree/handroll
Apparently this used to be powered by a hand-rolled parser before. parser.js (https://github.com/nickbabcock/jomini/commit/4c7ece211793d77ed686f97e98a23b63095ae65b) Why was it migrated to Jison?
Someone made a F# implementation here: https://github.com/tboby/cwtools/tree/master/CWToolsTests
Someone made a Python implementation here: https://github.com/Shadark/ClauseWizard/
Apparently this used to be powered by a hand-rolled parser before. parser.js (4c7ece2) Why was it migrated to Jison?
Haha, who knew!? Forgot that the commit is from 5 years ago. Looks like I may need to write more descriptive commit messages 😆
My assumption looking at those commits is that jison provided an easier API for development and users at that time. In hindsight, I wish I had iterated on the handrolled version, as jison seems unmaintained and a bit baroque, but oh well 🤷
Someone made a F# implementation here: https://github.com/tboby/cwtools/tree/master/CWToolsTests
Someone made a Python implementation here: https://github.com/Shadark/ClauseWizard/
Yeah there are a lot of parsers out there. I've written my own fair share (C#, C# (2), F#, this one, and other closed-source implementations). Writing parsers for games you love is a great excuse to program 😄
The latest release uses a parser that is functionally lossless so it would be possible to write out a structure (but not from a JS object).
It would be something along the lines of:
const out = parser.parseText(data, {}, (q) => {
// update an EU4 save so that the player is england
q.at("/player", "ENG");
return q.writeTo(/* a writable stream? */);
});
While it is now possible to implement this feature on top of the latest release, I don't have a personal drive to implement it, so for now it would need to be done by the community. I'm happy to guide anyone who decides to take up this mantle, but until there is a volunteer, I'm going to close this issue.
@nickbabcock sounds good, and I have some interest in implementing that. I don't know how to interface with your webasm implementation though. Does it have documentation?
Excellent, I'll reopen the issue for further discussion.
The underlying parser has documentation.
One can derive inspiration from the code bases that convert binary data to plain text:
The binary data has a slightly different format so it won't be one to one but both text and binary formats functionally behave the same.
I too started to take a peek into this. I unfortunately don't have a ton of experience, especially with web assembly, and have been pretty lost in trying to make this change.
I noticed @nickbabcock that another library of yours implements this feature: https://github.com/nickbabcock/Pdoxcl2Sharp
I'm considering using C# just for this feature in a tool I'm creating, but figured I'd ask if there's any kind of update coming on this soon or if there's a way I can help.
The issue with converting js objects is that some fields will need to be enriched so that they can be written out properly: For instance, we'd want an object like
{
army: Inflate([{ name: Quoted("army1") }, { name: Quoted("army2") }]),
type: "western",
cores: [Quoted("ENG"), Quoted("FRA")]
}
in order to write out:
army={ name="army1" }
army={ name="army2" }
type=western
cores={ "ENG" "FRA" }
In order to facilitate ergonomics, currently the object returned from parsing is not enriched. I would need to see / investigate how one could provide these enriched types without sacrificing ergonomics or performance. Feel free to share ideas or suggestions.
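A minimal sketch of what such enriched values could look like. The names Inflate and Quoted come from the comment above, but these classes and the two writer functions are assumptions made for illustration, not an existing jomini API.

```javascript
// Hypothetical marker wrappers.
class Quoted {
  constructor(value) { this.value = value; }
}
class Inflate {
  constructor(items) { this.items = items; }
}
const quoted = (value) => new Quoted(value);
const inflate = (items) => new Inflate(items);

function writeValue(value) {
  if (value instanceof Quoted) return `"${value.value}"`;
  if (Array.isArray(value)) return `{ ${value.map(writeValue).join(" ")} }`;
  if (typeof value === "object" && value !== null) {
    return `{ ${writeFields(value).join(" ")} }`;
  }
  return String(value); // unquoted scalar
}

function writeFields(obj) {
  const lines = [];
  for (const [key, value] of Object.entries(obj)) {
    if (value instanceof Inflate) {
      // An inflated array repeats the key once per element.
      for (const item of value.items) lines.push(`${key}=${writeValue(item)}`);
    } else {
      lines.push(`${key}=${writeValue(value)}`);
    }
  }
  return lines;
}

const doc = {
  army: inflate([{ name: quoted("army1") }, { name: quoted("army2") }]),
  type: "western",
  cores: [quoted("ENG"), quoted("FRA")],
};

console.log(writeFields(doc).join("\n"));
```

The ergonomic cost mentioned above is visible here: doc.cores[0] is a Quoted instance rather than a string, so reading code has to unwrap it.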
I created a PR to allow one to create PDS text documents: https://github.com/nickbabcock/jomini/pull/59
Please let me know your feedback and if that PR would close this issue.
I have some basic code for writing arbitrary objects, in case anyone finds it useful. The constants at the start are somewhat game-specific, but can be adapted. Here I have what works for Stellaris custom empire designs.
const FLAT_ARRAY_KEYS = [
  'ethic',
  'trait',
];
const UNQUOTED_KEYS = [
  'gender',
];

/**
 * @param writer {Writer}
 * @param key {string}
 * @param value {any}
 */
function writeKeyValue(writer, key, value) {
  if (/^[a-zA-Z_]+$/.test(key)) {
    writer.write_unquoted(key);
  } else {
    writer.write_quoted(key);
  }
  writer.write_operator('=');
  writeAny(writer, value, key);
}

/**
 * @param writer {Writer}
 * @param obj {object}
 */
function writeObject(writer, obj) {
  writer.write_object_start();
  writeEntries(writer, obj);
  writer.write_end();
}

/**
 * @param writer {Writer}
 * @param obj {object}
 */
function writeEntries(writer, obj) {
  for (const [key, value] of Object.entries(obj)) {
    if (FLAT_ARRAY_KEYS.includes(key) && Array.isArray(value)) {
      for (const item of value) {
        writeKeyValue(writer, key, item);
      }
    } else {
      writeKeyValue(writer, key, value);
    }
  }
}

/**
 * @param writer {Writer}
 * @param obj {Array}
 */
function writeArray(writer, obj) {
  writer.write_array_start();
  for (const item of obj) {
    writeAny(writer, item);
  }
  writer.write_end();
}

/**
 * @param writer {Writer}
 * @param obj {any}
 * @param key {string}
 */
function writeAny(writer, obj, key = undefined) {
  if (Array.isArray(obj)) {
    writeArray(writer, obj);
  } else switch (typeof obj) {
    case 'string':
      if (UNQUOTED_KEYS.includes(key)) {
        writer.write_unquoted(obj);
      } else {
        writer.write_quoted(obj);
      }
      break;
    case 'number':
      if (Number.isInteger(obj)) {
        writer.write_integer(obj);
      } else {
        writer.write_f64(obj);
      }
      break;
    case 'boolean':
      writer.write_bool(obj);
      break;
    case 'object':
      if (obj instanceof Date) {
        writer.write_date(obj);
      } else if (obj) {
        writeObject(writer, obj);
      }
      break;
  }
}
Jomini converts data files generated by the Clausewitz engine into an object, but cannot convert a JS object back into a Clausewitz engine data file. Why not add a method for the reverse conversion? This would facilitate the creation of various editors.