nickbabcock / jomini

Parses Paradox files into javascript objects
MIT License

Convert object to data file #5

Closed Vladimir37 closed 1 year ago

Vladimir37 commented 7 years ago

Jomini converts data files generated by the Clausewitz engine into an object, but it cannot convert a JS object back into a Clausewitz engine data file. Why not add a method for the reverse conversion? This would facilitate the creation of various editors.

nickbabcock commented 7 years ago

Yes, I absolutely agree that reverse conversion (often called serialization) would be a huge boon. Unfortunately the conversion from the data file to an object is lossy, meaning that given certain objects, it's ambiguous what the correct serialization should be.

Given:

{
  "foo": [1, 2]
}

is the correct serialization:

foo=1
foo=2

or

foo = { 1 2 }

or even

foo = { 1.000 2.0 }

jomini currently doesn't have a strong enough vocabulary to roundtrip deserialize and then serialize without ambiguity.
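To make the ambiguity concrete, here is a minimal sketch that renders the same JS value in each of the three forms above. The helper names are hypothetical and are not part of jomini.

```javascript
// Hypothetical helpers (not jomini APIs): three equally plausible ways to
// write the same JS array back out as Clausewitz text.
const asRepeatedKeys = (key, values) =>
  values.map((v) => `${key}=${v}`).join("\n");

const asIntList = (key, values) => `${key} = { ${values.join(" ")} }`;

const asFloatList = (key, values, digits = 1) =>
  `${key} = { ${values.map((v) => v.toFixed(digits)).join(" ")} }`;

const parsed = { foo: [1, 2] };

// Nothing in `parsed` says which of these the original file used.
const candidates = [
  asRepeatedKeys("foo", parsed.foo),
  asIntList("foo", parsed.foo),
  asFloatList("foo", parsed.foo),
];
```

All three strings parse back to the same object, so the object alone cannot tell a serializer which one to pick.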

Vladimir37 commented 7 years ago

It seems to me that this problem can be solved if Jomini represents the parsed data in a format like this:

{
  "type": <type>,
  "body": <body>
}

In this way,

  {
    "foo": {
      "type": "int_array",
      "body": [1, 2]
    }
  }

will be serialized to

foo = { 1 2 }

If the type field is float_array:

foo = { 1.0 2.0 }

If the type field is chain:

foo=1
foo=2

It seems to me that using the "type" field would help eliminate the ambiguity.
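A rough sketch of how a serializer could dispatch on that "type" field. The three type names come from the comment above; `serializeTagged` is a hypothetical helper, not something jomini provides, and the one-decimal float formatting is an assumption.

```javascript
// Hypothetical serializer for the proposed { type, body } encoding.
function serializeTagged(doc) {
  const lines = [];
  for (const [key, node] of Object.entries(doc)) {
    switch (node.type) {
      case "int_array":
        lines.push(`${key} = { ${node.body.join(" ")} }`);
        break;
      case "float_array":
        // Assumption: one decimal place; the real precision would also
        // have to be recorded somewhere to round-trip exactly.
        lines.push(`${key} = { ${node.body.map((n) => n.toFixed(1)).join(" ")} }`);
        break;
      case "chain":
        // One key=value line per element.
        for (const item of node.body) lines.push(`${key}=${item}`);
        break;
      default:
        throw new Error(`unknown type: ${node.type}`);
    }
  }
  return lines.join("\n");
}

const asInts = serializeTagged({ foo: { type: "int_array", body: [1, 2] } });
const asFloats = serializeTagged({ foo: { type: "float_array", body: [1, 2] } });
const asChain = serializeTagged({ foo: { type: "chain", body: [1, 2] } });
```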

nickbabcock commented 7 years ago

You're absolutely right that there are ways to disambiguate the types (and your idea is a good one). The one downside is that instead of accessing a value as foo.bar, one would have to write foo.body.bar.body.
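The access cost can be seen in a small sketch. The "object" and "int" type names and the `unwrap` helper are assumptions made for illustration; neither appears in the proposal or in jomini.

```javascript
// Hypothetical tagged document following the { type, body } idea.
const tagged = {
  foo: { type: "object", body: { bar: { type: "int", body: 1 } } },
};

// Every level of nesting adds a ".body" hop.
const viaTags = tagged.foo.body.bar.body;

// Strip the tags from one node to recover the plain shape. Scalars and
// arrays of scalars are returned as-is because they carry no nested tags.
function unwrap(node) {
  const { body } = node;
  if (Array.isArray(body) || body === null || typeof body !== "object") {
    return body;
  }
  return Object.fromEntries(
    Object.entries(body).map(([key, child]) => [key, unwrap(child)])
  );
}

const plain = Object.fromEntries(
  Object.entries(tagged).map(([key, node]) => [key, unwrap(node)])
);
const viaPlain = plain.foo.bar;
```

The plain view is convenient to read but has lost the type tags, so it cannot be written back out unambiguously.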

Vladimir37 commented 7 years ago

This problem can be solved if Jomini has two methods. For example:

Thus, the data for easy viewing and data for use with subsequent serialization will be separated.

nickbabcock commented 7 years ago

Ideally there'd only be one method to ease differences in parsing. There may be a way to create a class where we keep all properties on the class + a jomini_type() method that is used in a save() method to disambiguate.

But this may be wishful thinking and creating two methods may be more practical in the short term.

C45tr0 commented 5 years ago

You could add all the information to disambiguate types into a meta object at the top level. This way the clean access is still given, but allows you to parse that or define it if needed.

nickbabcock commented 5 years ago

Right it should be possible to hide the disambiguation away from the user (but still keep it available for serialization) 🤔
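One way this could look, as a sketch only: keep the hints in a symbol-keyed, non-enumerable property on the top-level object, so normal property access and JSON output are unaffected. `META`, `withMeta`, `serialize`, and the "chain" hint name are all hypothetical.

```javascript
// Hypothetical: a hidden top-level "meta" object that records how each
// key was written in the source file.
const META = Symbol("jominiMeta");

function withMeta(obj, hints) {
  Object.defineProperty(obj, META, { value: hints, enumerable: false });
  return obj;
}

function serialize(obj) {
  const hints = obj[META] ?? {};
  const lines = [];
  // Object.entries skips symbol keys, so META never leaks into the output.
  for (const [key, value] of Object.entries(obj)) {
    if (Array.isArray(value) && hints[key] === "chain") {
      for (const item of value) lines.push(`${key}=${item}`);
    } else if (Array.isArray(value)) {
      lines.push(`${key} = { ${value.join(" ")} }`);
    } else {
      lines.push(`${key}=${value}`);
    }
  }
  return lines.join("\n");
}

const save = withMeta({ foo: [1, 2], bar: [3, 4] }, { foo: "chain" });
const text = serialize(save);
```

Here `save.foo` is still a plain array, while `serialize` consults the hidden hints to choose between the repeated-key and braced-list forms.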

Saying that, I don't have any plans for continued development, as the current method of parsing (using jison) exhausts heap space, so a rewrite would be necessary to make parsing large files viable.

C45tr0 commented 5 years ago

Do you have any current thoughts for what you want to rewrite it in/to use?

nickbabcock commented 5 years ago

So this can still be written in js -- it'd just need to be some sort of hand written recursive descent parser (basically any JSON parser can be used for inspiration). I've written paradox parsers in C#, js, F#, and most recently (but not open sourced) rust. Each language has its own tradeoffs, so I don't think there'd be one solution that could rule all.
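For a sense of scale, a hand-written recursive descent parser for a small subset of the format might look like the sketch below. It only handles `key=value` pairs, quoted and bare scalars, and brace-delimited objects and lists; it is not jomini's parser, and the names are made up for illustration.

```javascript
// Split text into quoted strings, punctuation ({ } =) and bare words.
function tokenize(text) {
  const tokens = [];
  const pattern = /"([^"]*)"|([{}=])|([^\s{}="]+)/g;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    if (match[1] !== undefined) tokens.push({ kind: "quoted", text: match[1] });
    else if (match[2] !== undefined) tokens.push({ kind: match[2], text: match[2] });
    else tokens.push({ kind: "bare", text: match[3] });
  }
  return tokens;
}

function parse(text) {
  const tokens = tokenize(text);
  let pos = 0;

  // Bare words that look numeric become numbers (so 1.000 and 1 collapse).
  const scalar = (token) => {
    if (token.kind === "quoted") return token.text;
    const num = Number(token.text);
    return Number.isNaN(num) ? token.text : num;
  };

  // value := scalar | "{" pairs "}" | "{" values "}"
  function parseValue() {
    const token = tokens[pos++];
    if (token === undefined) throw new Error("unexpected end of input");
    if (token.kind === "quoted" || token.kind === "bare") return scalar(token);
    if (token.kind !== "{") throw new Error(`unexpected "${token.text}"`);
    // Look one token past the first element: "=" means key=value pairs.
    const next = tokens[pos + 1];
    const result = next !== undefined && next.kind === "=" ? parsePairs() : parseList();
    if (pos >= tokens.length) throw new Error('missing "}"');
    pos++; // consume the closing brace
    return result;
  }

  function parseList() {
    const items = [];
    while (pos < tokens.length && tokens[pos].kind !== "}") items.push(parseValue());
    return items;
  }

  function parsePairs() {
    const obj = {};
    const repeated = new Set();
    while (pos < tokens.length && tokens[pos].kind !== "}") {
      const key = tokens[pos++].text;
      if (tokens[pos] === undefined || tokens[pos].kind !== "=") {
        throw new Error(`expected "=" after ${key}`);
      }
      pos++; // consume "="
      const value = parseValue();
      // Repeated keys collapse into an array: this is the lossy step.
      if (!Object.prototype.hasOwnProperty.call(obj, key)) obj[key] = value;
      else if (repeated.has(key)) obj[key].push(value);
      else { obj[key] = [obj[key], value]; repeated.add(key); }
    }
    return obj;
  }

  const root = parsePairs();
  if (pos !== tokens.length) throw new Error(`unexpected "${tokens[pos].text}"`);
  return root;
}

const doc = parse('foo=1\nfoo=2\nbar = { 1.000 2.0 }\narmy = { name="army1" }');
```

In this sketch `doc.foo` and `doc.bar` both end up as `[1, 2]`, which is the same information loss discussed earlier in the thread.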

soryy708 commented 4 years ago

Why a recursive descent parser? I've written a parser in C++ that achieves this with regular expressions, which proves that the language is regular. What files did you look at when deciding it's a context free grammar?

soryy708 commented 4 years ago

I've begun a hand-rewrite of the parser, so that the output is optionally instrumented in a way that allows unambiguous serialization. As a start, I've ported my C++ parser to JS. https://github.com/soryy708/jomini/tree/parser There's still some work to be done on the parser and tokenizer, so that the tests will pass.

nickbabcock commented 4 years ago

> What files did you look at when deciding it's a context free grammar?

I'm not too privy to computer science terminology, but I believe it is not a regular language, as the format allows arbitrary embedding of delimiters (objects can contain arrays of objects, nested to any depth). It's the same reason why JSON is not regular.
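The nesting argument can be illustrated with a small sketch: checking that braces balance requires a counter (or stack) that can grow without bound, which a finite automaton does not have. `braceDepth` is a hypothetical helper, and for brevity it ignores braces that appear inside quoted strings.

```javascript
// Return the maximum brace nesting depth, or throw if braces do not balance.
// Assumption: no "{" or "}" characters inside quoted strings.
function braceDepth(text) {
  let depth = 0;
  let max = 0;
  for (const ch of text) {
    if (ch === "{") {
      depth++;
      max = Math.max(max, depth);
    } else if (ch === "}") {
      depth--;
      if (depth < 0) throw new Error('unbalanced "}"');
    }
  }
  if (depth !== 0) throw new Error('unbalanced "{"');
  return max;
}

const depth = braceDepth("a={ b={ c={ 1 2 } } }");
```

A parser that uses regular expressions for tokens but keeps a stack or recursion for the braces is in the same position: the regexes handle the tokens, and the stack handles the nesting.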

> I've begun a hand-rewrite of the parser, so that the output is optionally instrumented in a way that allows unambiguous serialization. As a start, I've ported my C++ parser to JS.

Excellent. I'm more than happy to see what you're thinking.

soryy708 commented 4 years ago

Apparently you've also made some hand-rolling progress a while ago: https://github.com/nickbabcock/jomini/tree/handroll

soryy708 commented 4 years ago

Apparently this used to be powered by a hand-rolled parser before. parser.js (https://github.com/nickbabcock/jomini/commit/4c7ece211793d77ed686f97e98a23b63095ae65b) Why was it migrated to Jison?

soryy708 commented 4 years ago

Someone made a F# implementation here: https://github.com/tboby/cwtools/tree/master/CWToolsTests

soryy708 commented 4 years ago

Someone made a Python implementation here: https://github.com/Shadark/ClauseWizard/

nickbabcock commented 4 years ago

> Apparently this used to be powered by a hand-rolled parser before. parser.js (4c7ece2) Why was it migrated to Jison?

Haha, who knew!? Forgot that the commit is from 5 years ago. Looks like I may need to write more descriptive commit messages 😆

My assumption looking at those commits is that jison provided an easier API for development and users at that time. In hindsight, I wish I had iterated on the handrolled version, as jison seems unmaintained and a bit baroque, but oh well 🤷‍♂

> Someone made a F# implementation here: https://github.com/tboby/cwtools/tree/master/CWToolsTests

> Someone made a Python implementation here: https://github.com/Shadark/ClauseWizard/

Yeah there are a lot of parsers out there. I've written my own fair share (C#, C# (2), F#, this one, and other closed sourced implementations). Writing parsers for games you love is a great excuse to program 😄

soryy708 commented 4 years ago

Are any of these parsers fit for the purpose of unambiguous conversion from JSON back to Clausewitz format? Maybe the cheapest solution is to make a binding between C# and JavaScript (with edge and/or node-gyp)

nickbabcock commented 4 years ago

The latest release uses a parser that is functionally lossless so it would be possible to write out a structure (but not from a JS object).

It would be something along the lines of:

const out = parser.parseText(data, {}, (q) => {
  // update an EU4 save so that the player is england
  q.at("/player", "ENG");
  return q.writeTo(/* a writable stream? */);
});

While this feature could now be implemented on top of the latest release, I don't have a personal drive to implement it, so for now it would need to come from the community. I'm happy to guide anyone who decides to take up this mantle, but until there is a volunteer, I'm going to close this issue.

soryy708 commented 4 years ago

@nickbabcock sounds good, and I have some interest in implementing that. I don't know how to interface with your webasm implementation though. Does it have documentation?

nickbabcock commented 4 years ago

Excellent, I'll reopen the issue for further discussion.

The underlying parser has documentation.

One can derive inspiration from the code bases that convert binary data to plain text:

The binary data has a slightly different format so it won't be one to one but both text and binary formats functionally behave the same.

CharacterOverflow commented 3 years ago

I too started to take a peek into this. I unfortunately don't have a ton of experience, especially with web assembly, and have been pretty lost in trying to make this change.

I noticed @nickbabcock that another library of yours implements this feature: https://github.com/nickbabcock/Pdoxcl2Sharp

I'm considering using C# just for this feature in a tool I'm creating, but figured I'd ask if there's any kind of update coming on this soon or if there's a way I can help.

nickbabcock commented 3 years ago

The issue with converting js objects is that some fields will need to be enriched so that they can be written out properly. For instance, we'd want an object like

{
  army: Inflate([{ name: Quoted("army1") }, { name: Quoted("army2") }]),
  type: "western",
  cores: [Quoted("ENG"), Quoted("FRA")]
}

in order to write out:

army={ name="army1" }
army={ name="army2" }
type=western
cores={ "ENG" "FRA" }

In order to facilitate ergonomics, currently the object returned from parsing is not enriched. I would need to see / investigate how one could provide these enriched types without sacrificing ergonomics or performance. Feel free to share ideas or suggestions.
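As one possible shape for such enriched types, here is a sketch using small wrapper classes. `Quoted` and `Inflate` are modelled on the calls in the example object above, but these definitions and the `writeFields` / `writeValue` helpers are hypothetical; jomini does not export them.

```javascript
// Hypothetical marker types for values that need special treatment on output.
class QuotedValue {
  constructor(value) { this.value = value; }
}
class InflatedValue {
  constructor(items) { this.items = items; }
}
const Quoted = (value) => new QuotedValue(value);
const Inflate = (items) => new InflatedValue(items);

function writeValue(value) {
  if (value instanceof QuotedValue) return `"${value.value}"`;
  if (Array.isArray(value)) return `{ ${value.map(writeValue).join(" ")} }`;
  if (value !== null && typeof value === "object") {
    return `{ ${writeFields(value).join(" ")} }`;
  }
  return String(value); // unquoted scalars such as western or 1
}

// Returns one "key=value" string per field; an Inflate(...) field expands
// into one string per element, all sharing the same key.
function writeFields(obj) {
  return Object.entries(obj).flatMap(([key, value]) =>
    value instanceof InflatedValue
      ? value.items.map((item) => `${key}=${writeValue(item)}`)
      : [`${key}=${writeValue(value)}`]
  );
}

const text = writeFields({
  army: Inflate([{ name: Quoted("army1") }, { name: Quoted("army2") }]),
  type: "western",
  cores: [Quoted("ENG"), Quoted("FRA")],
}).join("\n");
```

For the example object, this emits one `army=` line per element followed by the `type` and `cores` lines. The ergonomic cost is visible too: reading a name now means going through `.value` on the wrapper rather than getting a plain string.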

nickbabcock commented 3 years ago

I created a PR to allow one to create PDS text documents: https://github.com/nickbabcock/jomini/pull/59

Please let me know your feedback and if that PR would close this issue.

Clashsoft commented 1 year ago

I have some basic code for writing arbitrary objects, in case anyone finds it useful. The constants at the start are somewhat game-specific, but can be adapted. Here I have what works for Stellaris custom empire designs.

const FLAT_ARRAY_KEYS = [
  'ethic',
  'trait',
];
const UNQUOTED_KEYS = [
  'gender',
];

/**
 * @param writer {Writer}
 * @param key {string}
 * @param value {any}
 */
function writeKeyValue(writer, key, value) {
  if (/^[a-zA-Z_]+$/.test(key)) {
    writer.write_unquoted(key);
  } else {
    writer.write_quoted(key);
  }
  writer.write_operator('=');
  writeAny(writer, value, key);
}

/**
 * @param writer {Writer}
 * @param obj {object}
 */
function writeObject(writer, obj) {
  writer.write_object_start();
  writeEntries(writer, obj);
  writer.write_end();
}

/**
 * @param writer {Writer}
 * @param obj {object}
 */
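function writeEntries(writer, obj) {
  for (const [key, value] of Object.entries(obj)) {
    // Skip null/undefined: writeAny emits nothing for them, which would
    // otherwise leave a dangling `key=` in the output.
    if (value === null || value === undefined) {
      continue;
    }
    if (FLAT_ARRAY_KEYS.includes(key) && Array.isArray(value)) {
      // Flat keys are written as repeated key=value lines, one per item.
      for (const item of value) {
        writeKeyValue(writer, key, item);
      }
    } else {
      writeKeyValue(writer, key, value);
    }
  }
}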

/**
 * @param writer {Writer}
 * @param obj {Array}
 */
function writeArray(writer, obj) {
  writer.write_array_start();
  for (const item of obj) {
    writeAny(writer, item);
  }
  writer.write_end();
}

/**
 * @param writer {Writer}
 * @param obj {any}
 * @param key {string}
 */
function writeAny(writer, obj, key = undefined) {
  if (Array.isArray(obj)) {
    writeArray(writer, obj);
  } else switch (typeof obj) {
    case 'string':
      if (UNQUOTED_KEYS.includes(key)) {
        writer.write_unquoted(obj);
      } else {
        writer.write_quoted(obj);
      }
      break;
    case 'number':
      if (Number.isInteger(obj)) {
        writer.write_integer(obj);
      } else {
        writer.write_f64(obj);
      }
      break;
    case 'boolean':
      writer.write_bool(obj);
      break;
    case 'object':
      if (obj instanceof Date) {
        writer.write_date(obj);
      } else if (obj) {
        writeObject(writer, obj);
      }
      break;
  }
}