niftools / nifxml

A repository for the nif.xml file, which contains the nif file format description.
http://www.niftools.org
GNU General Public License v3.0

Organizing the format descriptions, templates and format information in a NifTools repository. #50

Open Dexesttp opened 8 years ago

Dexesttp commented 8 years ago

Disclaimer

There's no real place to discuss this on GitHub as there's no team repository. The most relevant repositories to discuss this are this one and kfmxml, so it seems like a good place to open this issue.

Scope

The NifTools ecosystem is built not only around tools to use and manipulate .nif, .kf and other formats, but also around ways to describe, analyze and manipulate these formats.

The main way to describe file formats has been .xml files that describe the file's content across versions. Therefore, two repositories have been made specifically for hosting these files, namely nifxml and kfmxml.

However, there hasn't been any way to store other files used in describing the contents of nif/kf files. I'm thinking of the .bt templates used by 010, among others.

Issue overview

There's no definitive and easy place to store and share templates and data used while studying the different formats handled by the NifTools tools.

The easiest way would be to create a new repository to put the templates and data on.

However, creating a new repository for this raises several organization questions. What are the limits on the files we would put in this repository? Do we limit it to specific files? Shouldn't the already existing .xml files be in this repository too, as they are also, in their own way, "analysis tools"?

Problems to solve.

Relevancy of NifXML/KfmXML specific repositories.

The first problem to solve is to determine whether or not the nifxml and kfmxml repositories are still relevant as they are. They are widely used as-is, but this shouldn't be the only reason to keep them as such.

Limits of a new repository

We need to determine the intended purpose of a new repository for analysis tools, if there's a will to create it.

The main point to consider is whether to limit the file type in this repository to a single format, therefore duplicating repositories for each format. An example of an end result would be:

Feel free to give your opinion about these problems below.

@niftools/core @niftools/nifxml-reviewer

hexabits commented 8 years ago

010 Approach

Ideally, I think for each supported version listed in the XML we should probably have a .bt file. Everything common that doesn't change between versions could be in a shared .bt include file. Or each .bt file includes the previous version, and anything that doesn't change is found in the version it was first introduced. Except I'm not exactly sure how 010 deals with redefinition/overriding, so maybe the latter is impossible.

Or, we do it piecemeal. Have a folder for each version. When there are attempts to decode certain blocks, work on those in a .bt file and leave the rest of the standard blocks undecoded, only uploading the relevant work to the repo. Similar to how figment and Caliente worked on the FO4 NIFs.

In fact I think following how figment has set up his templates and scripts would be a pretty good foundation.

An alternative would be actually building .bt files from the nif.xml. Basically, niflib's code generation but for C-style .bt files. At least one time to get a starting point for all future work. You give the parser a specific version, and it spits out all the rows/blocks that apply to that version and turns it into a binary template. Considering how convoluted the XML has become, getting to see cleaner C-style templates on a per-version basis would also help spot errors. We could run the templates that the parser generates to confirm that the XML is correct for that version. Though this would be slightly redundant as NifSkope has the XML Checker.
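As a rough illustration of the per-version generation idea, here is a minimal Python sketch. It assumes a heavily simplified nif.xml-like layout in which rows are `add` elements carrying optional `ver1`/`ver2` bounds; the sample XML, the `ExampleHeader` compound and the function names are made up for the example, and it does not handle conditions, arrays, or mapping nif.xml types onto real 010 types.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; not taken from the real nif.xml.
SAMPLE_XML = """
<niftoolsxml>
  <compound name="ExampleHeader">
    <add name="Num Blocks" type="uint" />
    <add name="Old Flag" type="byte" ver2="10.0.1.0" />
    <add name="User Version" type="uint" ver1="10.0.1.8" />
  </compound>
</niftoolsxml>
"""


def parse_version(text):
    """Turn '20.2.0.7' into a comparable tuple (20, 2, 0, 7)."""
    return tuple(int(part) for part in text.split("."))


def applies(row, version):
    """Return True if the row's optional ver1/ver2 bounds include version."""
    ver1 = row.get("ver1")
    ver2 = row.get("ver2")
    if ver1 is not None and version < parse_version(ver1):
        return False
    if ver2 is not None and version > parse_version(ver2):
        return False
    return True


def to_template(xml_text, version_text):
    """Emit C-style structs containing only the rows valid for one version."""
    version = parse_version(version_text)
    root = ET.fromstring(xml_text)
    lines = []
    for compound in root.iter("compound"):
        lines.append("typedef struct {")
        for row in compound.iter("add"):
            if applies(row, version):
                # Field names in the XML contain spaces; strip them for C.
                field = row.get("name").replace(" ", "")
                lines.append(f"    {row.get('type')} {field};")
        lines.append(f"}} {compound.get('name')};")
    return "\n".join(lines)


template = to_template(SAMPLE_XML, "20.2.0.7")
print(template)
```

Running it for 20.2.0.7 drops the row that ends at 10.0.1.0 and keeps the other two, which is the kind of cleaner per-version view described above.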

In general I think having an 010 template repository would be a very good thing. The various projects all decode and use the decoded information differently (nif.xml alone, nif.xml + niflib, nif.xml + pyffi), and are in different languages. That means no one is really working exactly on the same thing when updating for new NIF versions, and work could be duplicated across each project instead of centralized. Basically, 010 templates would be a centralized project/language agnostic way of housing decoding work.

Also decoding via nif.xml alone can be kind of cumbersome, and requires NifSkope actually being updated in many cases to accept new basic types. If I or anyone weren't around to update NifSkope the decoding process could be stopped dead in its tracks.


Alternative Approach

Aside from .bt, I have thought for a long time about having a GUI-based program which is meant to parse and edit the XML. It wouldn't just be an XML editor, but a way of visualizing the XML in a cleaner language-agnostic way. You could filter by NIF version to see what the data truly looks like for that version, without having to decipher all the cruft in the raw XML. It could also export the data to 010 templates or other formats. It would basically be a way of editing and decoding the data definitions without having to know how to edit the XML.

However this would probably necessitate a major upheaval to the contents of nif.xml. Massive reorganization of the blocks, the attributes, and new metadata to make the GUI editing of the XML easier like groups/sections.

Or, maybe the program stores the data definitions completely differently and just gives the option of exporting it to a generated nif.xml at the end. Say, maybe a contributor or modder really wants to optimize their nif.xml for just Bethesda games. They open up the program and then select the appropriate versions and export a new nif.xml. I don't want to contrive reasons for doing such a thing, but just off the top of my head less XML = less parsing.

A level of abstraction over XML would have many benefits. First and foremost, it would eliminate user error when editing the XML by hand. Namely, _versioning issues_. The version conditions are so hard to read and understand while looking from line to line that it puts most people off from touching the XML at all. If we instead define in the program what the data should look like for each version and let the program coalesce this data into the nif.xml itself, we remove that issue entirely. Many other issues are also abolished this way, like misspelling or misuse of attributes.
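To make the "define it per version and let the program coalesce it" idea concrete, here is a small Python sketch. The per-version field lists and names are hypothetical, and the output is just a list of (field, ver1, ver2) tuples rather than real nif.xml rows. It only knows about the versions it is given, so the bounds it produces are only as precise as that list, and it ignores field ordering within a block.

```python
def parse_version(text):
    """Turn '20.2.0.7' into a comparable tuple (20, 2, 0, 7)."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical input: what one block looks like in each known version.
PER_VERSION = {
    "4.0.0.2": ["Name", "Old Flag"],
    "10.0.1.0": ["Name", "Old Flag"],
    "20.0.0.5": ["Name", "Flags"],
    "20.2.0.7": ["Name", "Flags"],
}


def coalesce(per_version):
    """Merge per-version field lists into (field, ver1, ver2) ranges.

    ver1/ver2 are None when the range is open at the oldest/newest
    known version, mirroring an omitted attribute.
    """
    versions = sorted(per_version, key=parse_version)

    # Collect field names in first-seen order.
    fields = []
    for version in versions:
        for field in per_version[version]:
            if field not in fields:
                fields.append(field)

    rows = []
    for field in fields:
        present = [field in per_version[v] for v in versions]
        start = None
        # The trailing False closes a run that reaches the newest version.
        for index, flag in enumerate(present + [False]):
            if flag and start is None:
                start = index
            elif not flag and start is not None:
                ver1 = versions[start] if start > 0 else None
                ver2 = versions[index - 1] if index < len(versions) else None
                rows.append((field, ver1, ver2))
                start = None
    return rows


rows = coalesce(PER_VERSION)
for row in rows:
    print(row)
```

The point is that nobody writes the version bounds by hand; they fall out of the per-version descriptions.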

This kind of abstraction is also looking increasingly more attractive after the recent decoding process with FO4. Various differences between projects means we're not all using the same nif.xml anymore. Maybe these differences could be overcome by abstracting the data definitions.

For example, the recent problems NifSkope has faced with the speed of ARG keyword parsing and the way the FO4 vertex format has to make such ubiquitous use of them. Figment created an ARG array syntax that I can't adopt yet, because ARG alone can bring NIF loading to a crawl. Right now, the nif.xml repository cannot accommodate both of our approaches because the nif.xml format cannot define the same block multiple times. If we abstracted the data and made nif.xml something you generate with a program then the data definitions could house both eventualities. Then the project authors choose which version to use. A lone repo with only a raw nif.xml simply can't do this.
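A tiny Python sketch of what "housing both eventualities" could look like: the abstract store keeps more than one variant of the same block, and each project exports the one it wants. The block name `ExampleVertexData`, the variant names and the row contents are all hypothetical and are not taken from the real nif.xml or from Figment's syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical store: one block, two alternative definitions.
DEFINITIONS = {
    "ExampleVertexData": {
        "arg_syntax": [
            {"name": "Vertex", "type": "Vector3", "cond": "ARG & 1"},
        ],
        "no_arg": [
            {"name": "Vertex", "type": "Vector3"},
        ],
    },
}


def export_compound(block_name, variant):
    """Serialize the chosen variant of a block as a <compound> element."""
    rows = DEFINITIONS[block_name][variant]
    compound = ET.Element("compound", {"name": block_name})
    for row in rows:
        ET.SubElement(compound, "add", row)
    return ET.tostring(compound, encoding="unicode")


# Each project picks the variant it can actually load efficiently.
with_arg = export_compound("ExampleVertexData", "arg_syntax")
without_arg = export_compound("ExampleVertexData", "no_arg")
print(with_arg)
print(without_arg)
```

Both outputs define the same block exactly once, so each generated nif.xml stays valid while the shared source keeps both approaches.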

Basically, this program could accomplish the same thing as an 010 template repository, providing a central location for file format definition editing and decoding. It would additionally replace the current nifxml/kfmxml repos because the XML would no longer be edited directly.


In my opinion, both approaches aren't exclusive either. While the usefulness of an 010 template repo would be diminished by the existence of a dedicated program for NIF data definition editing, there are still other files and formats that are relevant to NIFs but that aren't defined in nif.xml.

For example:

neomonkeus commented 8 years ago

Thoughts so far based on input top to bottom.

@Dexesttp

However, creating a new repository for this raises several organization questions. What are the limits of the file we would put in this repository ? Do we limit it to specific files ? Shouldn't already existing .xml files be in this repository too, as they are also in their own way "analysis tools" ?

We don't need to create a new repo if we don't want to. This repo can be repurposed and renamed if required. One reason not to is that existing issues would need to be ported to the new repo. There hasn't been enough attention to the kfm repo, so it would be less effort to recreate the issues again here.

There's a need to determine what is the intended purpose of a new repository for analysis tools, if there's a will to create it.

The main point to consider is whether to limit the file type in this repository to a single format, therefore duplicating repositories for each format. An example of an end result would be :

niftools/010-templates : contains 010 specific templates in the .bt format

Feel free to give your opinion about these problems below.

You have put a good deal of consideration into this. From an organisational point of view, how best to tackle the problem is indeed an important consideration. I suppose it is a question of whether we keep the separate repos for each format we wish to support and have a super repo, or have separate repos for each and import them into the super repo as submodules. I suppose the main case for each would be consistency. I think there are pros and cons for each, but I would probably be in favour of the first option initially, while remaining open to opinions on each.

@jonwd7 I suppose I am mainly thinking at the higher level structural point of view at the moment. As a general rule of thumb, I prefer convention over configuration. If we decide on a standardised layout which facilitates the above, I don't see why as you said yourself the ideas mentioned would not be mutually beneficial to all format users.

My own thoughts are that the format descriptions are a bit static, in that at any point in time they show what the current understanding is, but lack the context. If we can add additional tools which enable a collaborative process and let people see the context, then this is a marked improvement.

neomonkeus commented 8 years ago

One thing I was thinking about, not sure how likely it is to happen. If we go the super repo route, how would we handle per-version updates for any specific format?

For example, say we have an update to the nif format in commit 1, then an update for the bsa format in commit 2. In the super repo scenario, you can't isolate the changes of one from the other. If you only want the bsa format changes, the nif changes come along for the ride.

hexabits commented 8 years ago

@Dexesttp @neomonkeus @ttl269 I have started doing this in my repo here: https://github.com/jonwd7/0x6e6966746f6f6c73

With an emphasis on support for version 20.2.0.7 NIFs.

With the easier to write C-style structs I have had an easy time filling out all of the data definitions from our existing knowledge in nif.xml, and I've already started fixing and adding onto our existing knowledge. I've lost count of the unknown values I've decoded. There is also a lot wrong with the nif.xml definitions of NiParticleSystem and NiParticlesData especially.

In case you're wondering, I've gotten all this new information from the DLLs for the FO4 exporter and other tools. Despite being for FO4, a lot of the stuff in there dates back to FO3 or earlier. It seems that 100% of Bethesda's Save/Load code uses solely the stream version (what we call User Version 2). So basically all the conditions in nif.xml where we use User Version 11/12 in expressions are not ideal.

Unfortunately, I don't know if I have the patience to port all of this back to nif.xml. Having to work around stuff that's in there for archaic or obscure NIF versions that I don't even have enough of to test on makes it infeasible for me.