Report file stability

Alberth289346 commented 1 year ago

The report file is the file produced by the JPlag analysis program and read by the report viewer. It is used by all versions of JPlag and all versions of the viewer. That makes the report file very much a shared resource.

In addition, the report file changes over time. Newer versions of both the viewer and the JPlag program get new abilities, or existing abilities are modified. Such changes need to be transported through the report file, which causes semantic and non-semantic changes in the file.

So far, the idea is/was that a particular version of the JPlag program decides the content of the report file, and both the report file and the report viewer follow those changes. This works well until you get multiple JPlag versions in use "out there" (some older than others) and people need or want to use a report viewer from a different point in time to inspect their results.

The problems that arise when you try to combine a JPlag program with a report viewer from a different point in time are:

Older JPlag with newer viewer:

The viewer can know what to expect from the program (there is a version number in the report file). Conceptually, the viewer must repeatedly "upgrade" the report file to the next JPlag version until it arrives at the version of the viewer. Some data may not be available, and there the viewer must invent that data from the data that is available, or disable part of its functionality. (I am leaving out the option of giving up completely for the purpose of the discussion.) Note that the "upgrade" effort can be reduced if you avoid or forbid moving data in the report file. (That is, if some data "lives" in the report file under the name foo in JPlag version X and it also exists there in JPlag version X+1, the upgrade from X to X+1 is trivial for that data.)

Newer JPlag with older viewer:

The viewer has no clue what it gets. For a "best effort", the JPlag program has to be "predictable". Data that gets stored at the place where the viewer expects it can be easily found and accepted, assuming its semantics didn't change!!

The program may also provide data that is alien to the viewer, and it may omit data that the viewer expects. The former can simply be ignored (the user won't get the most advanced report displayed, but that is to be expected if you use an old viewer). The latter can be handled as above, by disabling some of the viewer's functionality (again leaving out the option of giving up).
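As a minimal sketch of that "best effort" reading, an older viewer could keep only the parts it knows and record which of its features lose their data. The field names (`submissions`, `comparisons`, `clusters`, `heatmap`) and the function `read_report` are made up for illustration; they are not taken from the actual JPlag report file.

```python
from typing import Any

# Hypothetical field names; the real JPlag report file may use different ones.
KNOWN_FIELDS = {"submissions", "comparisons", "clusters"}


def read_report(raw: dict[str, Any]) -> tuple[dict[str, Any], set[str]]:
    """Return the data this viewer understands and the features to disable."""
    # Alien data written by a newer JPlag is simply ignored.
    data = {key: value for key, value in raw.items() if key in KNOWN_FIELDS}
    # Expected data that is absent switches off the feature that depends on it.
    disabled = KNOWN_FIELDS - set(data)
    return data, disabled


# A report from a "newer" program: one unknown part, one expected part missing.
report = {"submissions": ["a", "b"], "comparisons": [], "heatmap": {"new": True}}
data, disabled = read_report(report)
```

Here `heatmap` is dropped silently and `clusters` ends up in `disabled`, so the viewer can hide that part of its interface instead of failing. This only works if the semantics of the known fields did not change.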

While much of the above is kicking in open doors, the points that stand out seem to be:

- Prepare the viewer for getting files from different points in time (both earlier and future versions).
  - Some form of report file upgrading in the viewer seems useful.
  - Some data parts may go missing. What to do then probably needs to be decided for each case separately (find a replacement for that data, reduce viewer capabilities, or give up).
- Don't move or change semantics of existing data.
  - Older viewers can then find that data, and newer viewers have reduced effort to upgrade the report file.
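The step-by-step upgrading could look roughly like the sketch below. The version numbers, the `clusters` and `options` parts, and all function names are assumptions for the sake of the example, not the real report file layout.

```python
from typing import Any, Callable

Report = dict[str, Any]


def upgrade_1_to_2(report: Report) -> Report:
    # Hypothetical change: version 2 added "clusters". Older files have none,
    # so the viewer invents an empty list.
    return {**report, "clusters": report.get("clusters", []), "version": 2}


def upgrade_2_to_3(report: Report) -> Report:
    # Hypothetical change: version 3 added an optional "options" part.
    # Data that kept its name ("foo" here) needs no work at all.
    return {**report, "options": report.get("options"), "version": 3}


# One step per version; the viewer applies them in order.
UPGRADES: dict[int, Callable[[Report], Report]] = {
    1: upgrade_1_to_2,
    2: upgrade_2_to_3,
}
VIEWER_VERSION = 3


def upgrade(report: Report) -> Report:
    """Upgrade a report one version at a time until it matches the viewer."""
    while report["version"] < VIEWER_VERSION:
        report = UPGRADES[report["version"]](report)
    return report


old = {"version": 1, "foo": [1, 2, 3]}
new = upgrade(old)
```

Each step only has to know about one version change, and data that never moved (like `foo`) passes through untouched, which is where the "don't move existing data" rule pays off.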

Some further points on the data file itself:

While we use JPlag program versions here, nothing here is really about that program. In particular, if a new JPlag version appeared that only improves its analysis performance, should that lead to a new report file version? Nothing in the data has changed, it just takes fewer CPU cycles to obtain the data. So the above is more about a "report file version" than it is about a "JPlag program version".

To get away from cases like this, one solution could be to have a report file version with defined content. Then everything above can be based on report file contents rather than JPlag program version. It's more work (there are more version numbers that change), but it will pay off when the report file becomes more stable than the JPlag program.

Since the viewer must do version upgrades of the file, it will always shuffle/grab pieces of data from different points in the file and assemble them into a useful structure for the viewer. (Different JPlag versions may change the order of writing data, or a different report viewer may need a different internal structure of the data.)

That raises the question of how useful it is to have trees of data in the file containing various different data parts.

For example, in the old version of the example file in #1189, there is metrics at top level, and inside that, parts I named "avg_metrics" and "max_metrics" it seems. This may bind the layout and semantics of both forms of metrics together. If you instead move the latter 2 parts to top level and let the report viewer construct the top-level metrics list, each metric can exist on its own, independent of the other metrics, and each metric can evolve on its own. (The latter may complicate the report viewer, but that is unavoidable in that case anyway.)
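A rough sketch of the viewer building its own metrics list from independent top-level parts might look like this. The part names are the ones I used above and may not match the real file; `collect_metrics` and the output shape are invented for the example.

```python
from typing import Any

# Hypothetical top-level parts; "avg_metrics" and "max_metrics" are the names
# used in this comment, not necessarily those in the real file.
METRIC_PARTS = ("avg_metrics", "max_metrics")


def collect_metrics(report: dict[str, Any]) -> list[dict[str, Any]]:
    """Build the viewer's metrics list from whichever metric parts exist."""
    metrics = []
    for name in METRIC_PARTS:
        if name in report:
            metrics.append({"name": name, "values": report[name]})
    return metrics


# A file that only carries one of the two metrics still works.
report = {"avg_metrics": [0.4, 0.9], "unrelated": 1}
metrics = collect_metrics(report)
```

Because each metric is looked up on its own, one of them can be added, dropped, or changed in a later file version without touching the other.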

This leads to the idea that a report file version has a set of "data features" (a collection of data parts), that changes in time.
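To make the "data features" idea a bit more concrete, here is a small sketch in which each report file version maps to a set of features and the viewer enables only the pages whose data is present. All version numbers, feature names, and page names are hypothetical.

```python
# Hypothetical report file versions and data feature names.
FEATURES_BY_VERSION = {
    1: {"submissions", "avg_metrics"},
    2: {"submissions", "avg_metrics", "max_metrics"},
    3: {"submissions", "avg_metrics", "max_metrics", "clusters"},
}

# What each (hypothetical) viewer page needs in order to work.
PAGE_REQUIREMENTS = {
    "overview": {"submissions", "avg_metrics"},
    "cluster_view": {"submissions", "clusters"},
}


def available_pages(report_version: int) -> set[str]:
    """Return the viewer pages whose required data features are all present."""
    # Simplification: an unknown version yields no features, i.e. "give up".
    features = FEATURES_BY_VERSION.get(report_version, set())
    return {
        page
        for page, needed in PAGE_REQUIREMENTS.items()
        if needed <= features
    }
```

With this table, `available_pages(1)` gives only the overview page, while version 3 also enables the cluster view. A variant where the file lists its own features would let an older viewer handle future versions as well.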

And this is how far I got. I am probably miles ahead, took a wrong turn, and am unclear in my explanation but hopefully it can act as a source of inspiration towards a more stable report file.

sebinside commented 1 year ago

Thank you for the comprehensive issue, I'll try to keep my answer as short as possible!

In the old JPlag days (prior to the modernization efforts that started at the end of 2020), this was not an issue since JPlag only generated static HTML files as reports. However, the HTML generation was scattered in different places and so out of date and hard to maintain that we had to replace it. To maximize the independence of the viewing logic and the generation logic, we introduced the Report Viewer with a JSON file to hold the data. While this is a viable way to go, our mistake was to deploy the viewer online. Fun fact, this was not originally planned but happened as a test pilot and then lasted, unfortunately.

TL;DR: To fix this issue once and for all, we are currently working towards integrating the report viewer with the main JPlag deliverable, starting with #1145 and #1176. Self-hosting a (customized) report viewer will of course still be possible, but it will not be the intended way of using JPlag. This is closer to the original and should end those nasty side effects of data representation in the JSON file, which had no benefit at all.

sebinside commented 1 year ago

Side note: More information can always be found in #1000, which will stay open until we are done with the major report viewer overhaul, ETA mid 2024.

Alberth289346 commented 1 year ago

Thanks for the update explaining the future path.

tsaglam commented 10 months ago

As an update, with major release 5.0.0 (hopefully Q1 2024), there will still be a report viewer; however, it should be backward compatible. This release also brings the (for now optional) local mode, which will become the main way to use JPlag in the future.

jplag / JPlag

Report file stability #1190