Regression testing for rendering engine

morevnaproject commented 6 years ago

We need a test script which identifies regressions in Synfig's render engine.

The script should render a set of sample files using latest version of Synfig and then compare the resulting images against the reference data.

In perspective it would be nice to measure performance and provide statistics, so we can identify performance drops or boosts from version to version.

--- Want to back this issue? **[Post a bounty on it!](https://www.bountysource.com/issues/64151139-regression-testing-for-rendering-engine?utm_campaign=plugin&utm_content=tracker%2F321991&utm_medium=issues&utm_source=github)** We accept bounties via [Bountysource](https://www.bountysource.com/?utm_campaign=plugin&utm_content=tracker%2F321991&utm_medium=issues&utm_source=github).

morevnaproject commented 6 years ago

@reSHARMA I haven't perfectly understood your last comment at https://github.com/synfig/synfig/issues/499#issuecomment-426063492 - is your proposal to use https://ci.debian.net/ as a platform for that?

reSHARMA commented 6 years ago

@morevnaproject We can use this test script after every build (our travis builds) but since I saw https://packages.debian.org/unstable/main/synfig the package at Debian, which can also use the same script and https://ci.debian.net/ would automatically install the latest debs and run this script.

Just another utilization of the test script.

morevnaproject commented 6 years ago

Ah, got it now! ^__^

Okay, lets elaborate about the task a bit.

1. We need a set of sample files for verification.

I suggest to start with simple tests: single layer tests. We have 50+ different layers in Synfig. Some of them are stand-alone layers - they draw something on the screen (Circle Layer, Outline Layer, Region Layer, etc). There are second type of layers - filters. Those are used in combination with other layers (Blur Layer, Shadow Layer, etc).

For first type of layers (stand-alone): let's create a simple test file with just one layer inside. Thus, we will have one test file for each corresponding stand-alone layer type.

For second type of layers (filters) we have to come up with something more complex. We need a test file with at least 2 layers: base layer + filter layer. The question is what is the best candidate for base layer? Or should we have several candidates?

2. Reference images

When we will have a set of sample files, we will need to create a rendered versions for each of them. This will be a reference images to compare with. The question: which Synfig version we will use to generate reference renderings?

My proposal is to use Synfig 1.0.2, since it is the last version that uses old rendering engine. Later versions use new Cobra rendering engine, which is still in development now (and it is a subject of our tests).

So, the purpose of the script is to build the latest commit from master branch of Synfig and compare with reference images (which represent renderings made by version 1.0.2).

3. Where to keep all that stuff?

I propose to create a separate GitHub repository for that. Since it is going to contain a lot of images (reference renderings), and it is possible the set of test images will eventually grow, so I suggest to use it in combination with GitLFS.

4. Which platform to use for running our tests?

There are plenty of options here: Travis, Azure Pipelines, etc. We need to choose something. We can start with something simple, and if we will need more power, then we can migrate to something else or own hosted solution.

I think it is enough to run our tests on Linux hosts only, since I do not remember the situations when our rendering engine had OS-dependent behavior.

Looking forward to hear your thoughts. ^__^

reSHARMA commented 6 years ago

Many thanks for documenting the whole process :)

Instead of incrementally following the steps, shall we first prepare some tests (~10). Set up the build infrastructure and do all the analysis from the results, and then we can fill our testsuite.

My main motivations for this is:

Individual test can act a great first contribution and may attract some college students as well.
We can get good idea about the load using the present build infrastructure (as the number of test will increase at a fixed rate)

We need a set of sample files for verification. The question is what is the best candidate for base layer? Or should we have several candidates?

I don't have any idea about this right now, but as we proceed with the single layer test. I'll gain some familiarity and then can say something.

Reference images My proposal is to use Synfig 1.0.2

I agree, and later we may add a script to override the reference using another version.

Where to keep all that stuff? I propose to create a separate GitHub repository for that. Since it is going to contain a lot of images (reference renderings), and it is possible the set of test images will eventually grow, so I suggest to use it in combination with GitLFS.

This sounds great :)

Which platform to use for running our tests?

I propose to use the current linux travis job for this and as the load would increase (if it does) we can shift.

What do you think? :)

kaylangan commented 6 years ago

I'm a Program Manager on Azure Pipelines. Looks like you're leaning Travis, but let me know if you have any questions about Azure Pipelines!

reSHARMA commented 6 years ago

@kaylangan Thanks for showing interest :) I propose to start with Travis, as we already have Linux job on it but when these tests would take significant percentage of build time (someday they will) we need to shift and then Azure Pipelines is always an option.

Though this is just a proposal, let's see what @morevnaproject think about it.

morevnaproject commented 6 years ago

@reSHARMA

Instead of incrementally following the steps, shall we first prepare some tests (~10). Set up the build infrastructure and do all the analysis from the results, and then we can fill our test suite.

Absolutely agree. ^__^

I don't have any idea about this right now, but as we proceed with the single layer test. I'll gain some familiarity and then can say something.

Yes, let's start with simple tests for the first types of layers.

I have created a repository here - https://github.com/synfig/synfig-tests-regressions

Added a few sample test files for Circle Layer. Notice, that I have used ".sif" format instead of ".sifz", since it is contains plain XML data and allows to track changes (SIFZ format is a gz-compressed SIF).

Next step is to produce reference renderings. I suggest to render them at resolution of 1920x1080 - that should be enough. Since we will have a lot of sample files, it might be good idea to add some automatization script for rendering reference images. ^__^

My proposal is to use Synfig 1.0.2

I agree, and later we may add a script to override the reference using another version.

Great. Here is a link to get prebuilt binaries of version 1.0.2 - https://sourceforge.net/projects/synfig/files/releases/1.0.2/

Yes, I expect that for some reference images we eventually might use other versions of Synfig. There are at least two reasons for that. First, is that some layer types might not exist in 1.0.2. Second reason is that in newer versions of Synfig some layers might render better (at the end 1.0.2 is not bug-free as well). But that will be decided individually for each case. For now let's concentrate on trivial layers. I just want to mention that for every reference image it would be nice to attach information which version was used for its rendering.

I propose to use the current linux travis job for this and as the load would increase (if it does) we can shift.

Great, let's go for that.

morevnaproject commented 6 years ago

Hello @kaylangan! Thank you for your kind proposal! For now with this task I think it's good idea to go with something we already have. Though, I have a special interest for Azure Pipelines for another task - testing our code for build errors and deploying binary packages. Considering that Azure Pipelines provides possibility to build on all 3 platforms (Linux, Windows and OSX) that might be a great improvement for our project!

reSHARMA commented 6 years ago

@morevnaproject

Yes, let's start with simple tests for the first types of layers.

I've added some tests and will move ahead. (Will share about this in my college community) Can I raise a issue for it with some tags like "help wanted" "good first issue" etc?

I expect that for some reference images we eventually might use other versions of Synfig.

Shall we create a config file which would have the mapping of version <--> layer or something similar and the script could read this and act accordingly?

morevnaproject commented 6 years ago

I've added some tests and will move ahead. (Will share about this in my college community) Can I raise a issue for it with some tags like "help wanted" "good first issue" etc?

Sure! ^__^

Shall we create a config file which would have the mapping of version <--> layer or something similar and the script could read this and act accordingly?

How about this? - https://github.com/synfig/synfig-tests-regressions/tree/master/references/layers/circle (a text file for each reference image - with a version number inside)

reSHARMA commented 6 years ago

How about this? - https://github.com/synfig/synfig-tests-regressions/tree/master/references/layers/circle (a text file for each reference image - with a version number inside)

Sure, I'll come up with a script with this pattern.

morevnaproject commented 6 years ago

After thinking a bit more on that, I come to conclusion that it is not very good to keep .txt file in "references" directory. Might be better to have it in "sources", together with .sif file. So, you can add a new test (sif file) and place .txt file into same location (no need to jump into another dir). I have made the change for this - https://github.com/synfig/synfig-tests-regressions/commit/d6676662af6e3247a429eee134a62457846382c1

Second thought: it might be too boring (and cluttering) to add .txt file for each test. We can have some configuration file in the root dir, which specifies default version. If no .txt file exist, then default version is used. If it exists - then default version gets overridden.

reSHARMA commented 6 years ago

We can have some configuration file in the root dir, which specifies default version. If no .txt file exist, then default version is used. If it exists - then default version gets overridden.

Sure, this is better :) I'll come up with a script soon.

reSHARMA commented 6 years ago

@morevnaproject I think we should use appimages for generating references or render output from sammples as different versions of synfig are not co-installable. I could not find appimage for version 1.0.2 or for anything other than latest. Do you ship and archive appimage for every version on sourceforge?

morevnaproject commented 6 years ago

@reSHARMA Oh, sorry, I forgot to mention that we do not have AppImage for Synfig version 1.0.2. But we have a binary in tar.bz2 archive, which you can unpack to any path and run from there - https://sourceforge.net/projects/synfig/files/releases/1.0.2/linux/synfigstudio-1.0.2.x86_64.tar.bz2/download For all higher versions of Synfig there is an AppImage package. ^__^

reSHARMA commented 6 years ago

@morevnaproject no worries, I think tar.bz2 archive would be good for other versions also.

morevnaproject commented 6 years ago

So, it looks like this task is done? Now all we have to do is to fill the testing engine with appropriate tests.

BTW, the test engine already identified one difference in behavior of rendering engines for CheckerBoard Layer - https://travis-ci.com/synfig/synfig/jobs/154369637 I do not call this "regression", since the behavior of current rendering engine is more correct (it gives correct image of checkerboard when size->0).

So, that raises a question: I would like to identify the test file - layers/checkerboard/checkerboard-origin-0-size-0.sif - as correctly rendered with CURRENT code. I.e., I need a new reference file, generated by current Synfig code (which is not released as any version yet). How can we achieve this? @reSHARMA any thoughts?

morevnaproject commented 6 years ago

My first thought: We should be able to specify which version to use on per-file basis (if I understood correctly, currently it is specified on per-directory basis). I.e. have layers/checkerboard/checkerboard-origin-0-size-0.txt for layers/checkerboard/checkerboard-origin-0-size-0.sif file.

Second: We should be able to specify a version, which does not exists yet. I.e. "1.3.11". In this case we have to render reference image manually. And the rendering script should NOT re-render reference image if non-existing version specified (this way we can be sure that reference image is not overwritten). -- see my next comment.

@reSHARMA what do you think about such solution?

morevnaproject commented 6 years ago

Ah, wait. You can scratch my last paragraph (about a version, which does not exists yet). We can add a "temporary" version at https://github.com/synfig/synfig-tests-regressions/blob/master/sources/force-render-png.sh#L52 - something like "1.3.10.2018.10.26". And use a temporary build file provided by @blackwarthog - https://dev.icystar.com/downloads/SynfigStudio-1.3.10-testing-18.10.18-linux64-defe1.appimage

reSHARMA commented 6 years ago

@morevnaproject I think out of 42 test 1 had this issue, isn't keeping version per file a comparatively larger overhead to deal with (we would manually need to set that for every test sample).

For now, can we just create a folder exception inside source, we will not create reference for files in exception but would generate result for them.

File could look like checkboard-origin-0-size-0-1.3.10.2018.10.26.sif And then we can watch for changes in the exception folder and that would generate the reference for these when required.

This would easily incorporate your idea for keeping version per file without affecting others.

What do you think? :)

morevnaproject commented 6 years ago

we would manually need to set that for every test sample

Um, possibly I am incorrectly expressed my idea. I have no intention to set a version for each file manually. My idea is to introduce possibility to overrid version for some specific files, in the same way as we have that for dirs (here - https://github.com/synfig/synfig-tests-regressions/blob/master/sources/force-render-png.sh#L76).

So, before rendering specific file, the script can check if a .txt file with the same name exists. If it does, then it reads version from it. If no file exists, then default version is used (defined at directory level or at global level).

I think it is easy to add a check for .txt file here - https://github.com/synfig/synfig-tests-regressions/blob/master/sources/force-render-png.sh#L94 ^__^

reSHARMA commented 6 years ago

@morevnaproject sorry my bad :) I misinterpreted that, this sounds awesome. I'll be coming up with a pr soon.

morevnaproject commented 6 years ago

Great! I think we can consider this task done as soon as https://github.com/synfig/synfig-tests-regressions/pull/9 get merged. ^__^

morevnaproject commented 6 years ago

Okay, now we have everything working as expected. There are still two issues remaining (https://github.com/synfig/synfig-tests-regressions/pull/10#issuecomment-434571523 and https://github.com/synfig/synfig-tests-regressions/issues/11), but in general the system is ready for use.

@reSHARMA would you be able to take care about those remaining two tasks? ^__^

reSHARMA commented 6 years ago

@morevnaproject That seems to be an awesome news. I'm making some changes that would probably solve those two issues (synfig/synfig-tests-regressions#10 (comment) and synfig/synfig-tests-regressions#11) at https://github.com/synfig/synfig-tests-regressions/pull/13

morevnaproject commented 6 years ago

@reSHARMA The changes are merged and I think we can mark this task as done. Thank you very much for your efforts and congratulations! ^__^

reSHARMA commented 6 years ago

Thanks! It's my pleasure to be the part of it. If only filling up the test suite is left then I've some crowd sourcing plans for that. If something else is left, please let me know :)

reSHARMA commented 6 years ago

@morevnaproject Many thanks for the opportunity, I enjoyed being the part :)

morevnaproject commented 6 years ago

@reSHARMA Yes, it's the filling of the suite left so far. Will appreciate any help here! ^__^

reSHARMA commented 6 years ago

@morevnaproject sure it can act as a great first contribution for many, so I'm trying to go in that direction but anyway I'll try to get this done before mid December.

I'm not sure if it's doable but if travis raise issue after the build if any test fail?

morevnaproject commented 6 years ago

About raising an issue on script fail: this sounds cool, but I am afraid that we will end up with a bunch of issues (if new issue will be raised after each commit). I am also not sure if that is possible. ^__^

morevnaproject commented 5 years ago

Hey, @reSHARMA! I have noticed the Travis job fails to render sources from the latest commits. I can see following errors in Travis log:

...
Target name = png
Outfilename = /home/travis/build/synfig/synfig-tests-regressions/sources/../references/layers/gradient-curve/gradient-curve-shape2-width-0.png
boost::filesystem::canonical: No such file or directory: "/home/travis/build/synfig/synfig-tests-regressions/sources/../references/layers/gradient-curve"
...

Can you please take a look? ^__^

morevnaproject commented 5 years ago

Wait, I have a better idea - I have added this issue for our NJACK participants - https://github.com/NJACKWinterOfCode/synfig-tests-regressions/issues/18 ^__^

synfig / synfig

Regression testing for rendering engine #634