Allow automated testing

GoogleCodeExporter commented 9 years ago

What game elements does the new feature/behavior concern?

Level script, mostly.
The idea is to create a campaign which executes from start to end automatically.
Results of the tests can then be read from keeperfx.log.

Where exactly inside the game should it be accessible?

New script commands should be added to allow execution of automated tests.
- A command which allows passing/failing the test, it is already there 
(WIN/LOSE_GAME)
- A command which forces the level to end automatically when win/lose state is 
achieved
- A command which allows determining performance of the test (informs about 
variables used as performance factors and their wages)
New command line parameters should be added:
- A parameter which disables any need of user input (--nointeractive)
- A parameter which starts given campaign without menu screen
- A parameter which sets frameskip mode from the very beginning

Under which conditions should it be triggered?

When a test campaign is started.

How exactly should the new feature work?
1. Start KeeperFX with command line to load test campaigns, nointro, 
nointeractive, frameskip
2. Allow the tests to be executed without human interaction
3. Summarize data gathered in log files (status(pass/fail) + 
performance(integer number))

Please provide any additional information below.

Implementing this would be a huge help for:
- avoiding regressions
- improving performance of computer player and creature AI

Original issue reported on code.google.com by mefistotelis on 9 Aug 2015 at 11:04

GoogleCodeExporter commented 9 years ago

Please don't make it a single campaign, but allow for separate test maps and a 
method for running several maps(test cases) in a batch. For example by having 
an ini file with map-numbers or entering the range of numbers in the command 
line. Something that might also work really well with little effort is to have 
KeeperFX look for a subfolder ‘Testlevels’ when a specific parameter is 
used and run all the maps there.

Also, you need the option to either abort the whole test suite on the first 
fail but more importantly also the option to continue with all the remaining 
test cases on failure.  Compare these alternatives:

Abort at first failure:
TC1: Pass
TC2: Fail
TC3: -
TC4: -

Continue at failure:
TC1: Pass
TC2: Fail
TC3: Fail
TC4: Pass

The second option provides additional information at no extra work, perhaps 
only additional run time. The first option could lead to a situation where TC3 
failure is only found after fixing TC2 and where you don’t know if this 
problem is further regression by fixing TC2 or was there before.

What would also help is to introduce a script command that allows the test 
engineer to write data to the log in a way that can easily be parsed.

Original comment by Loobinex on 9 Aug 2015 at 11:58

GoogleCodeExporter commented 9 years ago

By just having a a parameter '-runtest abort/continue' to start running all 
test maps, and a '-frameskip #' parameter you can much more easily intergrate 
it in your automated build process(run regression test on checkin!). Possibly 
update the '-level' parameter to only run the testmap with that number to 
accomodate manual testruns by having batch files (You'd have to copy the log 
file between each map to store the test results.)
This way the tester doesn't have to open the game manually and select a 
campaign. And you would not need to implement the '-nointeractive' parameter, 
no setting to indicate a campaign is a test-campaign and no functionality to 
skip the landview.

Original comment by Loobinex on 10 Aug 2015 at 12:30

GoogleCodeExporter commented 9 years ago

I would prefer to use the "normal" game path whenever possible.

The "runtest" parameter of your description would be a completely separate code 
path to normal game menus; it would have to be implemented from scratch (though 
I admit it's not a huge code).

Levels in a campaign file can easily be changed to select the tests you want to 
run - you only have to remove a level from SINGLE_LEVELS to disable it.

Also, you will be able to use multiple test campaigns referring to the same 
levels.

Original comment by mefistotelis on 10 Aug 2015 at 7:07

GoogleCodeExporter commented 9 years ago

Regression testing KeeperFX with only the level scripts is limited, but to get 
somewhat decent coverage you will still have to create a very large number of 
maps. Campaigns are currently limited to 25 maps, so you'll have the overhead 
of having to make a lot of campaign.cfg files.
And if you want to run the regression test you'd still have to sit there and 
every 25 maps click another campaign for testing.

The key to automated regression tests is - not surprisingly - automation. The 
design you have right now has far to many manual steps which don't scale well. 
Secondly writing the tests is very labour intensive and automated testing only 
becomes valuable if you have some semblance of coverage. If you don't like my 
suggestions make something else, but please allow (at least in the design) for 
full(!) automation and quick and easy TC creation above all else.

Original comment by Loobinex on 10 Aug 2015 at 7:52

GoogleCodeExporter commented 9 years ago

Actually, the campaign levels limit is now 50 (see 
https://code.google.com/p/keeperfx/source/browse/trunk/keeperfx/src/config_campa
igns.h). And this can be easily increased.
But we would do fine even with executing the levels separately (that actually 
has its advantages too, ie. clean environment for every test and separate log 
file for each test case).

It is easy to create a batch scripts which lists all levels in a folder and 
executes KeeperFX separately for each level.

I agree testing with level script is limited, but this is the only thing that 
looks achievable for me. In case of games, the whole-system tests are easier to 
maintain than unit tests, and not many people are interested in writing unit 
tests for open project. Not to mention KeeperFX is not fully rewritten, so it 
doesn't have a proper separation of internal units.

My idea of measuring performance is still a bit vague. I want to add a command 
like:

SET_PERFORMANCE_FACTORS(<variable1>,<wage1>,<variable2>,<wage2>,<variable3>,<wag
e3>...)

Using given script variables and their wages, the game would be able to compute 
a single value of performance. Then, for example, if a test is about money, 
this could allow measuring if a code change made computer player dig more gold 
in the test map.

What do you think about this?

Original comment by mefistotelis on 10 Aug 2015 at 11:35

GoogleCodeExporter commented 9 years ago

I agree with your assessment that doing automated test like this is the only 
feasable approach. I've always had difficulty making my developers write unit 
tests and they were getting paid for it. Through level scripts there are still 
quite a few things that can be tested and no developers are needed.

In your example, how to you want to configure the variables, and what will be 
your output? Is your plan to add some variables specifically for testing to the 
game script?

Perhaps the easiest way is to keep it simple and on win/lose game and/or player 
defeat always automatically write the gameturn to the log file. Have the 
test-engineer write the win/lose condition in such a way to account for proper 
performance. (e.g. lose game on a gameturn, win game on an amount of gold).
If we run into some things we really want tested but cannot script, adding the 
script variable that would enable it would probably be a fine addition for map 
makers as well.

How do you want to get the results back? Will you create a parser to look 
through the log files for pass/fail conditions, or will you create a seperate 
log file which is created when a 'runtest' parameter is run which only enters 
on every row something like:
Map number - Map name - Pass/Fail - gameturn - error quantity - datetime

Original comment by Loobinex on 10 Aug 2015 at 1:44

GoogleCodeExporter commented 9 years ago

How to configure variables - the variables will be standard script variables; 
the command:
SET_PERFORMANCE_FACTORS(<variable1>,<wage1>,<variable2>,<wage2>,<variable3>,<wag
e3>)
would log a value to keeperfx.log which is:
LEVEL_PERFORMANCE = variable1*wage1 + variable2*wage2 + variable3*wage3
The value could even be accessible as another variable within the script if it 
is needed.

New variables - yes, if there is any value missing, new variables will be 
implemented. These new variables will work for all standard script commands, so 
for mapmakers as well.

Results - these will be written to the standard log file, I'll just make a 
simple SED script to extract the results into readable format, ie CSV-like.

Original comment by mefistotelis on 10 Aug 2015 at 2:54

GoogleCodeExporter commented 9 years ago

I don't understand 'wage' in this context. If it is the value why do you have 
to express it in the script. Could you give a concrete example of how in 
Keeperscript you'd write the 'Total Doors' of Player1, the number of knights 
belonging to Player_good and the total amount of heroes player_good controls 
when player1 is destroyed.

But I'd say lets just try it and see how it works. I think adding the gameturn 
on win/lose game to the log file by default would be a big help because this 
would be the key performance indicator on most tests.

Original comment by Loobinex on 10 Aug 2015 at 5:57

GoogleCodeExporter commented 9 years ago

Ok, here's wage explanation:

Let's assume we have a level which tests how computer player gathers gold and 
builds traps, ie. boulders. Victory condition is to have 20000 gold and 10 
placed traps.

But besides pass/fail result, we want to know the measurable performance - how 
exactly KeeperFX improved since 20 builds ago.

We could just sum TOTAL_GOLD_MINED and BOULDER - but that would mean producing 
another BOULDER has very little impact on performance. If we want those two to 
be equally important, we need to scale BOULDER with a wage, ie.
PERFORMANCE = TOTAL_GOLD_MINED * 1 + BOULDER * 2000

This way producing 10 traps gives the same performance impact as mining 20000 
gold.

Original comment by mefistotelis on 10 Aug 2015 at 6:48

GoogleCodeExporter commented 9 years ago

Ok, I understand and that should work. I have never heared the term 'wage' 
here, I'm used to calling it 'weight'.

In any case, you'll get a performance number, which says 'bigger number is 
better'. How I was thinking was record the game_turns it takes to achieve the 
victory condition of e.g. 20 boulders+20k Gold. This way you also have a 
performance indicator: Faster is better.

I think the result is basically the same, so it is up to you to estimate if the 
effort in implementing the weighted performance variables is worth it.

Original comment by Loobinex on 10 Aug 2015 at 7:52

GoogleCodeExporter commented 9 years ago

Hm, looks like I was using that term incorrectly, it should be weight.

Original comment by mefistotelis on 11 Aug 2015 at 1:00

yvsarkisyan / keeperfx

Allow automated testing #652