kobaltcore / renkit

A collection of tools to help you organise and use Ren'Py instances from the command line. Especially useful for headless servers.
MIT License

Potential for automated testing #2

Closed (darkplayground closed this issue 10 months ago)

darkplayground commented 2 years ago

I'm relatively new to renpy and it's clear you are ahead of me quite a bit on the automated-build system development. Have you given any thought to an automated test platform? It seems to be the one thing renpy lacks. Some way to run through all the screens and choices and ensure there are no failures. Grammar and spelling checks might be out of scope for this but at the very least we should be able to build something that goes through the choice tree with the launcher and reports back if there were any rollback or code errors in renpy.

Any ideas?

Also, adding lint checking to renconstruct would be nice. I do it manually now by going into the installed renpy folder and running lint using the renpy.sh script.

kobaltcore commented 2 years ago

Yeah, that is indeed something I've been thinking on every once in a while since I've run into a similar thing a couple of times before. Imho a way of running tests would be better served mainlined into Ren'Py proper so it's standardized for everything, but I'm not opposed to a custom solution, either. I might work towards something to PR to Ren'Py eventually, but the architecture doesn't really lend itself that well to testing things in isolation. Still, some kind of unit testing framework would be useful indeed.

As far as running through games automatically goes, Ren'Py kind of has facilities for this in one particular "kiosk mode" function that randomly selects choices as it progresses through the game. However, it doesn't keep track of any state and is easily stumped by anything that's not a conventional choice menu. Implementing game branch traversal in a robust and universal way is extremely difficult due to the fact that choices and jumps can occur in a number of custom ways you will never be able to account for. This can lead to a state where this kinda test will "sometimes" work, but some actions on the side of the developer will make it stop working seemingly at random, or exclude large swathes of content that is actually accessible.

The lint idea is good, I can certainly add that as a pre-flight check so it aborts early if that happens. Generally, in a CI setting, you may prefer to run the lint as an extra stage though, which is already possible by invoking renutil with the proper arguments.

darkplayground commented 2 years ago

> As far as running through games automatically goes, Ren'Py kind of has facilities for this in one particular "kiosk mode" function that randomly selects choices as it progresses through the game. However, it doesn't keep track of any state and is easily stumped by anything that's not a conventional choice menu. Implementing game branch traversal in a robust and universal way is extremely difficult due to the fact that choices and jumps can occur in a number of custom ways you will never be able to account for. This can lead to a state where this kinda test will "sometimes" work, but some actions on the side of the developer will make it stop working seemingly at random, or exclude large swathes of content that is actually accessible.

Instead of doing it from a unit-testing perspective of running it from the inside, my thought was to use a Selenium-like approach and have a system that could detect the buttons that pop up and click them, keeping track of the choices selected and running through the entire game multiple times with different choices chosen each time. Obviously this would take longer and longer as the complexity of the game increased, but I think it's the only way to get close to a 100% guaranteed "good" state for basic choice detection. Of course, more complex state such as custom names, inventory, relationship scores, etc. could still generate errors, but so many games have issues with just running through the basic choices that I think the ability to fix that would be a great first step.
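To make the "run through the game multiple times with different choices" idea concrete, here is a minimal sketch on a hypothetical toy graph. It does not touch Ren'Py at all: `SCRIPT`, its labels, and `all_playthroughs` are made up for illustration, and the graph is assumed to be acyclic (a loop in the script would recurse forever).

```python
# Hypothetical toy script graph: label -> list of (choice text, next label).
# A label with no choices is treated as an ending.
SCRIPT = {
    "start": [("Go left", "forest"), ("Go right", "village")],
    "forest": [("Fight", "bad_end"), ("Run", "village")],
    "village": [("Rest", "good_end")],
    "bad_end": [],
    "good_end": [],
}


def all_playthroughs(script, label="start", path=()):
    """Yield every sequence of choices that leads from `label` to an ending."""
    choices = script[label]
    if not choices:
        # Reached an ending: report the choices that got us here.
        yield path
        return
    for text, target in choices:
        # Record the choice and keep walking from the label it leads to.
        yield from all_playthroughs(script, target, path + (text,))


if __name__ == "__main__":
    for run in all_playthroughs(SCRIPT):
        print(" -> ".join(run))
```

Even this tiny graph needs three separate runs to cover every choice, and each additional independent menu multiplies that count, which is why the total time grows so quickly with game complexity.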

This would allow us to support pretty much any choice-based system a dev could come up with because at the end of the day, our system would just be clicking buttons on the screen like any player would. What do you think the feasibility of this is? Is it possible to emulate the launcher in this way?

> The lint idea is good, I can certainly add that as a pre-flight check so it aborts early if that happens. Generally, in a CI setting, you may prefer to run the lint as an extra stage though, which is already possible by invoking renutil with the proper arguments.

Yea I have it running as a step before the build stage. I didn't know you could run it using renutil, I'll definitely look at that. It would make things much more streamlined and make it easy to support different versions of renpy. Thanks for that.

kobaltcore commented 2 years ago

Regarding linting, the invocation will likely look something like this:

renutil launch -v 7.4.11 -da "path/to/game lint"

renutil can pass arbitrary arguments to the underlying Ren'Py instance using -a (hence the quotes around the arguments), so you can easily call any command that Ren'Py supports at the project level, even custom ones (though how to create those is not documented, as far as I know).
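For a CI stage, the invocation above could be wrapped in a few lines of Python. This is only a sketch: the flags are copied from the command shown above, the function names are made up, and it assumes that a failed lint makes the process exit with a non-zero status, which is not confirmed here.

```python
import subprocess


def build_lint_command(version, game_path):
    """Build the argv for the renutil invocation quoted above."""
    # -a takes ONE string that is forwarded to Ren'Py, so the project path
    # and the "lint" command travel together as a single argument.
    return ["renutil", "launch", "-v", version, "-da", f"{game_path} lint"]


def run_lint(version, game_path):
    """Run lint and return True on success.

    ASSUMPTION: a failed lint is signalled by a non-zero exit status.
    """
    result = subprocess.run(build_lint_command(version, game_path))
    return result.returncode == 0


if __name__ == "__main__":
    # Abort the pipeline early if lint does not pass.
    raise SystemExit(0 if run_lint("7.4.11", "path/to/game") else 1)
```

Because the path and the command share one argument string, a project path containing spaces would probably need extra quoting inside that string.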

As far as the integration testing stuff goes, a "player-like view" approach also crossed my mind, but is hindered by two things:

  1. You can't run Ren'Py games in a headless environment because they require something to draw to. This makes this approach entirely unfeasible for any kind of CI usage. Since this is an architectural issue, we will not be able to get rid of this. The only workaround is to run on something with a video output, which severely restricts the number of systems you can run these tests on.
  2. Custom GUIs. Menu buttons have a high chance of looking different for every game. Things like AutoIt can look for matching patterns, but this is impossible if the pattern is arbitrary and varies per game. Finding a clickable surface is also not easy because Ren'Py is more of a game than an "application", so all an external program would see is an empty window with a canvas in it, but the interactive elements wouldn't be detected.

For these reasons I believe an internal approach would be much more effective, as it at least solves point 2, although point 1 still remains an issue.

I have also seen potential in another approach which is more akin to static analysis and doesn't require running the game at all: analysing the conditionals and jumps inside the script by parsing the AST in some way. A player of our game actually created a tool to parse and chart arbitrary Ren'Py script as ggplot diagrams, so this is feasible, if a lot of work. He also did it by parsing the script manually, which is very error prone and can easily break when Ren'Py updates. A better approach may be to use Ren'Py's existing parser to analyse the full AST. This would allow you to find unreachable sections, impossible choices, etc. more reliably, but would not be able to account for runtime errors.
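As a rough illustration of what "finding unreachable sections" means, the sketch below does a reachability search over a label graph. Note that it extracts labels with naive regular expressions, which is exactly the fragile manual parsing cautioned against above; a real tool would build the graph from Ren'Py's own parser instead. The patterns, the sample script, and the function names are all assumptions made for this example, and fall-through from one label into the next is ignored.

```python
import re
from collections import deque

# Naive patterns for illustration only. They miss labels with parameters,
# local labels, and jumps made from Python code.
LABEL_RE = re.compile(r"^\s*label\s+(\w+)\s*:")
JUMP_RE = re.compile(r"^\s*(?:jump|call)\s+(?!screen\b|expression\b)(\w+)")

SAMPLE = '''
label start:
    "Hello."
    menu:
        "Go on":
            jump chapter_one
        "Quit":
            return

label chapter_one:
    "Chapter one."
    return

label secret_scene:
    "Nobody jumps here."
    return
'''


def build_graph(source):
    """Map each label to the set of labels it jumps to or calls."""
    graph, current = {}, None
    for line in source.splitlines():
        if m := LABEL_RE.match(line):
            current = m.group(1)
            graph.setdefault(current, set())
        elif current and (m := JUMP_RE.match(line)):
            graph[current].add(m.group(1))
    return graph


def unreachable_labels(graph, entry="start"):
    """Breadth-first search from `entry`; return labels that are never reached."""
    seen, queue = {entry}, deque([entry])
    while queue:
        for target in graph.get(queue.popleft(), ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(graph) - seen)


if __name__ == "__main__":
    print(unreachable_labels(build_graph(SAMPLE)))
```

On the sample script this reports `secret_scene`, since nothing jumps to or calls it. As noted above, this kind of check says nothing about errors that only appear at runtime.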

darkplayground commented 2 years ago

I don't know how familiar you are with Selenium but it's basically a headless browser runtime. It supports nearly everything a regular browser does depending on the browser engine you use (WebKit, Chromium, etc.), and it will even take screenshots of the browser window when an error occurs. So there's definitely a way to run a game headless, it's been done before.

As for the custom GUI issue, I think we might be a bit confused here. I'm not talking about a one-size-fits-all automated testing platform that "just works". That type of thing only really works at the static analysis level anyway, regardless of the language. I'm talking about creating a test framework that devs can use to write their own tests. And since they will be writing the tests, they'll know how to address their GUIs. There has to be some kind of "language" communicated by renpy to the OS to describe a clickable surface. I don't know how easy it would be to hook into that but it's certainly not impossible.

I think with static analysis and linting we can get 90% of the way towards the goal of an error-free basic run. It obviously won't catch runtime bugs in games with complicated logic where emergent gameplay and paths are possible, but it's a step in the right direction.

kobaltcore commented 2 years ago

Yeah, I've used Selenium for several projects before. However, Ren'Py is in no way similar to Selenium or any of its web drivers, so you can't assume that just because Selenium can run headlessly, Ren'Py can as well.

Launching Ren'Py headlessly is possible (renutil supports this and renconstruct uses this internally to build the projects) but launching an actual project with a headless Ren'Py instance will crash with an OpenGL error because there is no drawable surface. It was never built with this in mind and support for running headlessly does not exist as of yet. Unless somebody does the legwork of implementing this properly and upstreams it (though I'm not sure that's even possible with Ren'Py's current state), this is a no-go for now.
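Given that limitation, a test runner could at least check for a display up front and skip any player-like tests, rather than crash with the OpenGL error. The sketch below is a heuristic and not something Ren'Py or renutil provides: it assumes that on Linux a set `DISPLAY` (X11) or `WAYLAND_DISPLAY` (Wayland) variable indicates a usable display, and that other platforms always have one. The function name is made up.

```python
import os
import sys


def display_available(env=None, platform=sys.platform):
    """Best-effort guess at whether a window can be opened.

    ASSUMPTION: on Linux, DISPLAY or WAYLAND_DISPLAY being set is taken as
    evidence of a display; every other platform is assumed to have one.
    """
    env = os.environ if env is None else env
    if platform.startswith("linux"):
        return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    return True


if __name__ == "__main__":
    if display_available():
        print("Display found: player-like tests can be attempted.")
    else:
        print("No display: skipping player-like tests.")
```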

For the GUI stuff, yeah, having the tests be written by the developer would solve the issue with custom UI elements.

> There has to be some kind of "language" communicated by renpy to the OS to describe a clickable surface.

The problem is that I would expect this to differ for every platform. macOS, for example, would only recognize things as a proper GUI element if the application is native and uses AppKit. For Windows I assume this would be a similar story, and Linux is probably a hot mess of N+1 competing standards, heh.

I do agree that static analysis is probably the most reachable first goal here and would likely cover and detect a good chunk of common errors, so it would be an overall net positive for the ecosystem. I may attempt a spinoff project in separating out the parser code from the engine to get a tool that can parse, generate and subsequently dump Ren'Py script ASTs for further analysis. That might be a good starting point.

Another big item I would really like is a working (possibly opinionated) formatter, which could also make use of the dumped AST, similar to what Python has in black or Go has in gofmt.