General cleanup of tests (remove unused reference images, remove test cases that don't exist in resvg, add all remaining resvg test cases without reference images)
Instead of using a custom test runner, we now generate the tests with a Python script. This removes some flexibility in how we run the tests, but it lets us use cargo's native test runner, which behaves more consistently overall and does not abort the whole run as soon as a single test case panics.
This PR makes the following changes: