stardot / beebasm

A portable 6502 assembler with BBC Micro style syntax
http://www.retrosoftware.co.uk/wiki/index.php/BeebAsm
GNU General Public License v3.0

Simple Python-based testing framework for beebasm #57

Closed: mungre closed this 2 years ago

mungre commented 3 years ago

Following on from the discussion here.

Cheers! Thanks for giving it a go.

The expected output thing seems important. I'll add support for .gold.txt files, which hold a subset of the expected output, i.e. the contents of the .gold.txt file have to appear somewhere in the output of the test. Once the test runner is capturing the test output, it will be simple to add a verbosity option.
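Something like this minimal sketch is the idea (the function name, file handling and use of beebasm's -v/-i flags here are illustrative, not the final harness):

```python
import subprocess

def passes_gold_check(source_file, gold_file=None):
    # Hypothetical sketch: run beebasm, capture its output, and require the
    # .gold.txt contents to appear somewhere within that output.
    proc = subprocess.Popen(["beebasm", "-v", "-i", source_file],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    output, _ = proc.communicate()
    if proc.returncode != 0:
        return False
    if gold_file is not None:
        with open(gold_file) as f:
            expected = f.read()
        return expected in output   # subset match, not exact equality
    return True
```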

I didn't worry about testing for Python because, as you say, if someone doesn't have it they can use "make code", and Python is always installed on GitHub runners, so I'm not worried about it breaking the automatic builds. I know this is a retro-computing project, but Python 3 is thirteen years old and Python 2 is EOLed! Though having written that, I tried Python 2 and it works. :-)

Running the tests creates some output, which clutters up the git status. The code already knows the name of any ssd being created so it can delete that. At the moment, that just leaves the "test" file and that can be deleted as a special case.

I'll move the existing tests into a subdirectory as well. I was sort of trying to replicate the examples directory but I'm not sure how useful it is for end users and I'd like to organise the tests a bit.

ZornsLemma commented 3 years ago

Thanks!

At the risk of stating the obvious, would it be worth just adding some carefully chosen entries (like "test") to .gitignore? It's still nice if the test harness cleans them up, but sooner or later it is going to be CTRL-C-ed by a frustrated developer when something isn't working and it would be nice to minimise the inconvenience of having some files left lying around.

It might even be worth (I must admit I haven't checked precisely how this currently works) having all the temporary files generated in a "tests/tmp" directory, then we could just add that one directory to .gitignore.

I'm glad it worked on Python 2. I don't have strong views on Python 2 vs 3, but if it's easy to have it work on both it seems a shame not to. On the flip side, if there's something in Python 3 which is really helpful, we should probably just go ahead and use it.

Seems like a good idea to re-organise the tests. Once that's done it might also be worth removing some files from the examples directory - some of them are doubtless useful as examples, but others (e.g. the errorline* ones) I really just dumped in there for want of anywhere better; they are really test cases and nothing else.

mungre commented 3 years ago

I considered using an output directory but beebasm doesn't have an option for this, so it would have to be set in the test files using relative paths, which would add a complication with tests living in subdirectories (and it would stop the test being compiled into an ssd, not that it matters). Or there could be a separate output directory for every subdirectory. That might improve things a bit.

I discounted .gitignore because the output ssd files all had different names. However, there was no reason for this, so they are all renamed to test.ssd now, and .gitignore ignores them. With this change I'm not so fussed about deleting them.

The errorlinenumber1 test produces line numbers at the top and the bottom, so the .gold.txt file is the entire output. It may be better to treat each line of the gold file as a separate pattern and test that they all appear in the test output in order, but I won't change this until there are more tests to give some idea of what's really required.
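If it ever comes to that, the matching itself would be simple; roughly (illustrative sketch only):

```python
def gold_lines_appear_in_order(gold_lines, output_lines):
    # Each gold line is treated as a separate pattern; all of them must be
    # found in the test output, in the same relative order (not necessarily
    # on consecutive lines).
    pos = 0
    for pattern in gold_lines:
        while pos < len(output_lines) and pattern not in output_lines[pos]:
            pos += 1
        if pos == len(output_lines):
            return False   # this pattern never appeared after the last match
        pos += 1
    return True
```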

I've tried to put the tests in some sort of order from simpler things up to more complicated things. And I've added some simple tests for all the operators.

ZornsLemma commented 3 years ago

Thanks, this is looking really pretty slick now! One thing I did notice is that the cmake workflow builds don't show verbose test output; would it be easy to change that?

mungre commented 3 years ago

Cheers! cmake now shows verbose test output when tests fail. I've tried this locally, but I haven't pushed a failing test. I don't know if it's possible to get verbose output all the time.

mungre commented 3 years ago

I pushed a bad test to a new branch and cmake is logging the test output.

ZornsLemma commented 3 years ago

Thanks. I've pulled this code and had a bit of a play with it locally and it all seems to work, FWIW.

Apologies if this seems nit-picky - I don't really think it matters, but I thought I should at least ask - why is there a difference between the GitHub workflow for make (where we force verbose output from the tests all the time) and CMake (where we only force verbose output if there are errors)? (I think that is how it works, but let me know if I've got confused.)

I think it would be good to add a gold ssd or txt file (txt will only work if the output of beebasm is verbose) for local-forward-branch-5.6502; as noted in the comment at the top of the file, this test case was at one point assembled incorrectly. (There was a bug in one of my changes :-( .) I'm happy to do this if you like, although I'd appreciate hearing your thoughts on ssd vs txt first. (But feel free to do it yourself if you wish, of course!)

I'd suggest we remove the following from the examples directory now:

ZornsLemma commented 3 years ago

PS It might be nice if we could include the "rotating sphere" demo/example code as a test case - it is a moderately substantial bit of code compared to the other test cases. But this would probably require duplicating the code in the tests subdirectory, which feels a bit unsatisfactory.

mungre commented 3 years ago

Cheers.

As I said above, I don't know how to persuade cmake to produce verbose output in the non-error case.

On txt vs ssd. Any changes to improve the beebasm output will break any txt tests that rely on it. ssd tests don't suffer from this. On the other hand, ssd test failures are harder to diagnose, though it's simple enough to compare the verbose output of old and new beebasms manually.
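To make the trade-off concrete: a gold ssd check would be a byte-for-byte comparison, while a gold txt check is the looser subset match; roughly something like this (the .gold.* file names here are just placeholders):

```python
from pathlib import Path   # Python 3 only; os.path would do for Python 2

def check_gold(test_dir, output_text):
    gold_ssd = Path(test_dir) / "test.gold.ssd"
    gold_txt = Path(test_dir) / "test.gold.txt"
    if gold_ssd.exists():
        # ssd gold: exact byte-for-byte comparison, immune to output tweaks
        # but harder to diagnose when it fails.
        return gold_ssd.read_bytes() == (Path(test_dir) / "test.ssd").read_bytes()
    if gold_txt.exists():
        # txt gold: only requires the expected text to appear in the output.
        return gold_txt.read_text() in output_text
    return True   # no gold file: the test only has to assemble cleanly
```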

It should be simple to include the sphere demo as a test using INCLUDE. Maybe in directory 6-projects?

Agree on the examples.

I'm happy to make these changes but it won't be before Monday. Feel free to go ahead!

ZornsLemma commented 3 years ago

Thanks. I've made these changes on the ZornsLemma/testing branch (which branches off your mungre/testing branch); if you're happy with them please merge them to your branch. I went with a gold ssd for both cases; this was a bit arbitrary, although it's perhaps nice to have some examples of gold ssds in the repo (everything else is using txts).

I had a quick look at the cmake/ctest stuff. It looks to me as though https://cmake.org/cmake/help/latest/variable/CMAKE_CTEST_ARGUMENTS.html would do what we want - we'd presumably add set(CMAKE_CTEST_ARGUMENTS -VV) in CMakeLists.txt - but this is new in 3.17 and at least on my local Ubuntu 20.04.2 install cmake is 3.16. It might actually be ugly to change this anyway, but maybe that's just sour grapes on my part. :-)

ZornsLemma commented 3 years ago

I've also added a gold ssd for basicrhstoken on my branch; this used to succeed but generate bad tokenised BASIC. FWIW this case has to be tested via a gold ssd, because the verbose output doesn't show the results of PUTBASIC.

Do you think we should add gold files for all tests? I don't think the remaining ones are so critical, but we would get more validation out of the existing tests if we did. (But unless we use gold ssds, we run the risk of creating loads of tedious work - or at least sed-ing :-) - if the beebasm text output is slightly tweaked at some point.)

mungre commented 2 years ago

Thanks, that's great. There was just one issue: the sphere demo test failed on Windows because the random number generator gave different results. It used longs, which are 64-bit on 64-bit Linux but 32-bit on 32-bit Linux and all kinds of Windows. It's using ints now, which I think are 32-bit everywhere we care about.

Local "make test" is quiet by default, but you can make it verbose. The test on github is verbose by default so the failing output doesn't get lost. This is set in the .github\workflows\actions.yml file. The cmake CTEST_OUTPUT_ON_FAILURE option is also set in actions.yml. The -VV option could go in there, and being github-only it wouldn't cause problems if people have older cmakes at home.

Most of the failure tests and some of the simple assert tests I added don't produce any output, but ssd tests for everything else are a good idea. I'll do that tomorrow.

ZornsLemma commented 2 years ago

Thanks! The RNG problem is quite embarrassing since the whole point was to generate consistent results. (It used to drive me nuts that every rebuild changed the demo.ssd file which is committed to the repository; it made merges unnecessarily painful.) I've skimmed the Wikipedia article (https://en.wikipedia.org/wiki/Lehmer_random_number_generator) on this again but I must admit it's making my head spin a little bit. I'm sure your solution is fine in practice anyway.

ZornsLemma commented 2 years ago

PS Since the RNG code isn't independent of the size of the integer type used, despite my naive intentions that it should be, I wonder if we should use something like uint32_t instead of unsigned int. That way if there is a platform where int isn't 32 bit, it will still work, and we've semi-documented the importance of the precise size of the integer type.
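As a rough illustration of why the intermediate width matters, here's a Lehmer step along the lines of the Wikipedia article, computed with the product truncated to 32 and to 64 bits (the constants are the classic MINSTD ones, not necessarily what beebasm uses):

```python
def lehmer_step(state, product_bits):
    # One multiplicative-congruential step, with the intermediate product
    # truncated to a fixed width to mimic a C integer type of that size.
    A = 48271            # MINSTD multiplier (illustrative constants only)
    M = 0x7FFFFFFF       # 2**31 - 1
    product = (state * A) & ((1 << product_bits) - 1)
    return product % M

s32 = s64 = 1
for step in range(5):
    s32, s64 = lehmer_step(s32, 32), lehmer_step(s64, 64)
    # The two sequences diverge as soon as the intermediate product
    # no longer fits in 32 bits.
    print(step, s32, s64)
```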

mungre commented 2 years ago

Red faces all round! Your comment prompted me to have another look and my fix was consistent but wrong. The latest one produces the same results it used to on 64-bit Linux, and Windows is consistent with it. I've used uint_least32_t and uint_least64_t to be completely explicit about the requirements.

mungre commented 2 years ago

Further to my last commit, I'm not sure what local-forward-branch-3 is testing.

I've been adding some simple tests of value parsing and noticed these things:

I expected print to write 32-bit integers cleanly but print(&7fffffff) produces 2.14748e+009. Can you see any harm in changing this (for print and STR$)?

I was surprised that &ffffffff isn't negative, but it's always been like that, so it's too late to change it. However, the % parsing can produce negative numbers and quietly truncates the value to 32 bits (undefined behaviour notwithstanding). The silent truncation can go, but I'm not sure about making the negative behaviour consistent with the hex parsing. What do you think?
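For reference, the two conventions look roughly like this (an illustration of the idea only, not beebasm's actual parser):

```python
def as_unsigned32(value):
    # The current & behaviour described above: the 32-bit pattern stays
    # non-negative, so the all-ones pattern is 4294967295 rather than -1.
    return value & 0xFFFFFFFF

def as_signed32(value):
    # The alternative: interpret the same 32-bit pattern as two's complement,
    # so the all-ones pattern becomes -1.
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value
```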

(This comment was originally added to the mungre/strings PR.)

ZornsLemma commented 2 years ago

I can't figure out what local-forward-branch-3 is testing either. I haven't tried bisecting this fully, but it does build just fine with the commit immediately before the test case was added, and with v1.08 as well. If I had to guess I'd suspect there was a problem with beebasm using abs instead of zp addressing in macros at some point (probably as a side-effect of some other change I made regarding scoped variables), but I really can't be sure.

I think we should probably remove all the local-forward-branch-* tests from the examples directory (as I did on ZornsLemma/testing). My gut feeling would be to keep local-forward-branch-3 as a test just in case, but if you want to get rid of it I won't be too upset.

(Is the old retro software forum still around? I know it's superficially present on stardot, but I was trying to find some beebasm posts I made about structured assembler way back and I couldn't find them on there, so I'm not sure it's complete. If it is about, there might well be a post from me on there about local-forward-branch-3.)

I can't see any obvious downside to changing the way print displays large integers, but of course it's hard to rule out someone using this in a way that will get broken.

I hadn't really thought about & and % generating negative numbers until now. It doesn't sound like a bad idea to make % consistent with &, but of course someone could have written some code which depends on it; my gut feeling is that it isn't very likely, but I really don't know.

Perhaps the best suggestion I could offer would be to make it consistent and post to stardot mentioning it in passing; probably no one will care, and if they do at least you haven't sneaked an incompatible change in without mentioning it.

(This comment was originally added to the mungre/strings PR.)

ZornsLemma commented 2 years ago

OK, I found this about local-forward-branch-3 FWIW: http://www.retrosoftware.co.uk/forum/viewtopic.php?f=17&t=994&p=7797&hilit=local+forward+branch#p7797 (posts above and below that post may also be relevant)

mungre commented 2 years ago

Cheers, I've added a comment to local-forward-branch-3.

The local-forward-branch-* tests from the examples directory seem to be gone.

I think that's everything except the binary number parsing.

ZornsLemma commented 2 years ago

Thanks, I think I had forgotten to pull the latest mungre/testing changes to my local git repo, so I couldn't see the removal of local-forward-branch-*, even though I had an idea you'd merged my commit removing them earlier.

ZornsLemma commented 2 years ago

The test suite is getting impressively large now! I've just downloaded and built the current mungre/testing branch and it works fine - I doubt that's a surprise given the automated tests are passing, but FWIW.

mungre commented 2 years ago

Cheers! There is an awful lot of repetition in the parsing code and I've toyed with the idea of simplifying it, but I don't want to do that before tests are in place. If I get round to it at all. But there's not much going on in this heat.

mungre commented 2 years ago

The build runs on: [push, pull_request]. When pushing to a pull request, the builds all run twice, so there are 20 rather than 10 successful checks. We're not paying for it, but it is a waste. I'm inclined to change it to run on push only. Worst case, someone manually creates an untested pr and it gets pushed to proposed-updates or master with no edits, but it will get tested then.

ZornsLemma commented 2 years ago

Changing to run on push only seems sensible, although I'm not sure my opinion counts for much here. I wonder if Dave Lambley (who originally added the github actions support) is reading this and has an opinion. Do you think it might be worth dropping him a mail?

mungre commented 2 years ago

I just pushed a change to #60, which is a pull request from a separate fork because I didn't want to clutter up this repository with spam releases from testing the auto-release. Anyway, it's changed my mind. When the pushes are happening on a separate repository the push tests happen there, but the pull-request tests happen here and they aren't necessarily the same tests. If we lose the pull-request tests then no testing would happen here. So I'll leave it alone.

Edited to add: it seems like the push tests use the tests in the branch being pushed but the pull-request tests use the tests in the base branch.

mungre commented 2 years ago

I bit the bullet and wrote a helper class for parsing directive arguments. The class is a bit complicated, but the rest of the code is greatly simplified and it does reduce the total line count.

And it fixes a bug with undefined symbols in HandlePutFileCommon.
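Roughly, the idea is along these lines (a Python sketch of the pattern only; the real helper is a C++ class inside beebasm and its interface differs):

```python
class DirectiveArguments:
    # Illustrative only: one object hands out a directive's parameters in
    # order, supplies defaults for the optional ones, and remembers the
    # column of each parameter so errors can point at the right place.
    def __init__(self, params):
        self._params = params    # list of (value, start_column) pairs
        self._index = 0
        self.last_column = None

    def next(self, default=None):
        if self._index >= len(self._params):
            if default is None:
                raise SyntaxError("missing parameter")
            return default
        value, self.last_column = self._params[self._index]
        self._index += 1
        return value

# A directive handler then reads like:
#   filename = args.next()
#   load_addr = args.next(default=0)
#   exec_addr = args.next(default=load_addr)
```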

ZornsLemma commented 2 years ago

I finally got round to taking a look at this - it's really neat! I don't pretend to have followed the logic of the argument parsing through step by step, but the same was true of the old code too. :-) The improvement in readability of the functions implementing the various directives is amazing.

As you say, having the tests in place makes this sort of refactoring a lot less risky and I'd suggest the tests have already justified their existence just by enabling this change. To get a bug fix out of it too is just the icing on the cake!

mungre commented 2 years ago

Thank you! I forgot to say that it improves the error reporting a little as well. Each parameter remembers its starting column so the error indicator reliably points to the start of a failing parameter.

Anyway, I'll go ahead and merge these prs into proposed-updates. Do you have any preference among merging, squashing and rebasing? I'll just merge them if not.

ZornsLemma commented 2 years ago

Good news if it improves error reporting too!

I'm no git/GitHub expert; on a project this size I'd be inclined to just merge - that's what I've been doing - but it's up to you really.

Cheers.

Steve
