Docs: include list of sequences to test on

katestange commented 1 year ago

I think we should have a list of OEIS sequences with particular bug-inducing properties to test on. New visualizer PRs should be required to test on these before submitting.

These should include (but are not limited to):

all 1,0,-1s: A114592 fast growing terms: A007235 (there are plenty, this is perhaps not the fastest!) offset of -1: A000521 (maybe we can find one that doesn't grow so fast, to separate concerns) maybe A002206 A061646 A284016 includes negative terms: A134028 has very few terms: A001220 offset of 2: https://oeis.org/A086677 more examples with offset: https://oeis.org/wiki/Offsets no positive terms (includes zeroes): A002819 non-positive integers: A001489

katestange commented 1 month ago

The sequence A202319 consists of moderately sized semiprimes, which should be hard to factor for their size. But they are overall small. It would be nice to find examples of sequences that are inherently hard to factor, for testing.

gwhitney commented 2 weeks ago

These are good candidates for tests in the new end-to-end testing in #420.

gwhitney commented 2 weeks ago

How should we approach incorporating this excellent list of sequences into our automated tests? Do we want a bona fide grid of testing every visualizer on every one of these sequences? Such a test would take a very long time to run, so we would (a) not likely be able to run it in GitHub's CI, and (b) would not want to require it to be run for every single commit. Really rather it should be run at least once per PR, if we go this grid route. So then it would become the reviewer's responsibility to pull the PR locally and run the grid test before approving for merge.

Another option would be a sort of "double transversal" approach -- make sure that each one of the sequences is tested with at least a couple of visualizers, and each visualizer is tested with a couple of sequences. We could automate this, seeding the random number generator so that the same associations are always chosen, and if we ever want to do a more extensive version, someone could run it with a couple of different seeds locally.

Do either of these sound good? Other thoughts? (Basically I am looking to make testing/review of PRs as routine as possible, to reduce the burden of reviewing, enabling more effort on the development itself.)

katestange commented 2 weeks ago

This sounds like a nice idea (the double transversal with seed). Why not have the automated test do a different seed each time, though? Particularly if we have the PR submitter do more extensive testing once per PR. Maybe worth discussing at a meeting.

gwhitney commented 2 weeks ago

Why not have the automated test do a different seed each time, though?

Hmm, I am worried about having a state where the code all "works," and then someone is blocked from checking in a perfectly good PR because it happens to run with a different seed on their PR, uncovering a previously unknown but unrelated problem. I really think to avoid blowing our minds we need the tests to be pretty strictly reproducible. We could from time to time change only the seed in a tiny PR, just to mix things up and distribute what's being tested.

The other big point is what's the success criterion? I don't know of a truly automatable one. So I was going to make this an image test, which means that someone needs to vet new images any time the test changes...

So unless I hear otherwise, I will go ahead and implement the double-transversal test with a fixed seed and add that to the #420 PR, considering that sufficient to close this issue, and we can take it from there in the future.

katestange commented 2 weeks ago

ok that makes sense

gwhitney commented 2 weeks ago

Good candidate for negative offset test: https://oeis.org/A228060 -- it has an offset of -85, but also has a thousand terms that don't get particularly large.

gwhitney commented 2 weeks ago

Good candidate for large positive offset: A241298. About as large an offset as backscope can currently handle.

numberscope / frontscope

Docs: include list of sequences to test on #225