Capability for inbuilt math functions, with first being `mathematician's modulus'

numberscope / frontscope

Numberscope's front end and user interface: responsible for specifying sequences and defining and displaying visualizers

MIT License

7 stars 15 forks source link

Capability for inbuilt math functions, with first being `mathematician's modulus' #48

Open katestange opened 2 years ago

katestange commented 2 years ago

We should create a place in the codebase where we can build a library of inbuilt math functions available to visualizers. So, housed in src/visualizers, e.g. src/visualizer/math.ts. (Glen notes: the file name starts with a lower case by the conventions of this project because it is not defining a class.) This file should export various useful functions. The first test case, to include when setting this up, is a ''mathematician's modulus''. The % operator in javascript will return a negative value when the dividend is negative. Ideally, (A mod B) should return the smallest non-negative integer which differs from A by a multiple of B. So, (-13 mod 5) should be 2 (not -3 as javascript would have it). In particular, the result should be in the range 0 to B-1 inclusive. The function itself would be called in the visualizer by importing modulus from math.ts, and then have syntax like modulus(A,B).

gwhitney commented 2 years ago

Note that the return type of a Sequence element has turned out to be bigint|undefined (so that missing terms can be flagged) so these utility functions need to handle inputs of that union, and just return undefined if any necessary input is undefined.

gwhitney commented 2 years ago

After some reflection on the experience of creating a module like this in #50, I would like to suggest that ultimately we fold these utilities into math.js, which we are already using. That way, anything we add can be uniformly used in formulas by users in the front-end and by visualizers in the back end, for example the powmod() function or whatever we decide to call it that computes a^k mod n. That doesn't mean we shouldn't put things in a separate math utility module for now, but it does suggest that we should as soon as feasible start using math.js for computations in visualizers. That will make things less ad hoc. (In particular, math.js modulus follows mathematicians' rules, so that solves that.) But getting that to work depends on extending math.js to handle items that are bigint|undefined so that will be my next effort as soon as #50 is resolved.

gwhitney commented 2 years ago

More details on the above comment, quoted here from Slack (because of the discussion in https://numberscopef21.slack.com/archives/C02D6G6TCF8/p1637619803031500 to move significant design conversations to Github):

After embarking on the effort to switch to bigint | undefined as the official, consistent return type of elements in Sequence objects, and as a result creating a math.ts module in src/visualizers as planned [in this issue], I came to a point of wanting to revise my recommendation. Doing calculations with bigint | undefined is as Kate says, unfortunately tricky. You have to handle the possibility of either operand being undefined in a numerical operation. So you can't just use ordinary arithmetic expressions in TypeScript (which don't accept undefined as operands). So my revised proposal [mentioned above with less detail and rationale] is to switch to using math.js in Visualizers for doing calculations. It has a consistent interface and a nice framework to add things like powmod() to, seems relatively easy to use, and most importantly, we only do work once to get a particular construct correct for bigint|undefined -- then that construct can be used in Formulas by users of the Numberscope, and by Visualizer writers, on an equal footing. We can even add things like having it automatically optimize (a^k) % n to a "powmod" function, and that should work in both contexts.

There is some work I will have to do extending math.js once frontscope issue #50 is merged, before math.js will be ready to do the Visualizer computations.

For that, it will be convenient to have a compact name for the union type bigint | undefined. I am open to suggestions. Some possibilities: "SeqInt" "Integer" "Int" "MayBig" "MayBigInt"

Sorry to be running on a bit here. There is another option that would eliminate having to deal with undefined altogether. That would be to make it official that a Sequence object only represents a contiguous sequence of integers from an initial index to a final index, and provide a method or methods in the SequenceInterface that provides those limiting indices (perhaps with the final index less than the starting index to indicate that this particular sequence is unbounded and can provide values for any index greater than the starting index). Then we can say that it is only legitimate to call getElement with a valid index. And the advantage of that is the return type of getElement could be just plain bigint rather than bigint|undefined, which would definitely simplify arithmetic operations on Sequence elements. Thoughts? Even though I just did #50, I am on reflection somewhat leaning toward this, and would be happy to scrap it and redo a version along these lines if it sounds good to folks on the project.

gwhitney commented 2 years ago

So I am not going to do further work on these directions until we've decided between these options:

(A) SequenceInterface.getElement() returns bigint|undefined for any index, which prevents the need for methods that give the first and last valid indices, and allows Sequences with missing elements in the middle, at the cost of having to provide for seamless mathematical computations on the union type bigint|undefined via some moderately intricate extension work to math.js, or (B) SequenceInterface provides methods for obtaining the first and last valid indices for a Sequence (with last less than first indicating all indices larger than the first are valid), and then is obliged to produce a value for any index in range. That way the return type of getElement can simply be bigint, and it will throw a RangeError if called with an index that's too small or too large. Then we just need to provide the ability to do computations on bigints -- that means that many of the built-in arithmetic operators will work, and extended math.js for the other ones needed will be less complicated.

Note that the OEIS itself limits the concept of a sequence to an (arbitrary) range of contiguous indices (starting from some specific integer), which corresponds more closely to (B).

And I won't consider the decision to have been taken until there's feedback and some sign of consensus from folks on the team. So you if you have any thoughts, questions, or votes, please comment here.

gwhitney commented 2 years ago

OK, I have now also filed #51, a reference implementation of option (B) in the last comment, for the sake of comparison to #50 which implements option (A). Hopefully they will be helpful in deciding which way to go.

Although it did seem more convenient in #51 to be able to depend on the return value from getElement to be a BigInt (and not undefined), as I was working on it a serious difficulty with approach (B) occurred to me: If one makes an involved formula for a sequence that perhaps depends on multiple OEIS sequences, perhaps with shifted indices or the like, it could be essentially impossible (like the halting problem) to determine ahead of time what the smallest and largest valid indices are -- it might require simply searching through thousands of indices.

Hence, I am back on the side of recommending option (A) even though it is unfortunately somewhat more awkward for writing a Visualizer than option (B). (I think the way to deal with that is just to get math.js to be adept at dealing with computing with bigint|undefined items and then use math.js consistently.) I just don't see how to implement option (B) in generality.

However, I definitely will not recommend #50 for review/merge or proceed further with this direction until others have had a chance to provide their thoughts and reach a consensus -- maybe there is another way of looking at option (B) or a variant of it that would be workable. Thanks for considering this point and letting us know what you think so that we can move on from here!

katestange commented 2 years ago

Glen pointed out a sequence could have a very complex formula involving several OEIS tags, some math functions, etc. It's possible this sequence itself could also take a long time to compute, right? So we encounter something a bit like factoring sequences: we may want data that takes a while to compute. If we go compute the whole sequence and store it in the database, then have the visualizer pull from the database-stored version, we won't have any issues with valid range of indices, and could implement #51 (unless the sequence is "spotty" and has gaps?). So it's a larger question: how do we want to treat formulaic sequences? Do we want to precompute them or attempt to compute them as indices are requested? There are problems with computing on the fly anyway, irregardless of #50 vs. #51.

gwhitney commented 2 years ago

In principle, Kate's suggestion could work. When you enter a formula in the front end, it could check the database for values of that formula, and return the range of valid indices and the formula values if it finds them, and if not kick off a process on the server to fill in that information and return it when it's done. The primary obstacle is indexing: you'd want n^2+n and n+n^2 to hit the same database entry, for example. However, there are definitely approaches to tackle the indexing problem: simplify the formula (via an existing method in math.js) and choose a deterministic ordering of subterms and then convert the formula to an easily processable linear string representation (like "reverse Polish notation" or something), and use that as a database index.

So, this plan would come down to accepting fairly significant technical complexities on the Sequence side, for the sake of the conceptually and technically simpler option (B) (reference implementation #51) for Visualizers. Since I think the conception of Numberscope is that most contributors would write Visualizers and so we want to make adding them as straightforward as possible, that tradeoff might be the right one.

Kate, do you want to just make the call that we should head down this path, or wait for more input from other folks and/or wait until there's a chance to discuss this at a meeting?

In any case, thanks so much for this concept of how option (B) might be workable in practice.

katestange commented 2 years ago

A small note that the chaosvisualizer (#59) implements its own mathematician's modulus at the moment; when this issue happens, that vis should probably be changed to use the newly provided modulus instead. Such a function should return with the same type as the modulus, probably (if modulus is an integer number, we can mod a bigint by that number and get a number); at least that's what the version in the chaos vis does.

gwhitney commented 2 years ago

Kate and I had a long discussion about this today. We ultimately decided to go ahead with option (B). The primary considerations were:

The OEIS itself uses the model implicit in option B: a sequence starts at a specific index and then every term is present thereafter, possibly up to a final index.
Although it will likely significantly increase the intricacy of writing a Sequence implementation that unifies OEIS sequences with operators, and involve the backend in that as well, it should typically simplify the process of Visualizer writing, which is at a premium in this project.
Reflecting in more detail on that "typically," it seemed that for sequentially processing the values of a single Sequence in a Visualizer, it's definitely nicer to be able to begin at the beginning, and then definitely get a value that one can do ordinary arithmetic with, until one perhaps reaches the end. And from experience, that seems to be the bulk of Visualizers. It seemed that if a Visualizer did random access on a Sequence or Sequences, there really wasn't a difference between checking bounds every time and checking whether the return value was defined every time, so that was a wash on Visualizers that would work that way. The only case where it might be simpler from the Visualizer writer's point of view to get back undefineds is when processing a whole bunch of Sequences in some way that interrelates their values, you might need to do a somewhat cumbersome precomputation of iteration limits. But we don't actually have examples of this yet.

So in any case, option (B) it is. I will close the option (A) PR and update the option (B) PR, flagging it when it is ready; and once that is in, I will likely next try to resolve this issue so we can start adapting math.js to handle BigInts.

gwhitney commented 1 year ago

This issue started (more or less) with a need for "powmod". That need still exists. It will eventually be folded into mathjs, but that is (unfortunately) a long ways off. And I think the need for "powmod" goes beyond just needing it in visualizer functions; I think we also want to allow it in formulas. So I am going to open a specific issue for this aspect, and leave this as a placeholder for mathjs adoption and the interface to sequences that was worked out here (but is not yet fully realized, and probably won't be until mathjs is overhauled).

katestange commented 1 year ago

Thanks, Glen! I was actually going to ask you about powmod on the call (that's what I had in mind anyway). So this is excellent.

gwhitney commented 1 year ago

Thanks. If you can read and respond to #172 that would also be great.