Open masak opened 5 years ago
I was thinking about this right for a few hours. I'm not entirely sure about the interface-like idea...
all consumers of
Bel_pair
go through theBel_pair
struct and its two function pointers, and know nothing else about the data layout.
Function pointers are a no-no for me. The idea is to keep the implementation as simple as possible, and function pointers are one of the aspects which I'd rather not use at all. That is pretty much the reason why this software is being written as a literate program (though I must be fair, it lacks contribution guidelines explicitly saying all of that right now).
Plus, if we store car
and cdr
in Bel_pair
as function pointers, deeper structural changes would need to be made. And not only that; since this is a literate program, explaining this "hack" on the prose parts would be... troublesome.
However, everytime I handled the car and cdr of any list, I used the bel_car
and bel_cdr
functions. How about making changes directly to those instead? This way, it would be only a matter of handling the special case on those functions.
The result of (type 5)
returning the symbol number
definitely has to go, but that can also be managed by handling the special case on the pertinent functions (which I still have to isolate. Adjustments will likely be made to bel_prim_type
, but also to internal predicates such as bel_pairp
).
A fix by changing bel_car
and bel_cdr
would involve converting directly a number to a literal, which are built-in and specified in the Bel spec:
Numbers in Bel take the form
(lit num (sign n d) (sign n d))
where the first (sign n d) is the real component and the second the imaginary component. A sign is either + or -, and n and d are unary integers (i.e. lists of t) representing a numerator and denominator.
So 2/3 is
(lit num (+ (t t) (t t t)) (+ () (t)))
and 4-1/2i is
(lit num (+ (t t t t) (t)) (- (t) (t t)))
Building the lists of t
and this "coercion into a literal" itself would be as trivial as implementing a few more functions to do the job.
Other interesting thing is that we could also handle the (lit num ...)
case in the bel_eval
function as an "internal" special form; checking for a number literal is a very simple task, as well as doing a "reverse coercion" into a proper internal number as well. Since Bel specifies that valid numbers are printed as... well, numbers, that seems to be an option.
Plus, all arguments are evaluated during a function application, so this would work fine:
→ (+ (lit num (+ (t) (t)) (+ () (t))) (lit num (+ (t t) (t)) (+ () (t))) ) 3
...that is, if the number is valid -- otherwise, we just return the plain literal list, which would be an "invalid" number:
→ (lit num blah) (lit num blah) → (+ (lit num blah) 3) Error: (lit num blah) is not a number
This seems to fit perfectly, since performing any operations in "non-numeric literals" should also be invalid anyways. Changes would have to be made to bel_numberp
and bel_number_list_p
to make that work.
xar
and xdr
would need extra attention because they have to check for the special case as well, but that's manageable: we convert, change the pertinent parts, and attempt to return a re-conversion into a number (which can fail, of course, but in that case the plain new literal is returned).
As for id
, I am not entirely sure; maybe converting all numbers to literals by default when checking for identity in bel_prim_id
would be a sane option.
There is an extra bonus for this approach: the numerator and denominator in Bel are plain integers, but they don't have to be like this in Believe
. However, if one just goes ahead and uses car
, cdr
, xar
and xdr
to mess with an "optimized" number, the conversion from "number" into "number literal" can automatically coerce the numerators and denominators into integers -- in fact, this coercion is already implemented. So we get the best of both worlds.
Function pointers are a no-no for me. The idea is to keep the implementation as simple as possible, and function pointers are one of the aspects which I'd rather not use at all.
Yes, that makes sense. Function pointers have a high complexity penalty. I understand the desire to avoid them if possible. Even from a C language perspective it makes sense; C is more geared towards static, compile-time knowledge. Making things dynamic with function pointers moves the code out of C's sweet spot. Might even have an impact on the error messages the compiler is able to give, etc.
However, everytime I handled the car and cdr of any list, I used the
bel_car
andbel_cdr
functions. How about making changes directly to those instead? This way, it would be only a matter of handling the special case on those functions.
It makes sense. By the way, this trade-off that you're deliberating is famous, and known as the expression problem — in short, do you associate differing behavior with the data (as function pointers would) or with the operations (as per your suggestion). I believe that c2 page and the one about switch statement smell is good reading in the context of this issue. (Not suggesting that there's a right answer here. That's the thing, it's a trade-off.)
xar
andxdr
would need extra attention because they have to check for the special case as well
Case in point. :smile: But it's not so bad, I think. What's interesting is the cost of doing this not just for numbers, but for some other type that can be optimized, later. But that could be premature design on my part.
As for
id
, I am not entirely sure; maybe converting all numbers to literals by default when checking for identity inbel_prim_id
would be a sane option.
Not sure I understand. Would this guarantee that if you compared a number against itself, (id n n)
, you'd always get t
?
This is the issue I hinted I would submit at https://github.com/luksamuk/believe/issues/6#issuecomment-545216093. I confess I've spent way too much time thinking about solutions to this challenge, both for this Bel implementation and for my own.
Per the guide,
In other words, we read the source for correctness. An implementation, however, shouldn't implement this directly but instead cheat creatively.
Base Proposal
Instead of directly pointing to a
car
andcdr
part, I suggestBel_pair
consist of two function pointers that can returnBel
values forcar
andcdr
. Expressed as a Java interface:C doesn't have interfaces, but the struct inheritance idiom accomplishes the same. We "program to the interface", that is, all consumers of
Bel_pair
go through theBel_pair
struct and its two function pointers, and know nothing else about the data layout.I suggest these two "implementations" of
Bel_pair
:Aggregate numbers only provide a facade that presents as a
pair
, that allows destructuring operations into it, etc. But they also hold extra data about the number itself. An aggregate number for the Bel number5
would hold anint
with the value5
.(The interface idea can also be used to optimize a whole host of other types in the language. For example, functions can carry around a precompiled optimized version of itself which can be used in most circumstances. Arrays can be allocated as a single block and provide constant (rather than linear) random access. Queues can be implemented as real queues or deques, and provide similar access speedups.)
It's possible it'd be better for there to be many different types of aggregate numbers: unsigned integer, rational, signed rational, complex — following the Bel spec itself. Here I will write of them as a single aggregate number type, for simplicity.
Hijacking the built-in numeric operations
The main benefit of this is that we can now intercept any call on operations like
+
, check if its arguments are all aggregate numbers. In that case, we can skip the slow-path spec implementation of+
and just add the numbers as Cint
s, and return a new aggregate number.This is expected to be the typical case.
Also recommended that both the Bel reader implemented in C and (later) the Bel reader implemented in Bel be convinced to spit out aggregate numbers by default.
Heterogenous operations
Despite best-effort attempts to intercept the creation of numbers and turn them into aggregate numbers, a user could always build a number "by hand", and it'd not be turned into an aggregate number.
It'd be theoretically possible to check at
join
-time, but I'm wary of suggesting this. Instead, we should just be ready for getting a mix of numbers and aggregate numbers to+
and friends. They should then "upconvert" the numbers to aggregate numbers and proceed as usual.Doing an upconversion on a number is presumably slow, since we have to traverse down ridiculously long lists of
t
s. However, something already went to the trouble of building those lists, which was also slow, and we're just adding the same type of linear const once more.De-optimization
In two situations, the aggregate number has to destroy itself and replace itself with the actual list structure it emulates:
xar
orxdr
id
None of these situations are very likely, let alone very sane. But a conforming Bel implementation must support this case. A first cut at an implementation can simply crash in protest; later ones can do the replacement. It won't be fast, but it will be correct.