jsoftware / jsource

J engine source mirror
Other
645 stars 91 forks source link

stale rank caching #142

Open rdm opened 1 year ago

rdm commented 1 year ago

https://mlochbaum.github.io/BQN/commentary/why.html#named-functions at the bottom illustrates an old bug in J:

   g=:*
   f=:g
   g=:|:
g b.0
_ 1 _
   f b.0
0 0 0

Or, for a more amusing illustration of this issue:

   g=:*
   f=:g
   g=: |: * *@#@$
   f i.3 3
0 3 6
1 4 7
2 5 8
   f f i.3 3
0 1 2
3 4 5
6 7 8
   f@f i.3 3
0 0 0
0 0 0
0 0 0
moon-chilled commented 1 year ago

I do not think this is a bug; you might rather call it a quirk. Tacit definitions are expected to be closed, but exactly what they close over might be other than what you would expect. Currently, verb definitions are late-bound, but verb ranks are early-bound. I don't see a way around this. To see why, consider the following:

c=: {{
 echo 'hi'
 if. {.u b.0 do. u@:v else. u&:v end. }}
g=: -
a=: f c g
f=: +

In order to get the behaviour you desire and be consistent, it would be necessary to re-apply c (including multiple evaluation of any embedded side effects) here, which is clearly undesirable.

By the by, discussion of f. there is also not quite right; better to use u./v. in modifiers.

rdm commented 1 year ago

I see {{ if. {. u b. 0 do. ... end. }} as a different issue:

Adverbs and Conjunctions evaluate "early". Changing a part of a definition after it was built by a conjunction should be clearly within the realm of the programmer's efforts.

But if rank is not a part of the verb's definition, that's at the very least something that should be clearly documented. But reading through the historical treatment of rank and copula, neither mention anything which would suggest this issue. And yet, this is a very old issue. (So, it seems to me that this behavior was unintended and overlooked -- a "bug".)

Put different, this is very difficult to describe. In the above examples, where f has the definition g, talking about why f has the definition g but the rank of f is different from the rank of g. It's inspectable, but it's difficult to talk about.

Also, in my experience, when it's easier to change the code to make it easy to describe than it is to fix the documentation to clearly describe the code, it's best to fix the code.

moon-chilled commented 1 year ago

Perhaps my point will be clearer if I make it differently. Maybe consider this example instead:

g=. -
a=. f@g
b=. g
g=. |.

Under your proposed semantics, a will 'cache' the rank of g, but b will not. Why this inconsistency?

Notionally, a combinator is a higher-order function which does not 'look inside' of its function arguments, but only applies them, variably. While modifiers in j are not required to be combinatoric, they often (almost always?) are so in practice. This combinatory behaviour is what enables coherent abstraction and makes late binding work.

However, if we look at the built-in modifiers, we see one particular category of counter-examples: the rank-preserving conjunctions—: :. & &. @ (forgive me if I've missed any; I've intentionally discounted u"v). Though these do 'peek inside' of their argument verbs, the only mote of early binding that they do is to find those verbs' ranks. In keeping with these (and considering the fundamental role these modifiers play) I find it most coherent for verb rank to always be early-bound.

But if rank is not a part of the verb's definition

It is part of the verb's definition! Above, I define b; that definition for b includes a rank. Why when I redefine g should b change? I have not changed the definition of b; only of g.

But reading through the historical treatment of rank and copula, neither mention anything which would suggest this issue. And yet, this is a very old issue. (So, it seems to me that this behavior was unintended and overlooked -- a "bug".)

It is often difficult to predict what the consequences of some design choice will be. FWIW, my clean-slate solution is to get rid of verb rank entirely—while useful, it does not bear its weight, especially given it is so heavy as to cause troubles like these. (This, along with changes to the way name lookup is done, would resolve all the problems brought up by your link while sacrificing none of the advantages of late binding.) But this hand is dealt, and I find the current behaviour the most coherent and consistent one.

in my experience, when it's easier to change the code to make it easy to describe than it is to fix the documentation to clearly describe the code, it's best to fix the code

I do not think the implementation should be anything more than a distant concern, while writing the specification. (IOW, it doesn't matter how easy it is to change the code, so long as it can be done at all.)

rdm commented 1 year ago

Except, I did not propose any semantics.

I shall do so now:

I propose that when the definition of a name is changed (or is newly marked as stale), that all definitions which tacitly include that name be marked as stale, so that the new definition is applied consistently in all contexts. Stale definitions must be made consistent with the definitions used in referenced names before they are used (and when this update happens, they are no longer stale).

This "mark as stale" mechanism is only relevant where (and should only be applied where) information about the definition of a name is cached in some other tacit definition.

(Also, it's the programmer's responsibility to re-run any relevant explicit definitions when they perform computations which depend on a tacit definition. If we gave programmers tools for managing explicit dependencies and triggers that would be something new, and outside the scope of this proposal.)

Meanwhile:

It is part of the verb's definition! Above, I define b; that definition for b includes a rank. Why when I redefine g should b change? I have not changed the definition of b; only of g.

g was the definition you had provided for b.

moon-chilled commented 1 year ago

I did not propose any semantics

Fair enough. I assumed them, apparently incorrectly. But I find your proposal even more complicated to explain.

What is this notion of 'staleness'? It would seem to me that your proposal would have c applied multiple times in my original example. If so, why? If not, why do you privilege some forms of 'caching' (I do not think the use of that word is appropriate here, but w/e) over others?

rdm commented 1 year ago

It's basically just cache invalidation.

moon-chilled commented 1 year ago
  1. What sorts of definitions are eligible for caching?

  2. What happens when a 'cached' definition is invalidated (what is it that is 'made consistent' with what)?

rdm commented 1 year ago

The current evidence I see suggests that the rank of definition which was associated with a name is cached in at least some tacit definitions. (But I am not completely sure what you are asking here: were you suggesting that you had not noticed this issue, or were you asking me whether other things are cached in the context of tacit definitions? If the latter, I would note that future implementations might be caching information which is not currently being cached.)

When a cached tacit definition is invalidated, this would mean that the names used in that tacit definition must be dereferenced, to get the details of their definitions (some of which might then be cached for future evaluations of that tacit definition) during or before the evaluation of that tacit definition.

HenryHRich commented 1 year ago

Reasonable.

It would apply only when non-nouns are assigned, which is seldom. It could be implemented by traversing every locale to recalculate the rank of every verb whenever any non-noun is assigned.

[Faster would be to chain verbs off of the verbs that they depend on, but that would swell the verb definition to another cacheline.]

I vote against the proposal.

hhr

On 9/3/2022 7:16 PM, Raul Miller wrote:

Except, I did not propose any semantics.

I shall do so now:

I propose that when the definition of a name is changed, that all definitions which tacitly include that name be marked as stale, so that the new definition is applied consistently in all contexts.

— Reply to this email directly, view it on GitHub https://github.com/jsoftware/jsource/issues/142#issuecomment-1236211973, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEKVAJ2LHW37RCI6EHJ3ELTV4PL6FANCNFSM6AAAAAAQD4TVPY. You are receiving this because you are subscribed to this thread.Message ID: @.***>

rdm commented 1 year ago

I guess I was thinking of chaining verbs off of verbs they depend on.

I don't think cachelines very much matter here, because this chaining data structure would only be referenced when verb definitions were being updated. (Or were you recommending against the proposal for some other reason?)

moon-chilled commented 1 year ago

But I am not completely sure what you are asking here: were you suggesting that you had not noticed this issue, or were you asking me whether other things are cached in the context of tacit definitions? If the latter, I would note that future implementations might be caching information which is not currently being cached

I am aware of the following issue: if one reassigns a verb which was previously referred to by name in a tacit definition, and if this changes the rank of the verb, the resultant behaviour may be nonintuitive. I think this is also the issue you were referring to.

I find the use of 'caching' somewhat strange here, because it seems to prescribe a particular implementation strategy, whereas what I am interested in is the semantics and the behaviour, not the particulars of the implementation; but it's the word you've used, so I'm rolling with it.

I want to know exactly what the behaviour is that you're after, because it's not quite clear to me. Presumably, in my second example, when g is reassigned, you would like the rank of b to change with it; yes? Would you also like the rank of a to change? Would you like for c to be re-applied in the first example?

(My opinion is that if the answer to all three of these is not the same, then the policy is inconsistent, and that an answer of yes to the third question is too horrible to countenance, so the answer to all three should be no.)

moon-chilled commented 1 year ago

It would apply only when non-nouns are assigned, which is seldom

And at that, only when verb ranks change, which should be even rarer.

rdm commented 1 year ago

But I am not completely sure what you are asking here: were you suggesting that you had not noticed this issue, or were you asking me whether other things are cached in the context of tacit definitions? If the latter, I would note that future implementations might be caching information which is not currently being cached

I am aware of the following issue: if one reassigns a verb which was previously referred to by name in a tacit definition, and if this changes the rank of the verb, the resultant behaviour may be nonintuitive. I think this is also the issue you were referring to.

I find the use of 'caching' somewhat strange here, because it seems to prescribe a particular implementation strategy, whereas what I am interested in is the semantics and the behaviour, not the particulars of the implementation; but it's the word you've used, so I'm rolling with it.

What term would you describe this behavior which is less strange than "caching"?

I want to know exactly what the behaviour is that you're after, because it's not quite clear to me. Presumably, in my second example, when g is reassigned, you would like the rank of b to change with it; yes? Would you also like the rank of a to change? Would you like for c to be re-applied in the first example?

I am puzzled here. On the one hand, you describe the current behavior as "perhaps nonintuitive". And you also seem to reject my summary description of the current behavior as stale cache.

So, I'll try phrasing my point of view using different words:

J implemented "late binding" for verbs, so that verb definitions could be supplied after the verb names were used in definitions. I think that the rank of the verb (which is a part of the definition) should be respected when a definition is associated with a late bound name.

When this should have further effects (when the definition itself uses a combinator which is sensitive to the rank of the verb), then yes the updated verb rank should still have this effect.

(My opinion is that if the answer to all three of these is not the same, then the policy is inconsistent, and that an answer of yes to the third question is too horrible to countenance, so the answer to all three should be no.)

If by "second example" you meant

g=. -
a=. f@g
b=. g
g=. |.

I do not understand what is "too horrible to countenance". Is that something you could explain?

moon-chilled commented 1 year ago

There were three questions:

  1. in my second example, when g is reassigned, you would like the rank of b to change with it; yes?

  2. Would you also like the rank of a to change? [also in the second example]

  3. Would you like for c to be re-applied in the first example?

The first example is the one which includes c, a user-defined conjunction with side effects. It is question #3 which is too horrible to countenance an answer of 'yes' to, because users do not like it when side effects are unexpectedly performed ('spooky action at a distance').

moon-chilled commented 1 year ago

What term would you describe this behavior which is less strange than "caching"?

I say that verb rank is bound early (for consistency with a number of primitive conjunctions which are closed over the ranks of their operands).

rdm commented 1 year ago

There were three questions:

  1. in my second example, when g is reassigned, you would like the rank of b to change with it; yes?
  2. Would you also like the rank of a to change? [also in the second example]
  3. Would you like for c to be re-applied in the first example?

The first example is the one which includes c, a user-defined conjunction with side effects. It is question #3 which is too horrible to countenance an answer of 'yes' to, because users do not like it when side effects are unexpectedly performed ('spooky action at a distance').

Ok, now I think you are talking about

c=: {{
 echo 'hi'
 if. {.u b.0 do. u@:v else. u&:v end. }}
g=: -
a=: f c g
f=: +

(Personally, I thought of this as either the third example in the thread -- since i had provided two different examples, or the first example that you had provided. But I now think it's clear that you had thought of as both of my examples as being a single example. That said... I could still be incorrect in some way. Please let me know if I have still managed to incorrectly identify your concept of the second example.)

But here, no, I see no reason for a to change if c changes.

Here, the definition of a is f&:g and c is not a part of that definition. (We know that c was used in constructing that definition, but c is a conjunction which is necessarily used with "early binding".)

moon-chilled commented 1 year ago

What you reproduced is what I meant to refer to as my first example.

Notionally, a primitive conjunction such as @ is no different from c. @ itself is not 'part' of f@g; it is applied, and then returns a new verb. So, again: if we have a=. f@g, and we redefine g to have a different rank, should we expect the rank of a to change?

moon-chilled commented 1 year ago

here's another, more concrete problem: how do you deal with cycles?

a=. -
b=. a
a=. b
rdm commented 1 year ago

What you reproduced is what I meant to refer to as my first example.

Notionally, a primitive conjunction such as @ is no different from c. @ itself is not 'part' of f@g; it is applied, and then returns a new verb. So, again: if we have a=. f@g, and we redefine g to have a different rank, should we expect the rank of a to change?

J provides two approaches for explicit conjunctions. If the user wanted to have c remain a part of the definition, the definition of c would reference y (and, optionally: x).

here's another, more concrete problem: how do you deal with cycles?

a=. -
b=. a
a=. b

There's several possible approaches here.

a=. - NB. a gets rank 0 b=. a NB. b gets rank 0 a=. b NB. a gets rank 0 -- no change, no need to invalidate b


a=. - NB. a marked stale, rank will be resolved at runtime b=. a NB b marked stale, rank will be resolved at runtime a=. b NB. a marked stale, rank will be resolved at runtime, since a is already stale no need for further action


a=. - NB. a's rank set to 0 since it has no name dependencies b=. a NB. b's rank marked stale, will be resolved at runtime a=. b NB. a marked stale and references marked stale NB. since b has already been marked stale, the marking process does not progress past b


a=. - NB. a's rank set to zero, since a has no named dependencies b=. a NB. b's rank set to zero, since a is not stale a=. b NB. a's rank does not change, no need to mark it stale


From a user's point of view, these approaches are equivalent. I think I'd favor the first approach for this example.

That said, if a and b both are marked stale another problem arises, which is how to proceed when generating the stack error which results when a or b get used. I don't think we have to be too worried about performance in that error case. We want a stack error.

Still... when resolving stale definitions, I think we would want to defer the stack error to the execution side of things. So I guess we'd want a "being resolved" state on a stale definition. And, when encountering that during definition resolution, I think we'd treat the rank of a verb in that state as infinite.


From an implementor's point of view, in the context of threading: since stale definitions would be rare, the algorithm which manipulates them should be protected by a global lock (at least until that's been proven to be problematic in production code).

(There's also a few other minor bits of complexity which would need to be dealt with, but I think those will be straightforward to implement.)

moon-chilled commented 1 year ago

If the user wanted to have c remain a part of the definition, the definition of c would reference y (and, optionally: x).

Yes, but this is obviously incompatible with explicitly setting rank (as is necessary for something like @).

rdm commented 1 year ago

If the user wanted to have c remain a part of the definition, the definition of c would reference y (and, optionally: x).

Yes, but this is obviously incompatible with explicitly setting rank (as is necessary for something like @).

If you are saying that the interpreter uses functional details which are not completely exposed to the programmer using the interpreter, that would be correct (and: unavoidable).

If you are saying that the interpreter should make those functional details available to the programmer using the interpreter, I don't think that that necessarily follows. (But, hypothetically, explicit definitions could be extended to allow conjunctions which are applied at the rank of v. I think, if this were done, it could only be done for conjunctions which reference y, but I might be wrong about that.)

If you are saying something entirely different, I think I do not understand what you are saying (perhaps you could elaborate, if this is the case?).