In the computation of causality violation, how are multiple spikes for the same function treated?

caiw commented 1 week ago

Imagine a case where we have two functions, A and B, with the relationship A → B (A is upstream of B), but where there are three distinct spikes for A, with the following order by latency: A1, A2, B, A3.

How do we connect the A-spikes to the B-spike in the IPPM graph?

Is it like this: (1) which contributes to causality violation, or like this: (2) or this: (3) both of which might not?

Of course there are other options: (4) (5)

And dependent on this, how should CV be calculated?

I don't think we need to discuss this in detail in the paper btw, but just want to make sure we are happy with the precise definition of CV score before we finalise the results!

caiw commented 1 week ago

In my opinion, the following factor into the choice:

If our original candidate graph is genuinely a correct description of the steps to compute B (i.e. A is upstream of B), that doesn't necessarily mean that A won't be re-expressed for some other reason later down the line.
- Think re-activation of short-term memory in the phonological loop. I feel like this shouldn't "penalise" the clustering solution.
- This is kinda equivalent to A3 actually being a new function "identity(A)", with that being what labels the A3 spike - it's just like "A and then do nothing".
- This to me suggests one of:
  - 2: link from earliest node - this is when the input information for B was first available
  - 3: link from immediately preceding node
  - 4: link from all earlier nodes - we don't know which contributed to the input of B and which was doing something else
- and suggests against:
  - 1: link from last node - what if this is part of another process we aren't searching for?
- I'm agnostic about:
  - 5: this includes the fewest assumptions, and amounts to not doing any editing of the graphs - it's the pure clustering results applied to the candidate graph.
On the other hand, if B precedes all A spikes, this should be the maximum CV score - something is wrong with the assumed candidate graph, or the clustering solution.
Should the above scenario create some CV score, since we don't specifically hypothesise any A preceding any B, but not the maximum?
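
To make the linking options concrete, here is a minimal Python sketch of options 1 to 4 as rules for choosing which A-spike(s) get an arrow to B. The function name and the latencies are hypothetical and not taken from kymata-core; option 5 is left out of the sketch.

```python
def select_parent_spikes(parent_latencies, child_latency, option):
    """Return the latencies of the parent (A) spikes linked to the child (B) spike.

    Option numbers follow the numbering used in this thread.
    Hypothetical helper for illustration only.
    """
    if not parent_latencies:
        raise ValueError("need at least one parent spike")
    # Parent spikes strictly before the child spike, in latency order.
    earlier = sorted(t for t in parent_latencies if t < child_latency)
    if option == 1:  # link from the last parent spike
        return [max(parent_latencies)]
    if option == 2:  # link from the earliest parent spike
        return [min(parent_latencies)]
    if option == 3:  # link from the immediately preceding parent spike
        return earlier[-1:]
    if option == 4:  # link from all earlier parent spikes
        return earlier
    raise ValueError(f"unsupported option: {option}")


# Hypothetical latencies (ms) matching the ordering A1, A2, B, A3.
a_latencies = [50, 80, 140]
b_latency = 100
for option in (1, 2, 3, 4):
    linked = select_parent_spikes(a_latencies, b_latency, option)
    violates = any(t > b_latency for t in linked)
    print(option, linked, violates)
```

With these made-up latencies, only option 1 links a spike that comes after B. Note that if B preceded every A spike, options 3 and 4 would return no link at all, so that case would need separate handling.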

neukym commented 1 week ago

^ @anirudh1666

neukym commented 1 week ago

This is great. I agree with everything you say. This situation is challenging – we have no way of knowing which of these options it is. Currently (and @anirudh1666 correct me if I am wrong) we do (1), and this will correctly give a poor CV score. I think this is fine for the current paper - it is a reasonable first approximation.

I have a large number of other thoughts on this, but maybe we should keep it for a conversation between the three of us otherwise I'll be writing all day. :-)

anirudh1666 commented 1 week ago

I agree with your comment.

We currently place an arrow from the final parent transform to the initial child transform. This is the most pessimistic option out of the 5. In other words, it has a low false positive rate (if you get good IPPM, very high chance it is correct) but also a high false negative rate. In choosing to optimise between these metrics, I think we should consider consequences of a false positive vs false negative. Incorrectly judging a true IPPM as false can lead to missed opportunities and papers. On the other hand, incorrectly labelling a false IPPM as true can lead to incorrect results being published. (We can talk about this in person because this is quite complicated and a bit subjective).

For the example that Cai has provided, we could use an alternative definition for CV:

- We can define CV as the number of nodes of A that succeed B / number of nodes of A. So in the example, this would be 1/3.
- An additional improvement would be to weight the sum by the surprisal. Just counting the number of nodes of A weights them all equally, but since we have the surprisal, we can weight them by their probability. So, for CV, we could take the sum of surprisal for nodes of A that succeed B / sum of surprisal for all nodes of A.
- With this, we can compute CV per transform, then average.
- If we have multiple nodes for B, we can enumerate all the possible cases.
- We can extend this to multiple nodes of B by counting the number of cases (arrows) that lead to causality violation and weighting that violation with the probability of that case occurring.
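
A rough Python sketch of the single-B version of this proposal, in case it helps the discussion. The names `cv_for_edge` and `mean_cv`, the latencies, and the surprisal values are all made up for illustration; a spike with the same latency as B is assumed not to count as a violation, and the multiple-B extension is not covered.

```python
def cv_for_edge(parent_spikes, child_latency, weighted=False):
    """Share of parent (A) spikes that come after the child (B) spike.

    parent_spikes: list of (latency, surprisal) pairs.
    With weighted=True each spike counts in proportion to its surprisal;
    otherwise every spike counts equally.
    """
    if not parent_spikes:
        raise ValueError("need at least one parent spike")
    weights = [surprisal if weighted else 1.0 for _, surprisal in parent_spikes]
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Sum the weights of the parent spikes that succeed the child spike.
    violating = sum(
        w for (latency, _), w in zip(parent_spikes, weights)
        if latency > child_latency
    )
    return violating / total


def mean_cv(edge_scores):
    """Average the per-transform scores into one CV score."""
    if not edge_scores:
        raise ValueError("need at least one score")
    return sum(edge_scores) / len(edge_scores)


# Hypothetical (latency in ms, surprisal) pairs for A1, A2, A3, with B at 100 ms.
a_spikes = [(50, 30.0), (80, 20.0), (140, 10.0)]
b_latency = 100
print(cv_for_edge(a_spikes, b_latency))                 # unweighted: 1 of 3 spikes
print(cv_for_edge(a_spikes, b_latency, weighted=True))  # weighted: 10 / 60
```

In this toy example the weighted score is lower than 1/3 because the one violating spike has the smallest surprisal; with different surprisal values it could just as easily be higher.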

This definition complies with your axioms for CV:

1. If all spikes for A are after B, maximum CV.
2. If some spikes for A are after B, moderate CV.
3. If no spikes for A are after B, no CV.

I definitely agree this is something we should talk about in person given its complexity.

anirudh1666 commented 1 week ago

The current definition of CV satisfies the first and third axioms but not the second. So, this is something we can talk about in the Discussion, or I can add a note to the Future Work section.
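
For illustration, a small sketch of why the second axiom fails under the current rule (arrow from the final parent spike to the initial child spike). It assumes the outcome per edge is simply violated or not violated; the function name and latencies are hypothetical and this is not the kymata-core implementation.

```python
def current_rule_violates(parent_latencies, child_latencies):
    """True if the final parent spike comes after the initial child spike.

    Hypothetical stand-in for the current rule, assuming a binary
    per-edge outcome.
    """
    return max(parent_latencies) > min(child_latencies)


# Hypothetical latencies (ms) for the three axiom cases, with B at 100 ms.
cases = {
    "all A after B": ([120, 140, 160], [100]),
    "some A after B": ([50, 80, 140], [100]),
    "no A after B": ([50, 80, 90], [100]),
}
for name, (a_latencies, b_latencies) in cases.items():
    print(name, current_rule_violates(a_latencies, b_latencies))
```

Under this assumption the "some A after B" case gets the same result as "all A after B", so there is no moderate score in between.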

kymata-atlas / kymata-core

In the computation of causality violation, how are multiple spikes for the same function treated? #361