Closed rmnldwg closed 6 months ago
The new encoding looks nice! I just found some issues that are not considered yet (I think):
The code correctly checks the base
to assign the correct element_map. But, the base is not provided when compute_encoding
is called in unilateral
and I guess neither in bilateral
.
In the risk function (e.g. unilateral
line 926) we would need to add the base assignment for example like this:
marginalize_over_states = matrix.compute_encoding(
lnls=self.graph.lnls.keys(),
pattern=involvement,
base = len(self.graph.allowed_states)
)
return marginalize_over_states @ posterior_state_dist
But then the issue comes up that self.graph.allowed_states
is also not functional yet.
Here the issue is in line 539 of graph
. We access the value wrongly. A fix would be:
next(iter(model.lnls.values()), None)
Now the risk function is functional :)
but
The encoding does not properly consider how to encode trinary states yet.
The diagnoses for Trinary diagnose states are still provided as only True
and False
. Hence we can fix this either with:
Compute_encoding
functionThe code correctly checks the
base
to assign the correct element_map. But, the base is not provided whencompute_encoding
is called inunilateral
and I guess neither inbilateral
.
True! I'll fix that similar to what you suggested.
But then the issue comes up that
self.graph.allowed_states
is also not functional yet.
Huiuiui, Ok, I fixed that too 🙈
The diagnoses for Trinary diagnose states are still provided as only
True
andFalse
.
Yes, that's on purpose! The output of the compute_encoding()
function is the answer to "which states match the given pattern of involvement?". And that must always be binary, regardless of whether the model itself is binary or trinary. Because in the end it is used to marginalize over a set of probabilities.
Ahh, yes. Now that I think again about it, we are considering these cases with Sensitivity and Specificity later on.
No, I think you're right! There's still a mistake in the generate_data_encoding()
function: This one does not take into account that reports from pathology and clinical stuff should yield different encodings (they should, right?).
But it should be fixable without touching the actual encoding.
No, I think you're right! There's still a mistake in the
generate_data_encoding()
function: This one does not take into account that reports from pathology and clinical stuff should yield different encodings (they should, right?).But it should be fixable without touching the actual encoding.
Yeah, this was my guess. But I am not quite sure yet what needs to be changed for it to run properly. Right now we also have a dimensional problem that occures when you try to run the code... So I am thinking about what changes need to be done
Ok, let's break it down:
.T
is still missing here or there 😅) and everything should be correctly implemented.Did I miss anything? Does that sound right? 🤔
I do agree. With what you wrote. The only issue I encountered is that the encoding does not have the proper length. The one-hot encoded vector should be of length $2^{L\cdot M}$ to be multiplied with the observation matrix. Right now the resulting one-hot encoded vectors are of length $3^{L\cdot M}$ which is why I assumed there to be a problem with the encoding.
Yes, I think I have verschlimmbessert the situation at some point with a wrong fix. I found another issue that might have been the culprit of the dimension mismatch: A trinary confusion matrix could be only of shape $2 \times 2$ if it was computed at the wrong time, leading to some errors. I fixed that, too now.
Try again now: I have added a risk test for the trinary model and changed the base
argument to the matrix.compute_encoding()
function inside the comp_diagnose_encoding()
method to binary (i.e., 2
) always (because diagnoses are always binary). It runs smoothly and yields plausible results (meaning it is between 0 and 1).
I just ran some tests and it seems like it works the way it is intended :) Thanks for the fix!
The function
matrix.compute_encoding()
can currently only compute the correct data encoding for the binary model case.