rmnldwg / lymph

Python package for statistical modelling of lymphatic metastatic spread in head & neck cancer.
https://lymph-model.readthedocs.io
MIT License
5 stars 4 forks source link

Wrong product in `comp_posterior_joint_state_dist` function. #61

Closed rmnldwg closed 6 months ago

rmnldwg commented 6 months ago

I found a second bug in the bilateral code in the comp_posterior_joint_state_dist function. In line 605 we multiply by the ipsilateral diagnose vector. Which is correct, but the transposition of the matrix does not have the desired effect. To reach the desired effect we need to build a different vector array.

Current state:

        joint_state_dist = self.comp_joint_state_dist(t_stage=t_stage, mode=mode)
        # matrix with P(Zi=zi,Zc=zc|Xi,Xc) * P(Xi,Xc) for all states Xi,Xc.
        joint_diagnose_and_state = (
            diagnose_given_state["ipsi"].T
            * joint_state_dist
            * diagnose_given_state["contra"]
        )

Potential fix:

joint_state_dist = self.comp_joint_state_dist(t_stage='early', mode='HMM')
# matrix with P(Zi=zi,Zc=zc|Xi,Xc) * P(Xi,Xc) for all states Xi,Xc.
joint_diagnose_and_state = (
    diagnose_given_state["ipsi"][:, np.newaxis] 
    * joint_state_dist
    * diagnose_given_state["contra"]
)

Originally posted by @YoelPH in https://github.com/rmnldwg/lymph/issues/60#issuecomment-1849863685

rmnldwg commented 6 months ago

@YoelPH, nice catch! I would not have noticed that on my own. Including your solution, which of the three fixes looks more readable?

# 1. Your solution
joint_diagnose_and_state = (
    diagnose_given_state["ipsi"][:, np.newaxis] 
    * joint_state_dist
    * diagnose_given_state["contra"]
)

# 2. Using the outer product
joint_diagnose_and_state = (
    np.outer(
        diagnose_given_state["ipsi"],
        diagnose_given_state["contra"],
    ) * joint_state_dist
)

# 3. Using einsum
joint_diagnose_and_state = np.einsum(
    "i,ij,j->ij",
    diagnose_given_state["ipsi"],
    joint_state_dist,
    diagnose_given_state["contra"],
)

They all produce the same result. What do you think?

YoelPH commented 6 months ago

Personally I prefer the second option. My solution is not very readable and the Einsum always gives me a headache bcs I need to readup how it works, hahaha.