gregversteeg / bio_corex

A flexible version of CorEx developed for bio-data challenges that handles missing data, continuous/discrete variables, multi-CPU, overlapping structure, and includes visualizations
Apache License 2.0
137 stars 30 forks source link

Implement clip of sig_ml to avoid divide by zero bug #20

Closed jpkrooney closed 3 years ago

jpkrooney commented 4 years ago

Proposed solution for issue #19 I have added the line: sig_ml = sig_ml.clip(0.5) # To prevent divide by zero in marginal_p function the estimate_parameters function in order to later divide by zero issue in the marginal_p function. As discussed in issue #19 , 0.5 is a somewhat arbitrary figure, however it seems to work well in practice. However you may wish to explore this further before implementing this fix.

jpkrooney commented 3 years ago

After some further thought and experimentation I'd suggest a clip value of 0.25 instead of 0.5. i.e. sig_ml = sig_ml.clip( 0.25 ) # To prevent divide by zero in marginal_p function

The logic being as follows: the denominator for the calculation in marginal_p is: (2. * sig) - 0.5 * np.log(2 * np.pi * sig)).transpose((1, 0, 2)) This output of this function reaches a minima at 0.25 as sig varies as shown in this graph:

Screenshot 2020-08-31 at 10 11 33

Below sig = 0.25 the function quickly becomes asymptotic. In test runs with different values for the clip, 0.25 produce higher TCS than a clip of 0.25.

jpkrooney commented 3 years ago

Hi Greg. I took another look at this issue. I'm afraid I made a mistake in generating the above graph by misplacing a bracket. Thus the choice of 0.25 as a clip on sig_ml is not justified by the above argument. Will post more thoughts on the divide by zero issue here: https://github.com/gregversteeg/bio_corex/issues/19#issue-606719022