neel-dey / robustNTF

PyTorch implementation of Robust Non-negative Tensor Factorization appearing in N. Dey, et al., "Robust Non-negative Tensor Factorization, Diffeomorphic Motion Correction and Functional Statistics to Understand Fixation in Fluorescence Microscopy".

Possible for outlier images to be all nan. #1

Closed wwarriner closed 11 months ago

wwarriner commented 4 years ago

This appears to happen when the input hypercube is very close to all-zeros. Based on the print statements (obj = nan and err = nan), it looks like things "blow up" during the optimization process. Perhaps there should be a check that detects this state and stops early?
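A minimal sketch of what such a guard might look like, assuming the loop tracks the printed obj and err values as Python floats (the names come from the print statements, not the repository's actual internals):

```python
import math

def optimization_blew_up(obj: float, err: float) -> bool:
    """True once the objective or error has gone NaN/Inf, i.e. diverged."""
    # NaN compares unequal to everything, so test with isfinite rather than ==.
    return not (math.isfinite(obj) and math.isfinite(err))

# Inside the update loop, something like:
#     if optimization_blew_up(obj, err):
#         raise RuntimeError("rNTF objective became non-finite; "
#                            "check for (near-)constant input.")
```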

wwarriner commented 4 years ago

It looks like this can be caused by constant-valued input (all-zero input being one such case). I am not certain this is the only cause, but it is a cause.

wwarriner commented 4 years ago

Thinking about this a little more, I realize that the decomposition of a constant-valued tensor is probably ill-defined whenever the rank is greater than 1. More specifically, we could return anything so long as it adds up to the constant value, but no one answer is any more meaningful than another. It's like asking for a decomposition of the number 2 into two positive real numbers: any pair (x, 2 - x) with 0 < x < 2 satisfies it. It might make sense to check for constant-valued input (min() == max()) and either throw an exception or otherwise fail fast. Alternatively, the code could return some sensible predefined answer, like making all of the factor matrices equal (though that prescribes a "best" answer for the user, whether or not they agree).
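For illustration, a minimal PyTorch version of that min() == max() guard (check_not_constant is a hypothetical helper, not something in this repo):

```python
import torch

def check_not_constant(data: torch.Tensor) -> None:
    # A constant (e.g. all-zero) tensor has no meaningful rank > 1
    # factorization, so fail fast instead of letting the solver emit NaNs.
    if torch.min(data) == torch.max(data):
        raise ValueError(
            "Input tensor is constant-valued; a non-negative factorization "
            "of rank > 1 is ill-defined for it."
        )
```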

Kuo-TingKai commented 1 year ago

@wwarriner Did you solve this problem?

wwarriner commented 1 year ago

I'm mentally distant from this problem now, so I'm not sure my answer will be of use to you, but it may be worth reading. I'll point out that I did not work on the original software in this repository. The original work was grant-funded, as I understand, so it may never receive any updates. I forked it as part of my work on a project intending to use this code, and possibly resolved this issue for a specific use case.

Our general use case for the software was the identification of fluorophore families in hyperspectral fluorescence imaging of human retinal tissue: the factorization helps identify which fluorophore families are present and where they are most abundant.

The software was originally developed in the context of en face tissue imaging, i.e., viewing the tissue the way a clinical ophthalmologist would when looking in through the pupil. Fluorophores in retinal tissue are most abundant in thin layers that span the breadth of the retina, so en face images are rich in signal at every pixel.

In the project I worked on, we attempted to apply the software to images of transverse sections of human retinas. Because the fluorophores are concentrated into thin layers, much of the tensor to be factorized had zero or very low intensity, which I ultimately determined was the reason we saw this issue.

My fork attempts to address the issue by allowing input of region-of-interest (ROI) data for each hypercube. The algorithm does not care about spatial relationships, so the ROI is used to extract a subset of the image; the subset is factorized in the usual way, and the factorized data (and outliers) are put back in their original image positions so a human can make sense of the results. From what I recall, I tested the code on some sample tissue hypercubes and it seemed to work reasonably well, though not as well as on the en face images: the results tended to be noisier.
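In rough outline, the idea was something like the sketch below; factorize_within_roi is a hypothetical helper, and the factorize callable stands in for the rNTF call rather than the fork's actual interface:

```python
import torch

def factorize_within_roi(cube: torch.Tensor, roi: torch.Tensor, factorize):
    """Factorize only the ROI pixels, then scatter results back in place.

    cube: (H, W, bands) hyperspectral image.
    roi:  (H, W) boolean mask of pixels with usable signal.
    factorize: callable mapping an (n_pixels, bands) matrix to a
               low-rank reconstruction of the same shape.
    """
    pixels = cube[roi]            # (n_roi_pixels, bands): spatial layout dropped
    recon = factorize(pixels)     # run the factorization on the subset only
    out = torch.zeros_like(cube)
    out[roi] = recon              # restore the original image positions
    return out

# e.g., with a trivial identity stand-in for the factorization:
# cube = torch.rand(64, 64, 16)
# result = factorize_within_roi(cube, cube.sum(-1) > 4.0, lambda p: p)
```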

I hope what I've written is useful to you!

Kuo-TingKai commented 1 year ago

Your explanation is very helpful!

neel-dey commented 11 months ago

Hi all,

My apologies for the (...3-year-long) delay -- I don't know why, but I didn't get notifications for this issue and discussion, and only just revisited this repo when someone else emailed me about outdated dependencies.

Yes, the intensity scale of the input data can definitely make the numerical optimization unstable in theory, so renormalizing to some fixed range might be beneficial. I didn't really see this in practice when I worked on this project, though, for either en face or transverse sections (the paper worked on transverse sections only). However, I can totally see imaging-setup changes in later batches of data requiring some input preprocessing or similarly motivated strategies.
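A minimal sketch of that kind of renormalization, assuming non-negative input (dividing by the max keeps zeros at zero, so non-negativity is preserved; the eps guard is an assumption to sidestep the constant/all-zero case discussed above):

```python
import torch

def rescale_to_unit_range(cube: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Scale non-negative data into [0, 1] without shifting zeros, preserving
    # the non-negativity the factorization relies on.
    return cube / (cube.max() + eps)
```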

My apologies again and have a great weekend :)