Multi-scale Gaussian Normalisation

getsanjeev commented 6 years ago

The original attempt is located here (https://github.com/sunpy/sunpy/pull/1899). Please note there are is at least one issue with that code (https://github.com/sunpy/sunpy/pull/1899#discussion_r104646604).

This needs to be moved into this module and have the following done:

Base

[ ] Find the original AIA image used in the original paper. Note that the paper gets the time of the image wrong.
[ ] Test to compare to known IDL output (IDL version)
[ ] We need to find the difference between this code, the IDL code and the original paper. There are known inconsistencies that need to be tracked down and figured out. Once we know these, we can email the original author to ask.
[ ] Compared to another version in Python. Maybe both can exist within this package.

Extra

[ ] Optimized as much as possible for memory and CPU time.
[ ] 100% test coverage
[ ] Documentation and a worked example.

prateekiiest commented 6 years ago

Difference between @Cadair 's implementation and @ebuchlin's code

@ebuchlin's code result

        Total time running mgn: 8.29600000381 seconds
                 1178 function calls in 8.298 seconds

           Ordered by: standard name

           ncalls  tottime  percall  cumtime  percall filename:lineno(function)
                1    0.005    0.005    8.296    8.296 <ipython-input-7-ae2d858324e4>:6(function_timer)
                1    0.919    0.919    8.290    8.290 <ipython-input-8-ae9334ae9d95>:4(mgn)
                1    0.001    0.001    8.298    8.298 <string>:1(<module>)
               20    0.000    0.000    0.001    0.000 _methods.py:31(_sum)
               20    0.000    0.000    0.000    0.000 _ni_support.py:38(_extend_mode_to_code)
               30    0.000    0.000    0.000    0.000 _ni_support.py:55(_normalize_sequence)
               30    0.000    0.000    0.001    0.000 _ni_support.py:71(_get_output)
               20    0.000    0.000    0.000    0.000 _ni_support.py:91(_check_axis)
               20    0.000    0.000    0.003    0.000 _polybase.py:246(__init__)
               20    0.001    0.000    0.004    0.000 _polybase.py:290(__call__)
                1    0.000    0.000    0.000    0.000 common.py:75(wake)
               20    0.001    0.000    7.061    0.353 filters.py:120(correlate1d)
               20    0.001    0.000    0.008    0.000 filters.py:201(_gaussian_kernel1d)
               20    0.001    0.000    7.070    0.353 filters.py:222(gaussian_filter1d)
               10    0.001    0.000    7.072    0.707 filters.py:275(gaussian_filter)
                1    0.000    0.000    0.000    0.000 ioloop.py:928(add_callback)
                2    0.000    0.000    0.000    0.000 iostream.py:227(_is_master_process)
                2    0.000    0.000    0.000    0.000 iostream.py:240(_schedule_flush)
                2    0.000    0.000    0.001    0.000 iostream.py:308(write)
                1    0.000    0.000    0.000    0.000 nddata.py:240(data)
               90    0.000    0.000    0.000    0.000 numeric.py:414(asarray)
                1    0.000    0.000    0.006    0.006 numeric.py:86(zeros_like)
               20    0.002    0.000    0.002    0.000 polynomial.py:687(polyval)
               20    0.001    0.000    0.002    0.000 polyutils.py:124(as_series)
               20    0.000    0.000    0.000    0.000 polyutils.py:291(mapparms)
                1    0.000    0.000    0.000    0.000 stack_context.py:253(wrap)
               40    0.000    0.000    0.000    0.000 type_check.py:237(iscomplexobj)
               20    0.000    0.000    0.001    0.000 type_check.py:544(common_type)
                2    0.000    0.000    0.000    0.000 utf_8.py:15(decode)
                2    0.000    0.000    0.000    0.000 {_codecs.utf_8_decode}
               40    0.000    0.000    0.000    0.000 {any}
               31    0.000    0.000    0.000    0.000 {hasattr}
               72    0.000    0.000    0.000    0.000 {isinstance}
               60    0.000    0.000    0.000    0.000 {issubclass}
              101    0.000    0.000    0.000    0.000 {len}
               20    0.000    0.000    0.000    0.000 {max}
                1    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
                1    0.293    0.293    0.293    0.293 {method 'clip' of 'numpy.ndarray' objects}
               20    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
                2    0.000    0.000    0.000    0.000 {method 'decode' of 'str' objects}
                1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
               20    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
               20    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
               20    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
                1    0.000    0.000    0.000    0.000 {method 'send' of '_socket.socket' objects}
               20    0.000    0.000    0.001    0.000 {method 'sum' of 'numpy.ndarray' objects}
                2    0.000    0.000    0.000    0.000 {method 'write' of '_io.StringIO' objects}
               20    0.000    0.000    0.000    0.000 {min}
                2    0.000    0.000    0.000    0.000 {nt.getpid}
               20    0.000    0.000    0.000    0.000 {numpy.core.multiarray.arange}
              150    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}
                1    0.006    0.006    0.006    0.006 {numpy.core.multiarray.copyto}
                1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty_like}
               31    0.001    0.000    0.001    0.000 {numpy.core.multiarray.zeros}
               40    0.000    0.000    0.000    0.000 {range}
               20    7.059    0.353    7.059    0.353 {scipy.ndimage._nd_image.correlate1d}
                1    0.000    0.000    0.000    0.000 {thread.get_ident}
                2    0.000    0.000    0.000    0.000 {time.time}

Memory Usage

mgn_memory

@Cadair 's implementation

       1387 function calls in 8.735 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.611    1.611    8.731    8.731 <ipython-input-6-c8ea54dbd3ae>:6(multiscale_gaussian)
        1    0.004    0.004    8.735    8.735 <string>:1(<module>)
        1    0.000    0.000    0.011    0.011 _methods.py:25(_amax)
        1    0.000    0.000    0.010    0.010 _methods.py:28(_amin)
       24    0.000    0.000    0.001    0.000 _methods.py:31(_sum)
       24    0.000    0.000    0.000    0.000 _ni_support.py:38(_extend_mode_to_code)
       36    0.000    0.000    0.000    0.000 _ni_support.py:55(_normalize_sequence)
       36    0.001    0.000    0.001    0.000 _ni_support.py:71(_get_output)
       24    0.000    0.000    0.000    0.000 _ni_support.py:91(_check_axis)
       24    0.000    0.000    0.003    0.000 _polybase.py:246(__init__)
       24    0.001    0.000    0.005    0.000 _polybase.py:290(__call__)
       24    0.001    0.000    7.085    0.295 filters.py:120(correlate1d)
       24    0.002    0.000    0.010    0.000 filters.py:201(_gaussian_kernel1d)
       24    0.001    0.000    7.096    0.296 filters.py:222(gaussian_filter1d)
       12    0.001    0.000    7.098    0.592 filters.py:275(gaussian_filter)
        1    0.000    0.000    0.000    0.000 nddata.py:240(data)
        1    0.000    0.000    0.000    0.000 numeric.py:148(ones)
      108    0.000    0.000    0.001    0.000 numeric.py:414(asarray)
       24    0.002    0.000    0.003    0.000 polynomial.py:687(polyval)
       24    0.001    0.000    0.003    0.000 polyutils.py:124(as_series)
       24    0.000    0.000    0.000    0.000 polyutils.py:291(mapparms)
       48    0.000    0.000    0.001    0.000 type_check.py:237(iscomplexobj)
       24    0.000    0.000    0.001    0.000 type_check.py:544(common_type)
       48    0.000    0.000    0.000    0.000 {any}
       36    0.000    0.000    0.000    0.000 {hasattr}
       84    0.000    0.000    0.000    0.000 {isinstance}
       72    0.000    0.000    0.000    0.000 {issubclass}
      122    0.000    0.000    0.000    0.000 {len}
       24    0.000    0.000    0.000    0.000 {max}
       24    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       24    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.011    0.011 {method 'max' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.010    0.010 {method 'min' of 'numpy.ndarray' objects}
       26    0.022    0.001    0.022    0.001 {method 'reduce' of 'numpy.ufunc' objects}
       24    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
       24    0.000    0.000    0.001    0.000 {method 'sum' of 'numpy.ndarray' objects}
       24    0.000    0.000    0.000    0.000 {min}
       24    0.000    0.000    0.000    0.000 {numpy.core.multiarray.arange}
      180    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.copyto}
        4    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
       36    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
       48    0.000    0.000    0.000    0.000 {range}
       24    7.083    0.295    7.083    0.295 {scipy.ndimage._nd_image.correlate1d}
        1    0.000    0.000    0.000    0.000 {zip}

Memory Utilization

cadair_mgn

nabobalis commented 6 years ago

First we want to make sure the output is correct before we worry about CPU and RAM usage.

prateekiiest commented 6 years ago

ok, I am checking on it

prateekiiest commented 6 years ago

Testing on AIA_171_IMAGE

Ebuchlin's code output

ebuchlin_plot

Cadair's code output

cadair_plot

nabobalis commented 6 years ago

So there is a difference, we need to figure out why, what is better and how it all compares to the original paper and the IDL code.

Cadair commented 6 years ago

There is at least the comment on the sunpy PR about one error in my implementation.

Also from that same comment thread an apparent discrepancy between the IDL and the paper.

prateekiiest commented 6 years ago

This one, I believe https://github.com/sunpy/sunpy/pull/1899#discussion_r104646604

prateekiiest commented 6 years ago

So after some tweaking in @Cadair 's code, I made something like this. (setting the weight a very small fractional value) There was some error about the conv variable as not being passed out by the arctan. So I corrected that as told by ebuchlin in the PR (sunpy)

Now, it looks something like this,

cadair_new_plot

nabobalis commented 6 years ago

I am not sure we want to go down the road of tweaking the code here or there without a good reason to do so. Ideally we want to implement the algorithm as stated in the paper.

prateekiiest commented 6 years ago

Testing on AIA_171_IMAGE

Ebuchlin's code output

ebuchlin_plot

Cadair's code output

cadair_plot

So after some tweaking in @Cadair 's code, I made something like this. (setting the weight a very small fractional value) There was some error about the conv variable as not being passed out by the arctan. So I corrected that as told by ebuchlin in the PR (sunpy)

Now, it looks something like this,

cadair_new_plot

All I did was just set the weight instead of being 1 previously.

Also as told by ebuchlin here https://github.com/sunpy/sunpy/pull/1899#discussion_r104646604 "the weights are applied out of the arctan, not in the arctan argument."

I made the following changes

earlier

        conv *= k
        conv *= weight
        np.arctan(conv, out=conv)

updated to


        conv *= k
        np.arctan(conv, out=conv)
        conv *= weight

No other additional change

nabobalis commented 6 years ago

But this comes back to the original problem that the paper algorithm and the IDL algorithm do not match.

prateekiiest commented 6 years ago

umm, you are right. Is it because of the noise not getting filtered out in the right way?

nabobalis commented 6 years ago

We don't know. We need someone to sit down with both and figure it out.

Cadair commented 6 years ago

But this comes back to the original problem that the paper algorithm and the IDL algorithm do not match.

What I would like to understand is how the IDL code differs from the paper. If we can summarise this concisely I will email Huw and ask him about it.

Given the change that @prateekiiest made in my version of the code, I would have thought that it matches the paper. Perhaps that would be a good place to start @prateekiiest sit down with the paper, and both versions of the Python code and play a very detailed game of spot the algorithmic difference.

wafels commented 6 years ago

I think comparing speed of implementation is not required at this time. Let's get the code right.

I think it would be useful to identify the AIA image used in the paper. Note that the paper gets the time of the image wrong - there were no AIA images in 2005. So I think contacting Huw to get the actual date and time of that image would be a good start. We can do visual comparisons at least.

On Thu, Mar 1, 2018 at 5:56 AM, Stuart Mumford notifications@github.com wrote:

What I would like to understand is how the IDL code differs from the paper. If we can summarise this concisely I will email Huw and ask him about it.

Given the change that @prateekiiest https://github.com/prateekiiest made in my version of the code, I would have thought that it matches the paper. Perhaps that would be a good place to start @prateekiiest https://github.com/prateekiiest sit down with the paper, and both versions of the Python code and play a very detailed game of spot the algorithmic difference.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sunpy/sunkit-image/issues/1#issuecomment-369555358, or mute the thread https://github.com/notifications/unsubscribe-auth/AA8CF8fgpMSUoT-rv0SA3s2DrQsWx2kRks5tZ9PIgaJpZM4SEc9k .

prateekiiest commented 6 years ago

The IDL code looks somewhat confusing in the end portion where it calculates the final image

IDL Code Implementation

The IDL code first takes the mean and then applies an arctan on the overall mean in the end.

screenshot 52

What the paper says

The paper says it should be first sum over all the arctan values and then take the mean

screenshot 53

This part of @Cadair 's code goes along with the paper but not the IDL one

secondly IDL code uses wrong constant values for h , gamma as opposed to what mentioned in the paper. That's however minor one (and not a code implementation defect)

prateekiiest commented 6 years ago

I would be providing this comment link in my application

Some of the issues associated with the current implementation in ebuchlin's code are as follows:

The function does not take user input for the minimum and maximum values of the input image
Multi-processing is not supported in the current code.

wafels commented 6 years ago

I'll ask Huw what was intended.

On Sat, Mar 24, 2018 at 7:59 AM, Prateek Chanda notifications@github.com wrote:

I would be providing this comment link in my application

Some of the issues associated with the current implementation in ebuchlin's code are as follows:

The function does not take user input for the minimum and maximum values of the input image

Multi-processing is not supported in the current code.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sunpy/sunkit-image/issues/1#issuecomment-375879917, or mute the thread https://github.com/notifications/unsubscribe-auth/AA8CF_oH548w96rRno91RfYG7juzFYs8ks5thjU3gaJpZM4SEc9k .

vatch123 commented 5 years ago

I have gone through the above conversation and as well as the one on this PR. I went through both Cadair's and ebuchulin's implementation. All the codes are compared to what has been written in the paper. The differences are written below -

Cadair's implementation
- line 76 : conv-=data should be data-=conv.
- line 77 : conv**=2 has been done to calculate the sigmaw but later to obtain the result this conv/=sigmaw has been done in line 81. This is wrong. This essentially means we are dividing of (Bk_w-B)² by sigmaw instead of dividing (B-Bk_w) by sigmaw.
- In line 85 and line 86 the arctan should be done before multiplying with the weights.
- This implementation uses scipy.ndimage.gaussian_filter with mode of the filter set to default value reflect. See here.
- All weights are set to 1 which should not be the case.
Eric's implementation
- The image is normalized before being processed which has not been mentioned in the paper.
- Here the scipy.ndimage.gaussian_filter is used with mode set to nearest which I feel is more accurate.
- The weights g_i as mentioned in the paper have not been used.
- No global gamma transformed image C^'_g has been calculated, though this is clearly mentioned in the paper.
- At the end, the original image is used at the time of taking weighted sum instead of the global gamma transformed image.
- The return statement is not at all according to the paper. The weighted mean has not been taken in the way it was written in the paper.
- The values of a[0], a[1], b[0], b[1] has been assumed.
- The calculated values after applying the transform has been again brought between (0,1) from (-pi/2, pi/2). This is no where mentioned in the paper. See line 47, 48.
IDL implementation I am fairly new to IDL. I have gone through the IDL code but still there may be some more errors. The discrepancies which I found were -
- The arctan is applied after taking the weighted mean of images obtained from various gaussian kernels rather the arctan should be applied first to images obtained from individual gaussian kernels then their weighted mean should be taken.

@nabobalis These were the discrepancies which I was able to find out. How should I proceed further?

nabobalis commented 5 years ago

That is some through work. Good job.

So I guess the first step would to be correct the first (cadair's) implementation to match the paper and then see the changes made by Eric are useful and then incorporate them into the same code in some manner.

Regarding the IDl and the paper, if that is the only change, we can try to see if that change actually makes the result better. Otherwise we should email the original author and ask for their opinion on this change.

But I would hold on doing anything. This is getting closer to the GSoC project and you've done enough regarding this project before GSoC.

ebuchlin commented 5 years ago

Thanks for the comparison! Just a few comments about what I understand from what my code does:

Eric's implementation

The image is normalized before being processed which has not been mentioned in the paper.

Yes, but normalization from [a0, a1] to [0,1] here is equivalent to normalization that is done later as described in the paper in Eq. (1) (for one Gaussian kernel) and Eq. (4): in Eq. (1), with B' = (B-a0)/(a1-a0), we can substitute B by a0+ (a1-a0)*B', and we get exactly the same equation for C as a function of B' than as a function of B; this is because of the subtractions in Eqs. (1-2) (the offset a0 disappears) and the linearity of the convolution (there is the same factor (a1-a0) in the numerator and denominator of Eq. (1)). Also, Eq. (4) can be written C'g = B'^(1/γ).

The weights gi as mentioned in the paper have not been used.

Yes, I followed "For most purposes, the gi can be set equal for all scales so that a straightforward mean of the locally normalised Ci is made." Such weights can easily be added.

No global gamma transformed image C'g has been calculated, though this is clearly mentioned in the paper.

It is computed as image ** (1. / gamma) in the return statement, where image is the normalized image (or B' above).

At the end, the original image is used at the time of taking weighted sum instead of the global gamma transformed image.

No, it is the normalized image, at the power 1/γ, this is the same as the gamma-transformed original image (see above).

The return statement is not at all according to the paper. The weighted mean has not been taken in the way it was written in the paper.

Yes, there are no weights (see above). Also, I use w = 1 - h, as I wanted this weight w to represent the strength of the MGN, not the weight of the gamma-transformed image (as h).

Apart from that and output normalization to [b0, b1], I don't see a difference with the paper.

The values of a[0], a[1], b[0], b[1] has been assumed.

There are default parameters values, but the values can be changed by passing the arguments of the function.

The calculated values after applying the transform has been again brought between (0,1) from (-pi/2, pi/2). This is no where mentioned in the paper. See line 47, 48.

Yes, this is to have the C'i on the same range [0,1] as C'g, so that the value of w (or h) is meaningful, and so that the output range is [b0, b1] as expected. But this is only equivalent to taking gi = 1/π for all i.

vatch123 commented 5 years ago

@ebuchlin Thanks for making things clear. I will have to go through your code again. It feels as if I somehow missed those small details.

vatch123 commented 5 years ago

That is some through work. Good job.

So I guess the first step would to be correct the first (cadair's) implementation to match the paper and then see the changes made by Eric are useful and then incorporate them into the same code in some manner.

Regarding the IDl and the paper, if that is the only change, we can try to see if that change actually makes the result better. Otherwise we should email the original author and ask for their opinion on this change.

But I would hold on doing anything. This is getting closer to the GSoC project and you've done enough regarding this project before GSoC.

I don't mind working on this even if I am not selected for GSOC. If you insist I will look into something else.

nabobalis commented 5 years ago

I won't insist, nor can I stop you working on this. I do not want you to feel hard done by if you do carry on and we don't select this project.

vatch123 commented 5 years ago

Okay. Hoping that I would be selected for GSOC, I will pick up this issue once the results are announced. Meanwhile I will work on my proposal.

Cadair commented 5 years ago

@vatch123 That's a really nice breakdown, thanks.

sunpy / sunkit-image