spectralpython / spectral

Python module for hyperspectral image processing
MIT License

Add continuum algorithms #106

Closed kormang closed 4 years ago

kormang commented 4 years ago

Thought it might be a good idea to add continuum computation and removal algorithms to make this library even more awesome and complete.

kormang commented 4 years ago

Hi!

This is an implementation I did for learning purposes, and then decided to contribute here, if you're interested. The algorithm is my own, so I'm not aware whether there is a faster or otherwise better one. I do know it is better than what is roughly described in: Jiwei Bai, et al., "Classification methods of the hyperspectral image based on the continuum-removed," Proc. SPIE 4897, Multispectral and Hyperspectral Remote Sensing Instruments and Applications (16 June 2003); doi: 10.1117/12.466729

Removing the continuum from the full cuprite image takes about 70 seconds on my high-end computer. An equivalent implementation in C takes under 1 second. I've optimized as much as possible, but unfortunately there is no quick way to accomplish what is done in line 31 of continuum.py, which is the bottleneck. Avoiding the computation of all the dot products requires a manual loop, but that is even slower. However, using numba I was able to reduce the time to less than 2 seconds, though such an optimization is probably not acceptable for this library. On the other hand, maybe more speed is not so necessary.
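To illustrate the trade-off being discussed (this is only a sketch with made-up data, not the PR's actual line 31): NumPy is fastest when it evaluates all the cross products against a chord at once, even though most results are discarded and a manual loop could stop early — the loop only wins once it is jit-compiled.

```python
import numpy as np

def farthest_above_chord(x, y, i, j):
    """Cross product of every point between i and j against the chord
    (i, j), computed all at once -- fast in NumPy even though most of
    the results end up being discarded."""
    dx, dy = x[j] - x[i], y[j] - y[i]
    cross = dx * (y[i:j + 1] - y[i]) - dy * (x[i:j + 1] - x[i])
    k = int(np.argmax(cross))
    return i + k, float(cross[k])

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 2.0, 1.0, 3.0])
print(farthest_above_chord(x, y, 0, 3))  # point 1 lies farthest above the chord
```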

I saw there is a library called hsdar in the R language that has its own implementation; maybe I could try to compare them, if you'd like.

Worst case it's O(N^2), where N is the number of bands, but that is almost impossible in practice. Best case is O(N). The average is between O(N) and O(N log N), depending on the spectrum.

Other than that, I tried to squeeze out as much performance as I could. Sorry, I mostly come from a C++ and JavaScript background :) I only recently started using Python seriously, but I'm willing to improve, as well as to improve this pull request.

I will add docstrings to this pull request; just waiting for initial comments.

Also, I have plans to add support for the so-called segmented upper hull, but that would be another pull request.

Thank you.

tboggs commented 4 years ago

Hi,

Thank you for the PR! I'll try to take a closer look at this in the next few days but from a code perspective, it looks very well done and it would be great to have a continuum removal function in the package. Please do continue with your work on the PR. As for speed, a slow implementation is better than no implementation. We can try to figure out how to speed it up, if not through native python code, then perhaps via cython.

Could you possibly provide an example plot or two of a spectrum, along with its continuum and with continuum removed? That would be really helpful in understanding how it is working.

Thanks,
Thomas

kormang commented 4 years ago

Here are some examples. On the left are radiance or reflectance plots with their continuums; on the right, the corresponding continuum-removed plots.

[two screenshots of the example plots]

kormang commented 4 years ago

Adding segmented upper hulls could be a good next move, to avoid long straight lines like in the first few plots.

Also there is a function continuum_points that returns the points belonging to the convex hull, so users can interpolate them however they like.
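For example, a small sketch of that use: it assumes continuum_points returns the hull points as a pair of band-position and value arrays (made-up numbers below), and uses plain linear interpolation via np.interp.

```python
import numpy as np

# hypothetical hull points, standing in for the output of
# continuum_points(spectrum, bands)
hull_bands = np.array([0.4, 1.0, 2.4])
hull_values = np.array([0.30, 0.55, 0.80])

# interpolate the continuum back onto the full band grid however you like;
# np.interp gives the usual piecewise-linear continuum
bands = np.linspace(0.4, 2.4, 11)
continuum = np.interp(bands, hull_bands, hull_values)
print(continuum[0], continuum[-1])  # endpoints coincide with the hull points
```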

kormang commented 4 years ago

I've made some changes and added docstrings. I will no longer consider it a draft, but of course I will adapt to any suggestions you might have.

tboggs commented 4 years ago

Thanks. Do you have an approach for how you would handle a segmented upper hull? I think that would greatly improve the utility of the continuum removal for evaluating absorption bands.

kormang commented 4 years ago

I took a look at the hsdar implementation in R. I know a little bit of R, but the main part is written in Fortran (probably for performance). I don't think it is too complicated to understand, and I already have a general idea of how it works, but I couldn't find any mathematical definition of it. I suppose it is quasi-convexity plus the requirement that every point is a local maximum. So I'll try to use those mathematical properties to optimize it. I'm thinking about it right now. :)

kormang commented 4 years ago

And, yes, since it is quasi-convexity, all points of the convex hull should belong to the segmented hull too. So I would use the current algorithm, and then break each detected concave part down even further where possible.

kormang commented 4 years ago

I've analyzed the hsdar code. It seems it's not grounded in math but in good empirical results, which is also nice and kind of makes sense in this case. First they find all local maxima, then do some filtering, which is basically getting rid of those that don't satisfy the quasi-convexity conditions. Then, since being a maximum is neither a necessary nor a sufficient condition for belonging to the convex hull, they make some adjustments to ensure everything is below the hull, with the filtered maxima as predetermined points of it. This is where I lost them, right in the middle of an if-inside-loop-inside-if-inside-loop-inside-loop. But that is basically how it works. These are the resulting plots (hsdar package in R):

[four plots produced by hsdar]

As I said, I did it by first finding the concave regions and then breaking them further. First I find all local maxima in a region, then filter out those that don't satisfy the quasi-convexity conditions. Then I run the existing convex hull algorithm on each of the subregions separated by the remaining maxima, and also include those maxima in the result.
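Roughly, the approach described above can be sketched like this (an illustration, not the PR's actual code; the quasi-convexity filtering of the maxima is omitted for brevity):

```python
import numpy as np

def upper_hull_indices(x, y):
    """Indices of the upper convex hull of points sorted by x
    (a monotone-chain scan)."""
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # drop b if it falls on or below the chord a -> i
            if (x[b] - x[a]) * (y[i] - y[a]) >= (y[b] - y[a]) * (x[i] - x[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

def segmented_upper_hull_indices(x, y):
    """Pin the hull at interior local maxima, then build a convex
    upper hull on each sub-segment between consecutive maxima."""
    maxima = [i for i in range(1, len(y) - 1) if y[i - 1] < y[i] >= y[i + 1]]
    breaks = [0] + maxima + [len(y) - 1]
    result = set()
    for s, e in zip(breaks[:-1], breaks[1:]):
        result.update(s + j for j in upper_hull_indices(x[s:e + 1], y[s:e + 1]))
    return sorted(result)

x = np.arange(5.0)
y = np.array([0.0, 3.0, 1.0, 1.5, 1.0])
print(upper_hull_indices(x, y))            # [0, 1, 4] -- skips the small bump
print(segmented_upper_hull_indices(x, y))  # [0, 1, 3, 4] -- keeps it
```

Every convex hull point is also a segmented hull point, as noted above; the segmented version only adds the local maxima between them.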

[comparison plots from my implementation]

As you can see, the results are the same. You can take a closer look:

[close-up of the plots]

kormang commented 4 years ago

I will add it to this pull request if you like it. I just need to update the documentation, write tests, and do some performance evaluation.

tboggs commented 4 years ago

Yes, please do add this. I assume it adds significantly to the processing time, but I believe the increase in quality is worth it.

kormang commented 4 years ago

I'm done. Ready for comments (or merge).

kormang commented 4 years ago

I did profiling; there is nothing in the new function that creates a bottleneck, which is what I wanted to check. All the added processing time comes from creating the convex hull segments between the new maxima.

tboggs commented 4 years ago

I'll go ahead and merge shortly. Can you provide an example of how one would typically use the functions? That will be helpful for when I update the website documentation. Also, if one wanted to get the continuum as well as the continuum-removed spectrum, would there be any performance benefit to doing both operations in a single function?

kormang commented 4 years ago

I made a single function for convenience. Since continuum removal is just division by the continuum, one can compute the continuum and then divide by it, but that requires extra memory if one does not need the continuum on its own. So I thought it might be a good trade-off to do it the way I did.
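In other words (with a made-up spectrum and continuum, just to show the relationship):

```python
import numpy as np

spectrum = np.array([0.30, 0.50, 0.44, 0.60])
continuum = np.array([0.30, 0.50, 0.55, 0.60])  # interpolated upper hull

# continuum removal is elementwise division: 1.0 where the spectrum
# touches the hull, < 1.0 inside absorption features
removed = spectrum / continuum
print(removed)  # 0.8 in the absorption band, 1.0 everywhere else
```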

As for the examples, I'll take a look at the other examples on the site and come up with something in line with them.

kormang commented 4 years ago

Bai et al. (2003) applied continuum removal to the whole image and then classified with SAM and MLC. I've modified your SAM example, but got slightly worse results. Here is the code.

import numpy as np
from spectral import imshow, spectral_angles
# img, bands, means, and gt come from the SAM classification example;
# remove_continuum is the function added in this PR
data = img.load()
# band centers must be in ascending order (and match the band order of data)
sorted_band_centers = np.sort(bands.centers)
data_cr = remove_continuum(data, sorted_band_centers, mode='convex')
means_cr = remove_continuum(means, sorted_band_centers, mode='convex')
angles = spectral_angles(data_cr, means_cr)
clmap = np.argmin(angles, 2)
clmap_corrected = clmap + 1  # ground-truth class indices are 1-based
v = imshow(classes=(clmap_corrected * (gt != 0)))

errors = (clmap_corrected * (gt != 0)) * (clmap_corrected != gt)
print('Misclassified {}'.format(np.sum(errors != 0)))
v = imshow(classes=errors)

I should have written in the docstrings that the bands must be sorted, otherwise unexpected results can occur. The good thing here is that we support both 3-d images and matrices of the form CxB.
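If the band centers are not already sorted, the image data has to be reordered consistently with them; a quick sketch with made-up numbers:

```python
import numpy as np

centers = np.array([2.10, 0.40, 1.50])               # band centers out of order
cube = np.arange(12, dtype=float).reshape(2, 2, 3)   # rows x cols x bands

order = np.argsort(centers)
sorted_centers = centers[order]
sorted_cube = cube[..., order]  # reorder the band axis to match the centers
```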

Here is an example of a misclassification, and I don't even understand why it is misclassified. (All the examples of misclassification are similar.)

[screenshots of the spectra]

Blue is the pixel from the image, orange is the ground truth, and green is what it was classified as.

I failed when using Gaussian Maximum Likelihood Classification; I was getting LinAlgError("Singular matrix").

Another good use is visually comparing spectra to reflectances from a spectral library. Actually, that is why I decided to implement continuum removal.

People find comparing a single pixel from an image to a spectrum from a library useful when working in tools like ENVI. It is also a good example of why we support one-dimensional arrays as well. I was just getting into how to use spectral libraries in Spectral Python; I'm interested in it, and as soon as I get the chance I'll come back with an example.