spectralpython / spectral

Python module for hyperspectral image processing
MIT License
573 stars 139 forks source link

Envi classification #10

Closed donm closed 10 years ago

donm commented 10 years ago

Hi Thomas. This patch allows ENVI class maps to be saved. I split some code in save_image() into another function because it was getting kinda long after my additions.

tboggs commented 10 years ago

This is a nice addition but I'd like to propose a slightly different implementation. Using the save_image function certainly works but I think it makes the function slightly more complex and potentially error-prone. For example, there would be some keywords that are only applicable when saving images (e.g., 'interleave'), whereas others would only be applicable to classification results (e.g., 'class_names', 'class_colors'). Also, if a user sets the 'file_type' keyword to something like 'ENVI classification' (instead of 'ENVI Classification'), the file would be saved as an image instead of classification results (which is what they really wanted).

What do you think about putting the new code into a separate envi.save_classification function? If you do that, the new keywords would only be used by the new function and the 'file_type' keyword wouldn't be required. save_classification would just call save_image with the appropriate metadata dictionary. save_image already looks for a metadata keyword named 'file type' so you could set that parameter to 'ENVI Classification' within the save_classification function, prior to calling save_image. It looks like most of this can be achieved by moving the code block under the if file_type=="ENVI Classification": line from _updated_header_parameters into the save_classification function.

One other feature of the new function that I'm curious about is the default colors used when specific colors are not given. I haven't had a chance to use the branch from the pull request yet but it appears that the generated colors will be different than the default colors used by the SPy module (the colors defined in spectral.spy_colors). It may be that the colors you are using as defaults are better than the currentspy_colors but unless they are made consistent, there will be some odd behavior. For example, if a user performs a classification and calls spectral.imshow to view the classification image, they will get one set of colors but if they call save_classification (using defaults), then later call envi.open to open and view the results, the colors will appear differently (I'm assuming here that spectral.imshow would use the colors from the classification file).

donm commented 10 years ago

You're right--a separate save_classification function would be a much better interface.

It seems like there are two ways to do it. One way would be to pass the metadata from save_classification to save_image, like you said. But in that case there needs to be some way of telling save_image not to overwrite the file type field with 'ENVI Standard'. I don't think save_image should trust the value in the metadata dict, so then there would still need to be a file_type keyword (or something like it), it just wouldn't be set by the user.

The alternative would be to break what is currently in save_image into one or two other helper functions that the new save_image and save_classification can both call. So then instead of save_classification setting up additional stuff in the metadata dict and passing it to save_image, they would both prepare the dict and data and pass it to something like _write_image().

I think the second option is better, but let me know if you have a preference or if you see another way to go.

About the colors: sorry, I didn't see that there was one place where colors were defined for the whole package. And it will be better to use the consistent colors for the reasons you mentioned. Incidentally, I ran into trouble with the finite number of class colors when I was doing something with the ND visualizer. But that's another topic for sometime later.

Let me know what you prefer for save_image and save_classification and I'll try to push something on monday.

tboggs commented 10 years ago

I think option 2 is probably the cleanest way to do it - move most of the current save_image function into a _write_image function that save_image and save_classification will call. _write_image can take a positional 'file_type' arg. With that implementation, the metadata argument can be used exclusively for parameters that are written to the file.

Regarding the colors, I thought I made all the functions that use them recycle the color palette once the list is exhausted but it's possible I didn't implement that everywhere. One reason I wasn't concerned about creating an arbitrarily large number of colors is because beyond a sufficiently large number of colors, new ones become indistinguishable from others already in the palette. That said, there was no rigorous process used to establish the current color palette.

donm commented 10 years ago

Let me know if this implementation is close to what you had in mind. Also let me know if you'd like me to resubmit this as one clean commit.

tboggs commented 10 years ago

I think this looks good and breaks the functions up cleanly.

A couple questions regarding parameters to save_classification:

Before I do the merge, please do the following:

In [6]: envi.save_classification('cls.hdr', gt)
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-6-e0d583c4f4f1> in <module>()
----> 1 envi.save_classification('cls.hdr', gt)

/home/thomas/src/spectral/spectral/io/envi.py in save_classification(hdr_file, image, **kwargs)
    452             # list was already flattened
    453             colors = list(class_colors)
--> 454     if len(colors) < n_classes * 3:
    455         colors = []
    456         for i in range(n_classes):

UnboundLocalError: local variable 'colors' referenced before assignment

If you can address the stuff above, I'll take care of the following:

Sound ok?

donm commented 10 years ago

Okay, those changes have been made. That bug was caused by an indentation error.

I made a couple of final changes in addition to adding the docstrings. The changes make it unnecessary to pass kwargs to one of the helper functions. I also renamed what used to be _updated_header_parameters() to add_image_info_to_metadata() because I saw that you already had add_band_info_to_metadata(), and the functions seemed to deserve each other.

Does the ENVI classification file format support multiple data types?

I don't know if it is intended, but it does. ENVI seems to open classification files fine with data in any integer type. Actually, I played with floats and even those seem to work, and it looks like the class is assigned based on the floor() of the pixel's value. I updated the code that guesses the number of classes so that it doesn't die if the data is floating point, but it's probably not very helpful functionality.

I also see "interleave" as a keyword argument in the doc string. Is that just a copy/paste leftover or can that actually be used for a classification file?

ENVI supports classification results with multiple bands, and the interleave does change the way the file is opened. But again, that's just current behavior rather than a promise I read in the documentation. I tested this by saving different classmaps as one file after np.dstacking them together, and then opening the result in ENVI.

tboggs commented 10 years ago

Ok, it looks good to me. I've added a few unit tests on my local copy of this branch with unit tests to verify saving/loading classes and to verify that an exception is raised when an unsupported dtype is passed to any of the ENVI save functions.

If you can rebase this branch, I'll pull the update and merge the branch back into master.

With this new feature there are probably some additional changes that make sense with regard to how class data are visualized. Things to be considered:

tboggs commented 10 years ago

Disregard my comment above about rebasing. GitHub's network graph is sometimes wonky and doesn't show the current state when it is opened but I did a page refresh and saw that you did rebase.

Thanks.