spectralpython / spectral

Python module for hyperspectral image processing
MIT License

case sensitivity of keys in ENVI header files #30

Closed: donm closed this issue 9 years ago

donm commented 9 years ago

Although it appears to be undocumented, ENVI treats keys in header files as case insensitive. Some data creators assume this behavior, so their .hdr files look like this:

ENVI 
SAMPLES = 160
LINES = 640
...

instead of this

ENVI 
samples = 160
lines = 640
...

This causes problems for mandatory keywords, which are checked in spectral/io/envi.py around line 737:

    # Verify minimal set of parameters have been provided
    if 'lines' not in metadata:
        raise Exception('Number of image rows is not defined.')
    ...

What do you think is the best way of dealing with this? Some possibilities:

1) Apply .lower() to all keywords in ENVI headers. (easy: envi.py line 115)
2) Apply .lower() only to mandatory keywords in ENVI headers (lines, samples, bands, etc.) (personally not a fan of this option).
3) Optional case_insensitive keyword argument. Probably has to be added to read_envi_header(), open() and gen_params() in envi.py, and possibly to open_image() in spectral.py.
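For illustration, option 1 might look roughly like the sketch below. The function name `parse_header_keys` is hypothetical and is not the code in envi.py; it only handles single-line `key = value` entries and ignores multi-line `{...}` values.

```python
def parse_header_keys(text):
    """Parse simple 'key = value' lines, lower-casing every key (option 1)."""
    metadata = {}
    for line in text.splitlines():
        if '=' not in line:
            continue  # skips the leading 'ENVI' line and blank lines
        key, _, value = line.partition('=')
        metadata[key.strip().lower()] = value.strip()
    return metadata


header = "ENVI\nSAMPLES = 160\nLINES = 640\n"
metadata = parse_header_keys(header)

# The mandatory-keyword check now passes regardless of the case used in the file.
if 'lines' not in metadata:
    raise Exception('Number of image rows is not defined.')
```

With the keys normalized at read time, the existing checks for 'lines', 'samples', and so on would not need to change.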

Or maybe something better that I'm not thinking of. Let me know what you think and if you'd like me to submit a pull request.

tboggs commented 9 years ago

I think the first case is probably best. But if the format is really case-insensitive (the spec doesn't indicate whether it is or not), it would be best if the spectral module preserved the case of parameter names on a round trip read/write cycle. That is, if you open/read a file, then modify the data and save it to a new file name, it would be best if the cases of the parameter names were consistent with the original file.

Give me a couple days to look through the code and see how much effort it would be to preserve cases. If it's too much work for now, we can at least convert to lower case when reading and check for both lower and upper case names when saving files.
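One way to picture case preservation on a round trip is a mapping that looks keys up case-insensitively but remembers the spelling first seen. This is only a sketch; `CasePreservingDict` is a hypothetical class and nothing like it is claimed to exist in SPy.

```python
from collections.abc import MutableMapping


class CasePreservingDict(MutableMapping):
    """Case-insensitive lookup that remembers each key's original spelling."""

    def __init__(self, items=()):
        self._store = {}  # lower-case key -> (original key, value)
        for key, value in dict(items).items():
            self[key] = value

    def __setitem__(self, key, value):
        # Keep the first spelling seen so a later write matches the source file.
        original = self._store.get(key.lower(), (key, None))[0]
        self._store[key.lower()] = (original, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)


meta = CasePreservingDict({'SAMPLES': '160', 'LINES': '640'})
meta['lines'] = '320'          # updates the existing 'LINES' entry
original_names = list(meta)    # spellings as they appeared in the source
```

Code that checks `'lines' in meta` would work as before, while a writer iterating over `meta` would get the names as they were spelled in the input.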

donm commented 9 years ago

I did a little more digging. ENVI had an API change a couple of years ago, but in the old API the keys passed to the routines to get/set header values were documented to be case insensitive:

http://www.exelisvis.co.jp/docs/envi_get_header_value.html

I don't see this explicitly documented anywhere for the new API, but they call the keys in the new metadata object 'tags', and tags in IDL structures are also case insensitive. So I imagine the behavior didn't change in the new API.

tboggs commented 9 years ago

I looked into this a bit (with ENVI 5.0.3). It does indeed appear that ENVI ignores the case of parameter names (they are all converted to lower case). Furthermore, ENVI allows multiple definitions of the same parameter (and only retains the last one found). So the header

ENVI 
...
SAMPLES = 160
LINES = 640
wavelength = {100.0, 200.0, 300.0, 400.0}

parses the same as

ENVI 
...
SAMPLES = 160
LINES = 640
WAVELENGTH = {100.0, 200.0, 300.0, 400.0}

and also the same as

ENVI 
...
SAMPLES = 160
LINES = 640
wavelength = {500.0, 600.0, 700.0, 800.0}
WAVELENGTH = {100.0, 200.0, 300.0, 400.0}

I also found that if data from the file is exported from ENVI or I perform a Save-As of the file, the output HDR file uses all lower case parameter names, regardless of capitalization of the input header parameters. And if duplicate definitions of parameters appear in the input header file, only the last definition is written to the output file.

Based on this behavior, I think converting parameter names to lower case (as in your option 1) makes sense.
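The ENVI behavior described above (lower-cased names, last definition wins) can be mimicked with a plain dict. This is a sketch of the observed behavior only, not of ENVI's or SPy's implementation; `normalize_params` is a hypothetical name.

```python
def normalize_params(pairs):
    """Lower-case parameter names; a later definition replaces an earlier one."""
    params = {}
    for name, value in pairs:
        params[name.lower()] = value
    return params


low = '{100.0, 200.0, 300.0, 400.0}'
high = '{500.0, 600.0, 700.0, 800.0}'

# The three headers from the comment above, as (name, value) pairs.
first = [('SAMPLES', '160'), ('LINES', '640'), ('wavelength', low)]
second = [('SAMPLES', '160'), ('LINES', '640'), ('WAVELENGTH', low)]
third = [('SAMPLES', '160'), ('LINES', '640'),
         ('wavelength', high), ('WAVELENGTH', low)]

results = [normalize_params(p) for p in (first, second, third)]
```

All three entries of `results` come out equal, with 'wavelength' holding the 100.0 to 400.0 list.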

But do you further think there should also be the ability to save output files using capitalized parameter names? If we rule out the option of generating mixed-case headers (some parameters upper and some lower), this probably wouldn't be too hard (e.g., add an "upper_case_header" keyword to the save/create functions, then modify write_envi_header and _write_header_param appropriately). But there may be some odd cases where a user specifies parameters in one capitalization but defaults or parameters read from a file are in another. So maybe the best way to handle that is to explicitly convert keys of all user-supplied metadata dictionaries to lower case, then only change them to upper case in _write_header_param, if needed.
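A rough sketch of that last idea: lower-case the keys of every metadata dictionary before merging, and apply upper case only at write time. The helper names and the `upper_case_header` flag here are hypothetical; `write_param_line` is a stand-in and does not reflect the real signature of _write_header_param.

```python
def lower_keys(metadata):
    """Return a copy of metadata with all keys converted to lower case."""
    return {str(key).lower(): value for key, value in metadata.items()}


def write_param_line(name, value, upper_case_header=False):
    """Format one header line; hypothetical stand-in for _write_header_param."""
    name = name.upper() if upper_case_header else name.lower()
    return '{} = {}\n'.format(name, value)


defaults = {'interleave': 'bip', 'byte order': 0}
user = {'INTERLEAVE': 'bsq', 'Lines': 640}

# Normalizing both sides first means 'INTERLEAVE' overrides 'interleave'
# instead of ending up as a second, conflicting entry.
merged = lower_keys(defaults)
merged.update(lower_keys(user))

text = ''.join(write_param_line(k, v, upper_case_header=True)
               for k, v in merged.items())
```

Because capitalization is decided in one place at write time, the output is never mixed case.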

donm commented 9 years ago

I think the "upper_case_header" keyword might be a good time to apply YAGNI, since neither of us has tried to open headers like this before in SPy, and because ENVI itself doesn't support it.

If the option is added, the keyword probably needs to be added in a few more places: SpectralLibrary.save, save_image, save_classification, and maybe I missed some others.

tboggs commented 9 years ago

I agree. It's probably a feature that is so unlikely to be used that it isn't worth polluting a bunch of functions with the necessary keywords. And if someone does absolutely need the file to be saved/created with upper case keywords, they can just write a function to modify the header file after it is created (it could probably even be done with something like sed).
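Such a post-processing function could be as small as the sketch below. `upper_case_keys` is a hypothetical helper, not part of SPy, and it assumes the usual `name = value` layout with `{...}` values that may span several lines.

```python
def upper_case_keys(header_text):
    """Upper-case parameter names in ENVI-style header text."""
    out = []
    in_braces = False  # True while inside a multi-line {...} value
    for line in header_text.splitlines(keepends=True):
        if not in_braces and '=' in line:
            key, sep, value = line.partition('=')
            line = key.upper() + sep + value
            # A value that opens a brace without closing it continues below.
            in_braces = '{' in value and '}' not in value
        elif in_braces and '}' in line:
            in_braces = False
        out.append(line)
    return ''.join(out)


sample = ("ENVI\n"
          "samples = 160\n"
          "wavelength = {100.0,\n"
          " 200.0}\n"
          "lines = 640\n")
converted = upper_case_keys(sample)
```

To apply it to a file, one would read the .hdr text, pass it through the function, and write the result back.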

So let's go with option 1.