spectralpython / spectral

Python module for hyperspectral image processing
MIT License
573 stars 139 forks source link

Add protection to all ENVI omitted header labels #42

Closed kidpixo closed 8 years ago

kidpixo commented 8 years ago

Following #41, add protection for all the ENVI header label with a default values, in case they are not in the Header file.

Official documentation :

I'll try to do on my fork and PR soon.

tboggs commented 8 years ago

Sounds good. Not knowing which parameters you are planning to set to defaults, here are a few things to consider:

In general, I think what you're doing is a great idea. I just want to make sure there aren't any unintended consequences.

kidpixo commented 8 years ago

Sounds reasonable, I would then put in a bare minimum set. At least all the parameter implicitly defined in spectral/io/envi.py in the function gen_params() should have a default , otherwise the reader will crash.

Therefore all the the parameter in this snippet of code should be have a fallback status, if not defined.

    p.nbands = int(h["bands"])
    p.nrows = int(h["lines"])
    p.ncols = int(h["samples"])
    p.offset = int(h["header offset"]) if "header offset" in h else int(0)
    p.byte_order = int(h["byte order"])
    p.dtype = np.dtype(envi_to_dtype[str(h["data type"])]).str

But maybe make sense to crash for some missing keyword, I put a question mark fot those parameter here above.

What to do with a label with missing "bands" keyword?

local name envi header default
nbands "bands" ?
nrows "lines" ?
ncols "samples" ?
offset "header offset" 0
byte_order "byte order" 0?
dtype "data type" ?
donm commented 8 years ago

Unfortunately, I don't think there's any way to guess intelligent defaults for any of those keywords that you listed (aside from offset, which ENVI also has a default for). If one of nbands, nrows, ncols, or dtype is missing, then you could figure out the missing value from the other three things plus the file size, but it doesn't seem worth it to add code to do that, to me. Normally they're all there or all missing, and if something is missing then it really is a just a malformed header.

But while an error seems appropriate in those cases, it would be helpful to do these things in a try-except block so that the error message is something nicer than KeyError: 'byte_order' if it's missing.

tboggs commented 8 years ago

Other than "header offset", I believe all the parameters in the table above are mandatory in the ENVI header so for us to provide defaults for those is a step beyond even what ENVI would do. More importantly, defaulting them is potentially dangerous because of potential ambiguities that would result in corrupt data. As @ohspite suggested, it may be possible to deduce a single parameter if others are provided but that would require quite a bit of logic to ensure we are making a "safe" guess.

Some parameters simply can't be deduced. There is no way to deduce "byte order" for existing files and guessing wrong results in corrupt data. If all other parameters are provided, we can guess the size of the "data type" but not the actual value (e.g., int16 and uint16 would both match a 16-bit "data type" but the wrong guess corrupts the data).

So for parameters above other than "header offset", I think raising an exception is the right thing to do. I agree that it would be preferable to have something other than the current KeyError exception. To avoid try/except blocks for every mandatory parameter, we could do something like

mandatory_params = ['bands', 'lines', 'samples', 'byte order', 'data type']
for p in mandatory_params:
    if p not in header:
        raise Exception('Mandatory parameter "{}" missing from ENVI header file.'.format(p))

or something to that effect. Perhaps the check_compatibility function would be a good place to do this.

All this said, the links provided by @kidpixo indicate a number of other parameters that have ENVI defaults that we might want to consider for completeness.

kidpixo commented 8 years ago

I agree, malformed header should gracefully stop the execution and produce an informative error, instead of guessing some potentially disruptive default.

@tboggs code seems a pretty good example.

I cannot find in the documentation which are the mandatory parameter for an ENVI label, even if it sounds logic to have some. Do you have any reference on this point?

tboggs commented 8 years ago

My guess is that the only mandatory ones are the ones in your table above (minus "header offset") plus "interleave". Unfortunately, I don't have a working ENVI installation handy; otherwise, I would create a simple file with that set of parameters and sequentially remove a single parameter to see if/when it fails.

kidpixo commented 8 years ago

Uhm that sounds like a good idea: in theory I should have a license for IDL/ENVI floating around, if I have time I can install and test your idea.

tboggs commented 8 years ago

I was able to test this with ENVI 5.0.3 and 5.2.1. For both versions, the following parameters were mandatory and an error ("File does not appear to be a valid ENVI file.") is generated if any of them are not present:

Opening the same file did succeed when any of the following parameters were omitted:

As indicated above, I'm not sure whether byte order was defaulted to 0 because 0 is the universal default or if it was because 0 is the host byte order on the system where I tested. Note that for all three of the optional parameters, ENVI did write those parameters to a new file when I did a Save-As after reading the input file.

I think we should definitely raise an exception when any of the first 5 parameters listed are missing. I also think an exception should be raised if byte order is omitted. Even though ENVI defaulted it when it was omitted, data would be corrupted if the wrong value is guessed. Also, the links provided by @kidpixo do not state that byte order has a default value so I would not be confident that it would still be defaulted in a future version of ENVI.

Boatsure commented 4 years ago

About byte order, in the direct output data of the DataField hyperspectral device, the byte order item is missing in the .hdr header file. I found some regulations on byte order in the industry from another brand's description of hyperspectral data format:

The order of bytes in integer, long integer, 64-bit integer, unsigned 64-bit integer, floating point, double precision, and complex data types. Use one of the following: a. Byte order=0 (Host (Intel) in the Header info dialog) is least signification byte first (LSF) data (DEC and MS-DOS systems). b. Byte order=1 (Network (IEEE) in the Header info Dialog) is most significant byte first (MSF) data (all other platforms).

https://prediktera.com/download/pdf/Prediktera_Evince_Breeze_hyperspectral_file_format.pdf

tboggs commented 4 years ago

@sureatgithub thanks for the reference. That is consistent with the ENVI format and how the spectral module handles it (see here).

The problem is that when "byte order" is not provided by the device generating the data file, a different system processing the the data later doesn't know what byte order is used in the file. It may be possible to develop an heuristic to guess the byte order (e.g., assume a byte order and see if the data read from the file seem reasonable) but that requires some assumptions about the data and I doubt it would work for all cases.

kidpixo commented 4 years ago

I had no idea someone was still following this issue... I have no idea anymore which data I was using!