Closed kidpixo closed 8 years ago
Sounds good. Not knowing which parameters you are planning to set to defaults, here are a few things to consider:
complex function
is only relevant if the image data type is complex. Similarly, dem band
only makes sense if there is an associated DEM file.In general, I think what you're doing is a great idea. I just want to make sure there aren't any unintended consequences.
Sounds reasonable, I would then put in a bare minimum set.
At least all the parameter implicitly defined in spectral/io/envi.py
in the function gen_params()
should have a default , otherwise the reader will crash.
Therefore all the the parameter in this snippet of code should be have a fallback status, if not defined.
p.nbands = int(h["bands"])
p.nrows = int(h["lines"])
p.ncols = int(h["samples"])
p.offset = int(h["header offset"]) if "header offset" in h else int(0)
p.byte_order = int(h["byte order"])
p.dtype = np.dtype(envi_to_dtype[str(h["data type"])]).str
But maybe make sense to crash for some missing keyword, I put a question mark fot those parameter here above.
What to do with a label with missing "bands" keyword?
local name | envi header | default |
---|---|---|
nbands | "bands" | ? |
nrows | "lines" | ? |
ncols | "samples" | ? |
offset | "header offset" | 0 |
byte_order | "byte order" | 0? |
dtype | "data type" | ? |
Unfortunately, I don't think there's any way to guess intelligent defaults for any of those keywords that you listed (aside from offset
, which ENVI also has a default for). If one of nbands
, nrows
, ncols
, or dtype
is missing, then you could figure out the missing value from the other three things plus the file size, but it doesn't seem worth it to add code to do that, to me. Normally they're all there or all missing, and if something is missing then it really is a just a malformed header.
But while an error seems appropriate in those cases, it would be helpful to do these things in a try-except
block so that the error message is something nicer than KeyError: 'byte_order'
if it's missing.
Other than "header offset", I believe all the parameters in the table above are mandatory in the ENVI header so for us to provide defaults for those is a step beyond even what ENVI would do. More importantly, defaulting them is potentially dangerous because of potential ambiguities that would result in corrupt data. As @ohspite suggested, it may be possible to deduce a single parameter if others are provided but that would require quite a bit of logic to ensure we are making a "safe" guess.
Some parameters simply can't be deduced. There is no way to deduce "byte order" for existing files and guessing wrong results in corrupt data. If all other parameters are provided, we can guess the size of the "data type" but not the actual value (e.g., int16
and uint16
would both match a 16-bit "data type" but the wrong guess corrupts the data).
So for parameters above other than "header offset", I think raising an exception is the right thing to do. I agree that it would be preferable to have something other than the current KeyError
exception. To avoid try
/except
blocks for every mandatory parameter, we could do something like
mandatory_params = ['bands', 'lines', 'samples', 'byte order', 'data type']
for p in mandatory_params:
if p not in header:
raise Exception('Mandatory parameter "{}" missing from ENVI header file.'.format(p))
or something to that effect. Perhaps the check_compatibility
function would be a good place to do this.
All this said, the links provided by @kidpixo indicate a number of other parameters that have ENVI defaults that we might want to consider for completeness.
I agree, malformed header should gracefully stop the execution and produce an informative error, instead of guessing some potentially disruptive default.
@tboggs code seems a pretty good example.
I cannot find in the documentation which are the mandatory parameter for an ENVI label, even if it sounds logic to have some. Do you have any reference on this point?
My guess is that the only mandatory ones are the ones in your table above (minus "header offset") plus "interleave". Unfortunately, I don't have a working ENVI installation handy; otherwise, I would create a simple file with that set of parameters and sequentially remove a single parameter to see if/when it fails.
Uhm that sounds like a good idea: in theory I should have a license for IDL/ENVI floating around, if I have time I can install and test your idea.
I was able to test this with ENVI 5.0.3 and 5.2.1. For both versions, the following parameters were mandatory and an error ("File does not appear to be a valid ENVI file.") is generated if any of them are not present:
lines
samples
bands
data type
interleave
Opening the same file did succeed when any of the following parameters were omitted:
header offset
(defaults to 0)file type
(defaults to "ENVI Standard")byte order
(defaults to either 0 or host order - not sure which)As indicated above, I'm not sure whether byte order
was defaulted to 0 because 0 is the universal default or if it was because 0 is the host byte order on the system where I tested. Note that for all three of the optional parameters, ENVI did write those parameters to a new file when I did a Save-As after reading the input file.
I think we should definitely raise an exception when any of the first 5 parameters listed are missing. I also think an exception should be raised if byte order
is omitted. Even though ENVI defaulted it when it was omitted, data would be corrupted if the wrong value is guessed. Also, the links provided by @kidpixo do not state that byte order
has a default value so I would not be confident that it would still be defaulted in a future version of ENVI.
About byte order, in the direct output data of the DataField hyperspectral device, the byte order item is missing in the .hdr header file. I found some regulations on byte order in the industry from another brand's description of hyperspectral data format:
The order of bytes in integer, long integer, 64-bit integer, unsigned 64-bit integer, floating point, double precision, and complex data types. Use one of the following: a. Byte order=0 (Host (Intel) in the Header info dialog) is least signification byte first (LSF) data (DEC and MS-DOS systems). b. Byte order=1 (Network (IEEE) in the Header info Dialog) is most significant byte first (MSF) data (all other platforms).
https://prediktera.com/download/pdf/Prediktera_Evince_Breeze_hyperspectral_file_format.pdf
@sureatgithub thanks for the reference. That is consistent with the ENVI format and how the spectral module handles it (see here).
The problem is that when "byte order" is not provided by the device generating the data file, a different system processing the the data later doesn't know what byte order is used in the file. It may be possible to develop an heuristic to guess the byte order (e.g., assume a byte order and see if the data read from the file seem reasonable) but that requires some assumptions about the data and I doubt it would work for all cases.
I had no idea someone was still following this issue... I have no idea anymore which data I was using!
Following #41, add protection for all the ENVI header label with a default values, in case they are not in the Header file.
Official documentation :
I'll try to do on my fork and PR soon.