sixty-north / segpy

A Python package for reading and writing SEG Y files.

`header()` strategy produces unusable binary reel headers #64

Open abingham opened 6 years ago

abingham commented 6 years ago

It seems that the test.strategies.header() strategy can produce BinaryReelHeaders with, e.g., negative values in num_samples. The strategy machinery is presumably working as designed, but a negative sample count is physically impossible. I imagine other fields in this and other headers also can't hold arbitrary values from the entire range that their data type supports.

In trying to test the reader functionality, I was running into cases where header() was generating invalid headers, and I think this is the cause. I'm still wrapping my head around all of the parts, but it seems header is the (or a) culprit.

Is there a way to constrain field values? Am I using this function as expected? I was calling it like this:

binary_header = draw(header(BinaryReelHeader,
                            data_sample_format=data_sample_format))
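
For context, that call sits inside a Hypothesis @composite strategy, roughly like this (reel_headers is just an illustrative wrapper name, and the import paths are approximate):

from hypothesis.strategies import composite

from segpy.binary_reel_header import BinaryReelHeader
from test.strategies import header


@composite
def reel_headers(draw, data_sample_format):
    # data_sample_format is passed straight through to the strategy under discussion.
    binary_header = draw(header(BinaryReelHeader,
                                data_sample_format=data_sample_format))
    return binary_header
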
rob-smallshire commented 6 years ago

The right thing to do here is probably to enrich the metadata in BinaryReelHeader so that it knows that these values must be positive. I'm a bit worried about what this would mean for reading badly composed SEG-Y files that are out in the wild. Should we be that strict?

abingham commented 6 years ago

That's an interesting point. Strict in what we produce, lenient in what we read. The metadata about valid ranges would be useful in some cases, but we'd want to ignore it in others. I guess there's also the bigger question of what to do at all when we encounter things like negative sample counts.

rob-smallshire commented 6 years ago

I'm inclined to say that out of the box Segpy should read properly formed SEG-Y files and nothing more. Segpy is also a kit of parts to construct your own readers for odd cases, and this would extend to providing your own header definitions.

A relatively easy thing to do, then, would be to add UInt16 and UInt32 types to field_types.py and use these instead in the header definitions where we're sure the values have to be conceptually positive. We'll have to read the spec closely to check for invalid values signalled by, say, -1. (Most of the spec is pasted into BinaryReelHeader as docstrings anyway.)

If you go this route, remember only to truncate the lower end of the range: the values must still fit into the signed Int16 or Int32 encoding, so a UInt16 should keep the signed maximum of 32767 rather than extending to 65535.
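
To make the "truncate only the lower end" point concrete, here is a rough sketch of what such a field type might look like; the attribute names (MINIMUM, MAXIMUM, SIZE) are illustrative and may not match the actual protocol in field_types.py:

class UInt16(int):
    # A conceptually non-negative 16-bit header field. Only the lower bound is
    # raised to zero; the upper bound stays at the signed Int16 maximum so
    # values still fit the signed on-disk encoding.
    MINIMUM = 0
    MAXIMUM = 32767  # deliberately not 65535
    SIZE = 2

    def __new__(cls, value):
        instance = super().__new__(cls, value)
        if not (cls.MINIMUM <= instance <= cls.MAXIMUM):
            raise ValueError("UInt16 value {} out of range {} to {}"
                             .format(int(instance), cls.MINIMUM, cls.MAXIMUM))
        return instance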

rob-smallshire commented 6 years ago

We could improve the situation a lot by introducing several enums (specifically IntEnums) to better represent the data. For example, the vibratory_polarity_code in the binary reel header should be an enum.

abingham commented 6 years ago

For example the vibratory_polarity_code in the binary reel header should be an enum.

In this specific example (and perhaps others) there seems to be a mismatch between a) the documented enum values and b) the default value we specify. The allowed values are 1-8, but we specify a default of 0. Is this a mistake, or is there somewhere in the docs that says a zero-value is always allowed?

rob-smallshire commented 6 years ago

The SEG-Y specification (like all specifications) is sadly lacking. I think we have to assume, in this case and others, that zero is a none-of-the-above null value, because the vibratory polarity code is implicitly optional. Some header fields are mandatory or strongly recommended; the others are therefore implicitly optional. Actually it's more complicated than that, as some fields are "mandatory for prestack data" and some are "mandatory for all data types". Unless you know what the data actually represents – and there's no way for the software to know that – it's impossible to say exactly which set of fields is mandatory.

So, to take the example of "Measurement System" in bytes 3255-3256 (1-based) which is 1=Meters, 2=Feet, we have to assume 0=Unknown/Don't Care.
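
In code that convention would look something like this (illustrative only, not anything that exists in segpy today):

from enum import IntEnum


class MeasurementSystem(IntEnum):
    UNKNOWN = 0  # not defined by the spec; treated as "don't care"
    METERS = 1
    FEET = 2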

abingham commented 6 years ago

Ok, I'll implicitly include 0 in the allowed values for enum-based fields. If we need to make that optional (i.e. if it turns out that some enum-based fields should not allow zero), we can add a flag to the field class or something.
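
Purely as a sketch of that flag idea – none of these names exist in segpy – the check could look like:

def validate_enum_field(value, enum_class, allow_unset=True):
    # Accept any value defined by enum_class, plus 0 when allow_unset is True.
    if allow_unset and value == 0:
        return value
    return enum_class(value)  # raises ValueError for values the enum doesn't define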

pareantoine commented 6 years ago

I'm not 100% sure this is the right place to comment, but since you've modified the way you read the headers, I can now only read surface seismic SEG-Y files. Any other SEG-Y file gives me the following error:

segy.trace_header(0)

Traceback (most recent call last):
  File "", line 1, in <module>
    segy.trace_header(0)
  File "C:...\segpy\reader.py", line 505, in trace_header
    trace_header = read_trace_header(self._fh, header_packer, pos)
  File "C:...\segpy\toolkit.py", line 479, in read_trace_header
    raise EOFError("Trace header trunctated.") from e
EOFError: Trace header trunctated.

It seems that my files do not contain enough data in the header part. I've tried several non-surface-seismic files created by various software and people; all seem to give the same issue.

abingham commented 6 years ago

Hmmm...these header changes might very well be causing the problem you're seeing. Do you know the commit at which your problems started? And do you have a small example file that shows this issue?

rob-smallshire commented 6 years ago

@pareantoine Did you post the full error message? The EOFError you are seeing should be caused by a ValueError, which in turn should be caused by a struct.error. In the output you should see the text "The above exception was the direct cause of the following exception:", which indicates that the exceptions are chained. Can you post the whole thing?
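
For comparison, this is what a chained traceback looks like in plain Python (a generic example, nothing segpy-specific):

import struct
import traceback

try:
    try:
        struct.unpack('>i', b'\x00')  # too few bytes raises struct.error
    except struct.error as e:
        raise EOFError("Trace header truncated.") from e
except EOFError:
    # Prints the struct.error traceback, then the line
    # "The above exception was the direct cause of the following exception:",
    # then the EOFError traceback.
    traceback.print_exc()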

pareantoine commented 6 years ago

Yes that was the full error message unfortunately.

I went back to the commit from the 13th of October 2017 and the problem is gone. I haven't yet tried stepping through each commit since then to see where it starts to fail.

I'll try to send you a file so you can play with it.