Rewrite does not produce identical file

rth commented 9 years ago

I'm trying to read a SEG-Y files, write it back and read back the result in order to check the consistency with the original file. No errors are raised, and according to segpy, the data and the header contents are identical in both files (tested with this script, however the resulting binary has some small differences, as illustrated on the image below, untitled this happens periodically I think for every trace, and the string 0xf4252414b in the original file is allways replaced by zeros 0x000000000. My guess is that a field in the trace header is not being written back as it should, although I have not been able to figure out the exact origin of this problem.

Also I think segpy.header.are_equal does not check all the fields, since it relies on self.ordered_field_names, and that function seems to return only a subset of all the fields in my case (although I might be mistaken and/or missed something).

Note: the script above requires the PR #20 , that fixes an issue in segpy/writer.py.

rob-smallshire commented 9 years ago

Can you supply me with the file that is causing you problems? Perfect roundtripping of SEG Y files is definite goal of Segpy, so I'd like to address this.

rth commented 9 years ago

Thank you for looking into this! The data file I used for this, can be found here.

rob-smallshire commented 9 years ago

I have the file. Rest assured that none of the test data is made public as most of it is proprietary.

rob-smallshire commented 9 years ago

Some notes during debugging:

The diffs are regularly spaced and are five bytes long. The first four are at offsets of 0xefb, 0x13d3, 0x18ab, 0x1d83. The offsets of the first four trace headers are 0xe10, 0x12e8, 0x17c0, 0x1c98.

>>> 0xefb - 0xe10
235
>>> 0x13d3 - 0x12e8
235
>>> 0x18ab - 0x17c0
235
>>> 0x1d83 - 0x1c98
235

The zero-based offset of 235 is in the "unassigned" block of the SEG Y trace header between one-based offsets 233-240. These unassigned bytes are currently neither read nor written by Segpy.

The solution is to derive a custom trace header description class from TraceHeaderRev1 which defines these additional field(s) and then pass the new class as the trace_header_format argument of create_reader() and write_segy(). I've modified your example code to do exactly this. With this change in place your file roundtrips perfectly :-)

#!/usr/bin/env python3

from segpy.reader import create_reader
from segpy.trace_header import TraceHeaderRev1
from segpy.types import Int16
from segpy.writer import write_segy
from segpy.header import are_equal, field

class CustomTraceHeader(TraceHeaderRev1):

        unassigned_1 = field(
            Int16, offset=233, default=0, documentation="Unassigned 1")

        unassigned_2 = field(
            Int16, offset=235, default=0, documentation="Unassigned 2")

        unassigned_3 = field(
            Int16, offset=237, default=0, documentation="Unassigned 3")

        unassigned_4 = field(
            Int16, offset=239, default=0, documentation="Unassigned 4")

in_filename = "data/rth.segy"
out_filename = "data/rth_out2.segy"

in_file = open(in_filename, 'rb')

with open(out_filename, 'wb') as out_file:
    segy_reader_in = create_reader(in_file, trace_header_format=CustomTraceHeader)
    write_segy(out_file, segy_reader_in, trace_header_format=CustomTraceHeader)

out_file = open(out_filename, 'rb')
segy_reader_out = create_reader(in_file, trace_header_format=CustomTraceHeader)

for trace_index in segy_reader_in.trace_indexes():
    trace_offset = segy_reader_in._trace_offset_catalog[trace_index]
    print(trace_index, hex(trace_offset))
    head0 = segy_reader_in.trace_header(trace_index)
    head1 = segy_reader_out.trace_header(trace_index)
    assert are_equal(head0, head1), "Error {}".format(trace_index)

    data0 = segy_reader_in.trace_samples(trace_index)
    data1 = segy_reader_out.trace_samples(trace_index)
    assert data0==data1

rth commented 9 years ago

That makes sens. Thanks for the workaround solution!

sixty-north / segpy

Rewrite does not produce identical file #21