sixty-north / segpy

A Python package for reading and writing SEG Y files.
Other
99 stars 54 forks source link

`_cdp_catalog` can be `None` in `SegYReader2D.cdp_numbers`. #63

Closed abingham closed 6 years ago

abingham commented 6 years ago

In SegYReader2D.cdp_numbers, it's possible for self._cdp_catalog to be None. In this case, the call to self._cdp_catalog.keys() fails with an AttributeError.

It seems to be legitimate that _cdp_catalog is None since CatalogBuilder.create documents that it can return None. By that same token, any catalog might be None, so perhaps we need to check for other places where null catalogs are being dereferenced.

abingham commented 6 years ago

It looks like the only use of catalog_traces() is in reader.py, and it doesn't make any attempt to account for None values coming out of catalog_traces(). The only place where a None check is made is when we calculate dimensionality right after calling catalog_traces(). In every other place where the catalogs are referenced, we assume that they're Mapping references.

From what I can see, we can safely replace these Nones with empty dicts. That is, when reader.py sees None come out of catalog_traces(), it could replace these with {}. It looks like that's really what reader.py expects anyway.

We might even be able to go a step further and have catalog_traces() return {} instead of None. But I'm not sure right now if catalog_traces() can return {} already and, if so, if that's semantically different from None.

abingham commented 6 years ago

As a first experiment (and to temporarily shave this yak so I can do testing), I'm going to turn Nones into empty dicts in reader.py.

rob-smallshire commented 6 years ago

If I remember correctly, you can get Nones in the case that there is no bijective mapping, for example if there are duplicate keys. This shouldn't happen with well-formed SEG-Y, but it does. The CatalogBuilder.create() method returns None if there are duplicates.

I think an "empty" SEG-Y file containing headers but zero traces could legitimately have empty dictionaries.

rob-smallshire commented 6 years ago

Looking at this again, the intent is that clients should not usually construct SegYReader2D directly, and should prefer to go via create_reader(). There's a comment to this effect in the docstring for SegYReader2D.__init__() where it says, "Usually a SegYReader is most easily constructed using the create_reader() function", although this should be more strongly worded.

The logic in _make_reader() which tries to determine the dimensionality of the SEG-Y assumes that, the the case the caller of create_reader() has left dimensionality at the default value of None, that a cdp_catalog is successfully constructed, and so cdp_catalog is not None.

Given this, I think it's entirely reasonable that SegYReader2D.__init__() raises a TypeError if its cdp_cataclog argument is None (or better, if it is not a mapping. There's possibly also a length check we could make here to ensure the various mappings have consistent lengths).

Essentially, if you can't provide a valid cdp_catalog mapping you're not in a position to use SegYReader2D anyway, so you need to fall back to using plain-old SegYReader.

Similarly, SegYReader3D.__init__() should raise a TypeError if its line_catalog argument is not a mapping. The SegYReader base class should sanity check its various arguments in SegYReader.__init__(). These checks weren't initially included because of the assumption that callers would go via create_reader(), but that's definitely one assumption too far.

It's much better if these objects establish their class invariants in their initialisers that having to accommodate testing for various non-sensical states once they've been constructed.

abingham commented 6 years ago

Should we go ahead and close this? I think so, but could use a second opinion.