scikit-hep / awkward-0.x

Manipulate arrays of complex data structures as easily as Numpy.
BSD 3-Clause "New" or "Revised" License

AttributeError when trying to read a particular format of awkward array #250

Closed HenryDayHall closed 4 years ago

HenryDayHall commented 4 years ago

Saving and then loading a particular shape of awkward array, one that has been created by slicing a larger array, raises AttributeError: 'bytes' object has no attribute 'ctypes'

Here is an example that recreates the problem:

import os
import awkward

def test_split_unfinished():
    # clean any existing mess
    save_name = "test.awkd"
    try:
        os.remove(save_name)
    except FileNotFoundError:
        pass
    idxs = slice(1, None)
    # works ~~~~~~~~~~~~~~~~
    content = awkward.fromiter([[], []])
    awkward.save(save_name, content[idxs])
    found = awkward.load(save_name)
    print(found)
    os.remove(save_name)
    # works ~~~~~~~~~~~~~~~~
    content = awkward.fromiter([[], 0])
    awkward.save(save_name, content[idxs])
    found = awkward.load(save_name)
    print(found)
    os.remove(save_name)
    # works ~~~~~~~~~~~~~~~~
    content = awkward.fromiter([[0], []])
    awkward.save(save_name, content[idxs])
    found = awkward.load(save_name)
    print(found)
    os.remove(save_name)
    # fails ~~~~~~~~~~~~~~~~
    content = awkward.fromiter([[[]], []])
    awkward.save(save_name, content[idxs])
    found = awkward.load(save_name)
    print(found)
    os.remove(save_name)

I think this must be a bug, because I don't see anything wrong with what is being attempted.

jpivarski commented 4 years ago

This would be a bug:

AttributeError: 'bytes' object has no attribute 'ctypes'

In fact, it sounds like mistaking a bytestring (bytes object) for a NumPy array (which has a ctypes attribute, from which we can get a pointer to the underlying data). If I knew where that mistake was being made, I could wrap the bytestring with np.frombuffer to view it as a NumPy array.
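To illustrate the np.frombuffer idea mentioned above, here is a minimal sketch. It is not code from awkward; the variable names and the sample data are made up for the example. It only shows that a bytes object lacks a ctypes attribute, while a NumPy view over the same bytes has one.

```python
import numpy as np

# Illustrative data: the raw bytes of three little-endian 64-bit integers.
raw = np.array([0, 1, 3], dtype="<i8").tobytes()

# A plain bytes object has no ctypes attribute, which is what the error reports.
has_ctypes_before = hasattr(raw, "ctypes")

# np.frombuffer returns a read-only array that views the same memory (no copy).
offsets = np.frombuffer(raw, dtype="<i8")
has_ctypes_after = hasattr(offsets, "ctypes")

print(has_ctypes_before, has_ctypes_after)
print(offsets.tolist())
```

Because the result is a view rather than a copy, it is read-only when the underlying buffer is an immutable bytes object.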

However, I can't find where this is happening because when I run the same commands, I don't get any error. Try this again in the latest version; you might be seeing an old bug that has since been fixed. If it's still happening, give me the exact commands (for me to try to reproduce again) and the full stack trace (which can help me find the error even if I can't reproduce it).

Note that Awkward 0 is gradually being deprecated in favor of Awkward 1, so you might not want to do new work in Awkward 0. However, Awkward 1 doesn't have file-saving yet, which is what you want here. That's a good example of why it's not an immediate transition.

HenryDayHall commented 4 years ago

Ooh, that's interesting. I think I am using the latest version of awkward (0.12.21, right?). My Python version is not the latest, however; it is 3.6.9. Not sure if that matters? I will try to reproduce the behavior in a Docker container. In the meantime, here is the output I get:

[[]]
[0]
[[]]
Traceback (most recent call last):
  File "example.py", line 33, in <module>
    found = awkward.load(save_name)
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 700, in load
    out = f[""]
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 722, in __getitem__
    return deserialize(self._file, name=where + self.schemasuffix, awkwardlib=self.options["awkwardlib"], whitelist=self.options["whitelist"], cache=self.options["cache"])
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 575, in deserialize
    return unfill(schema["schema"])
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 517, in unfill
    args = [unfill(x) for x in schema.get("args", [])]
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 517, in <listcomp>
    args = [unfill(x) for x in schema.get("args", [])]
  File "/usr/local/lib/python3.6/dist-packages/awkward/persist.py", line 527, in unfill
    out = gen(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/awkward/array/jagged.py", line 105, in __init__
    if self.offsetsaliased(starts, stops):
  File "/usr/local/lib/python3.6/dist-packages/awkward/array/jagged.py", line 29, in offsetsaliased
    starts.ctypes.data == starts.base.ctypes.data and
AttributeError: 'bytes' object has no attribute 'ctypes'

jpivarski commented 4 years ago

From this stack trace, here's the bit that's supposed to identify bytes (anywhere) used as an array and convert it into an array.

https://github.com/scikit-hep/awkward-array/blob/d88527c69d3070aa49db2aa9e14d9f02adb73e19/awkward/array/base.py#L380-L394

So, that's weird.

HenryDayHall commented 4 years ago

Here is a Dockerfile that can reproduce the issue: Dockerfile.zip. I'm sure you are rather better with these than I am, but on the off chance you haven't used Docker much, there are instructions in the comments at the top of the file.

jpivarski commented 4 years ago

Thanks! I don't know what I must have been doing differently, but your explicit file revealed the error. Not only do the starts and stops have to go through _util_toarray (above), but their starts.base and stops.base do as well. It should be fixed in PR #251.
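A rough sketch of why the base matters, under stated assumptions: to_array and same_start_as_base are hypothetical helpers written for this example, not awkward's _util_toarray or offsetsaliased, and the actual change in the PR may look different. The sketch mirrors the pointer comparison seen in the traceback, but converts a non-array base (such as bytes) into an array before touching .ctypes.

```python
import numpy as np

def to_array(obj):
    """Hypothetical helper: return obj as a NumPy array, viewing raw buffers as uint8."""
    if isinstance(obj, np.ndarray):
        return obj
    return np.frombuffer(obj, dtype=np.uint8)

def same_start_as_base(array):
    """Compare an array's data pointer with its base's, tolerating non-array bases."""
    base = array.base
    if base is None:
        return False  # the array owns its data, so there is no base to compare
    return array.ctypes.data == to_array(base).ctypes.data

raw = np.arange(4, dtype="<i8").tobytes()
starts = np.frombuffer(raw, dtype="<i8")  # a view over the bytes object
stops = starts[1:]                        # begins 8 bytes into the buffer

print(type(starts.base).__name__)
print(same_start_as_base(starts), same_start_as_base(stops))
```

Calling .ctypes directly on the base would raise the AttributeError from this issue whenever the base is not an array, which is why the conversion happens first in this sketch.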