scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License

ArrayBuilder in Numba is missing some methods, such as 'string' #1438

Closed (jpivarski closed 2 years ago)

jpivarski commented 2 years ago

I'm astonished that I would just forget that, but it does explain why append isn't overloaded to cover strings. Probably the string method was added later and the ArrayBuilder-in-Numba implementation is just behind the times?

>>> import numba as nb
>>> import awkward as ak
>>> @nb.njit
... def add_a_string(builder, string):
...     builder.string(string)
...     return builder
... 
>>> builder = add_a_string(ak.ArrayBuilder(), "hello")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'string' of type ak.ArrayBuilderType(None)

File "<stdin>", line 3:
<source missing, REPL/exec in use?>

During: typing of get attribute at <stdin> (3)

File "<stdin>", line 3:
<source missing, REPL/exec in use?>

Indeed: it's missing. Also, I don't see it in either the v1 or the v2 implementation.

This should be a separate issue: ArrayBuilder-in-Numba is missing string and bytestring, and possibly other new methods (complex? datetime?). Fortunately, we know what the interface is supposed to be: it's supposed to be the same as outside of Numba, the ArrayBuilder Python interface.

(BTW: ArrayBuilder-in-Numba's lack of the context managers list, record, and tuple is a known gap: it's waiting on context managers as a feature from Numba, but it sounds like this will be implemented soon, and Awkward Array will be a first use-case.)

Originally posted by @jpivarski in https://github.com/scikit-hep/awkward-1.0/issues/1420#issuecomment-1105476531

ianna commented 2 years ago

@jpivarski - it looks like it's implemented in v2:

>>> import numba as nb
>>> import awkward._v2 as ak
>>> import awkward._v2._connect.numba.arrayview
>>> import awkward._v2._connect.numba.builder
>>> def add_a_string(builder, string):
...     builder.string(string)
...     return builder
... 
>>> builder = add_a_string(ak.ArrayBuilder(), "hello")
>>> builder.snapshot()
<Array ['hello'] type='1 * string'>

jpivarski commented 2 years ago

Oh, that's nice! I must have done that while porting from v1 to v2 and forgot to close this issue. I guess this wasn't needed. Are all of v2 ArrayBuilder's Python methods available in Numba? (Excluding the ones that require a context manager, since that hasn't been implemented in Numba yet.) If so, then you can close this issue.

The other starter issue, #1420, has not been implemented in v2. Looking at it now, it gets a little more into the internals of Numba, since you have to distinguish between Numba's string type and Numba's literal string type, but that's still not a very deep rabbit hole.

jpivarski commented 2 years ago

Done in #1677.