feat: allow `np.array`s in `ak.full_like` as fill_value

pfackeldey commented 3 days ago

This PR allows passing NumPy arrays as the fill value to ak.full_like. They are then broadcast into the shape of the reference array, e.g.:

ref = ak.Array(np.ones((2, 2)))
ak.full_like(ref, fill_value=np.array([2.0, 3.0]))
# >> <Array [[2, 3], [2, 3]] type='2 * 2 * float64'>
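
For readers less familiar with the broadcasting rule referred to here, the following is a minimal NumPy-only sketch of the same semantics. It is an illustration, not the code of this PR: it assumes that `np.broadcast_to` and `np.full_like` are a fair stand-in for how the fill value is expanded to the reference shape.

```python
import numpy as np

ref = np.ones((2, 2))
fill = np.array([2.0, 3.0])

# Explicitly broadcast the 1-d fill value to the 2-d reference shape.
broadcast_fill = np.broadcast_to(fill, ref.shape)
print(broadcast_fill.tolist())

# np.full_like applies the same rule and keeps the dtype of `ref`.
filled = np.full_like(ref, fill)
print(filled.tolist())

# A fill value whose trailing dimension does not match cannot be broadcast.
broadcast_failed = False
try:
    np.broadcast_to(np.array([1.0, 2.0, 3.0]), ref.shape)
except ValueError as err:
    broadcast_failed = True
    print("not broadcastable:", err)
```

The fill value is aligned with the trailing dimension of the reference, so a length-2 array fills each row of a 2 x 2 reference, while a length-3 array is rejected.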

Fixes https://github.com/scikit-hep/awkward/issues/2787.

Currently, this only works with NumPy arrays. Maybe at some point this should be extended to arbitrary Awkward Arrays?

pfackeldey commented 3 days ago

OK, so just to clarify what is and is not possible with and without this PR:

import awkward as ak
import numpy as np

ref = ak.Array(np.ones((2, 2)))

ak.full_like(ref, 2)
# >> <Array [[2, 2], [2, 2]] type='2 * 2 * float64'>

ak.full_like(ref, "a")
# >> ValueError: could not convert string to float: np.str_('a')

# oh... this is possible, but only if the ref has a matching dtype?
# - a little unexpected for me, I thought there would be automatic
# type promotion to whatever the fill_value is
ak.full_like(ak.Array([["a", "b"], ["c", "d"]]), "a")
# >> <Array [['a', 'a'], ['a', 'a']] type='2 * var * string'>

ak.full_like(ref, [])
# >> ValueError: could not broadcast input array from shape (0,) into shape (4,)

ak.full_like(ref, None)
# >> TypeError: Encountered a None value, but None conversion/promotion is disabled

# this doesn't work because `is_array_like` is false for any `ak.Array`.
# I could allow this to work, but in principle I'm relying on the nplike's correct broadcasting
# implementation for `nplike.full_like`, which would go wrong for var-len `ak.Array`s
ak.full_like(ref, ak.Array([2, 3]))
# >> ValueError: could not broadcast input array from shape (2,) into shape (4,)

So basically, only 0-d number types (e.g. int or float) and strings are currently usable, and strings only if the reference array itself contains strings.
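
Two of the failures above can be reproduced with plain NumPy, which may help to see where they come from. This is only a sketch: the use of `np.full_like` for the dtype error and the assignment into a flat buffer of `ref.size` elements are assumptions meant to mirror the error messages (note the shape `(4,)` in them), not a description of the actual Awkward internals.

```python
import numpy as np

ref = np.ones((2, 2))  # float64 reference

# The result keeps the dtype of the reference, not that of the fill value.
print(np.full_like(ref, 2).dtype)

# A string cannot be cast to float64, so the fill fails.
string_error = None
try:
    np.full_like(ref, "a")
except ValueError as err:
    string_error = str(err)
    print(string_error)

# Assumption: the fill value is written into a flattened buffer of 4 elements.
# A length-2 (or length-0) array does not broadcast into shape (4,).
flat = np.empty(ref.size)
flat_error = None
try:
    flat[...] = np.array([2.0, 3.0])
except ValueError as err:
    flat_error = str(err)
    print(flat_error)
```

Under this reading, there is no type promotion towards the fill value: the reference dtype wins, and anything that cannot be cast to it raises.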

This PR adds the following new possibilities:

import awkward as ak
import numpy as np

ref = ak.Array(np.ones((2, 2)))

ak.full_like(ref, np.array([2, 3]))
# >> <Array [[2, 3], [2, 3]] type='2 * 2 * float64'>

# this also works with other backends, because the check accepts anything that is `is_array_like`:
import jax.numpy as jnp

ak.jax.register_and_check()

jax_arr = ak.full_like(ak.to_backend(ref, "jax"), jnp.array([2, 3]))
print(jax_arr)
# >> <Array [[2.0, 3.0], [2.0, 3.0]] type='2 * 2 * float32'>
print(jax_arr.layout.backend)
# >> <awkward._backends.jax.JaxBackend at 0x1095f6860>

# doesn't work if the backends don't match
ak.full_like(ref, jnp.array([2, 3]))
# >> ValueError: cannot operate on arrays with incompatible backends. Use #ak.to_backend to coerce the arrays to the same backend

> Is it the case right now that passing an array-like becomes a regular-length list and any other sequence becomes a variable-length list?

Not sure if I understand you correctly, but the only difference is that in the array-like case, the fill value is broadcast into the reference array the same way NumPy would do it. This only works for rectangular arrays.

Most of these advanced use cases can already be achieved with something like:

full_like = ak.Array([2, 3]) * ak.ones_like(ref)

That does the correct broadcasting, but at the price of an unnecessary operation (the multiplication here) and an intermediate array of ones.
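
As a rough NumPy analogue of this trade-off (an illustration only, using NumPy arrays in place of Awkward Arrays), multiplying by an array of ones and filling directly give the same values for a rectangular reference:

```python
import numpy as np

ref = np.ones((2, 2))
fill = np.array([2, 3])

# Workaround: allocate an array of ones, then multiply (two passes over the data).
via_multiply = fill * np.ones_like(ref)

# Direct fill: broadcast the fill value straight into the output buffer.
direct = np.full_like(ref, fill)

print(np.array_equal(via_multiply, direct))
print(via_multiply.dtype, direct.dtype)
```

The direct version skips both the temporary array and the arithmetic, which is the motivation for supporting array-like fill values in the first place.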

jpivarski commented 3 days ago

What I was missing was that this is ak.full_like, rather than ak.fill_none. What I said about filling with [] works for ak.fill_none.

scikit-hep / awkward

feat: allow `np.array`s in `ak.full_like` as fill_value #3315