thierry-martinez / pyml

OCaml bindings for Python
BSD 2-Clause "Simplified" License

dtype of a numpy array #81

Closed dlindsaye closed 2 years ago

dlindsaye commented 2 years ago

First off .. an excellent package. Thank you!

I'm wondering if there's an obvious way to retrieve the dtype of a numpy array and then use that to select the kind in a call to Numpy.to_bigarray?

andiejs commented 2 years ago

I have the same question ... digging through the code, it doesn't look like the dtype is accessible. Although Numpy.to_bigarray is testing against the dtype kind internally, I don't see a way to get it in advance.

thierry-martinez commented 2 years ago

Sorry for leaving this unanswered for so long. The interface of the Numpy module is quite limited because pyml is meant to remain compatible with OCaml versions older than 4.00, which lack GADTs, and it is hard to design a more expressive interface without them. I will try to find a better solution for upcoming releases of pyml; in the meantime, you can use the code below.

thierry-martinez commented 2 years ago

Code citation is broken in my previous message, sorry. I try again:

(* Existential wrapper: hides the type parameters 'a, 'b and 'c, which are
   only known once the Numpy array has been inspected. *)
type bigarray_of_pyarray =
  C : {
      kind: ('a, 'b) Bigarray.kind;
      layout: 'c Bigarray.layout;
      array: ('a, 'b, 'c) Bigarray.Genarray.t
    } -> bigarray_of_pyarray [@unwrap]

(* Binding to the C stub named "bigarray_of_pyarray_wrapper". *)
external bigarray_of_pyarray_internal: Py.Object.t -> Py.Object.t
  -> bigarray_of_pyarray
  = "bigarray_of_pyarray_wrapper"

(* The stub takes the Numpy API object as its first argument. *)
let bigarray_of_pyarray obj =
  bigarray_of_pyarray_internal (Py.Array.numpy_api ()) obj

let () =
  Py.initialize ();
  assert (Py.Import.try_import_module "numpy" <> None);
  let m = Py.Import.add_module "test" in
  (* Called from Python: checks that the float32 dtype is recovered as a kind. *)
  let callback arg =
    let C { kind; _ } = bigarray_of_pyarray arg.(0) in
    begin match kind with
    | Bigarray.Float32 -> ()
    | _ -> assert false
    end;
    Py.none in
  Py.Module.set m "callback" (Py.Callable.of_function callback);
  assert (Py.Run.simple_string "
from test import callback
import numpy
callback(numpy.array([[0.12,1.23,2.34,3.45],[-1.,numpy.nan,1.,0.]], dtype=numpy.float32))
")
andiejs commented 2 years ago

Thanks Thierry, that helps! I do see the challenge of working without GADTs here. In the case I'm working on (reading WAV data), whatever the Python dtype is, I want to convert immediately to normalized floats for the OCaml output. So knowing the dtype will let me select a function that operates on the Py.Object.t and always yields a single output Bigarray type!
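
The normalization described in this comment can be sketched in plain OCaml, without pyml. This is only an illustration under stated assumptions: the names `packed`, `to_unit_float` and `normalize` are hypothetical, the scaling constants are the usual PCM conventions rather than anything taken from this thread, and a one-dimensional `Array1` stands in for the `Genarray` returned by the code above.

```ocaml
(* Hypothetical existential wrapper, shaped like the type posted earlier in
   this thread but restricted to one-dimensional C-layout arrays. *)
type packed =
  | Packed : {
      kind : ('a, 'b) Bigarray.kind;
      array : ('a, 'b, Bigarray.c_layout) Bigarray.Array1.t;
    } -> packed

(* Map one sample to a float in roughly [-1, 1]. Matching on the kind
   refines the element type, so each branch sees int, int32 or float. *)
let to_unit_float : type a b. (a, b) Bigarray.kind -> a -> float =
 fun kind x ->
  match kind with
  | Bigarray.Int8_unsigned -> (float_of_int x -. 128.) /. 128.
  | Bigarray.Int16_signed -> float_of_int x /. 32768.
  | Bigarray.Int32 -> Int32.to_float x /. 2147483648.
  | Bigarray.Float32 -> x
  | Bigarray.Float64 -> x
  | _ -> invalid_arg "to_unit_float: unsupported sample kind"

(* Whatever the input kind, the output is always a float64 array. *)
let normalize (Packed { kind; array }) =
  let n = Bigarray.Array1.dim array in
  let out = Bigarray.Array1.create Bigarray.float64 Bigarray.c_layout n in
  for i = 0 to n - 1 do
    out.{i} <- to_unit_float kind array.{i}
  done;
  out

let () =
  let pcm =
    Bigarray.Array1.of_array Bigarray.int16_signed Bigarray.c_layout
      [| -32768; 0; 16384 |]
  in
  let out = normalize (Packed { kind = Bigarray.Int16_signed; array = pcm }) in
  Printf.printf "%g %g %g\n" out.{0} out.{1} out.{2}
```

Because the kind is carried next to the array, the caller never has to name the element type: the dispatch happens once, inside `to_unit_float`, and every branch produces a `float`.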

thierry-martinez commented 2 years ago

I just committed https://github.com/thierry-martinez/pyml/commit/e62125f53e32af2d861e1a9a27d69b0ba79d6808, which introduces a new function Numpy.to_bigarray_k for converting Numpy arrays to bigarrays without knowing the kind and the layout of the array in advance.
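
For readers unfamiliar with the continuation style that a name like to_bigarray_k suggests, here is a self-contained sketch of the general technique: the caller passes a function that is polymorphic in the kind and layout, wrapped in a record with a universally quantified field. Everything below is a local stand-in written for illustration (`view`, `k`, `with_array` and `byte_size` are hypothetical names); it is not pyml's code, and the actual signature of Numpy.to_bigarray_k should be checked in the linked commit.

```ocaml
(* Local stand-in for "an array together with its kind and layout". *)
type ('a, 'b, 'c) view = {
  kind : ('a, 'b) Bigarray.kind;
  layout : 'c Bigarray.layout;
  array : ('a, 'b, 'c) Bigarray.Genarray.t;
}

(* A continuation that must work for every kind and layout. The explicit
   quantifier on the field needs no GADT syntax. *)
type 'r k = { f : 'a 'b 'c. ('a, 'b, 'c) view -> 'r }

(* Hypothetical producer whose element kind is only known at run time.
   A string plays the role that a Numpy dtype would play in pyml. *)
let with_array (dtype : string) (k : 'r k) : 'r =
  let make kind =
    let array = Bigarray.Genarray.create kind Bigarray.c_layout [| 2; 3 |] in
    k.f { kind; layout = Bigarray.c_layout; array }
  in
  match dtype with
  | "float32" -> make Bigarray.float32
  | "int16" -> make Bigarray.int16_signed
  | _ -> invalid_arg "with_array: unsupported dtype"

(* Example continuation: total size in bytes, valid for any kind. *)
let byte_size dtype =
  with_array dtype
    { f = (fun { kind; array; _ } ->
        Bigarray.kind_size_in_bytes kind
        * Array.fold_left ( * ) 1 (Bigarray.Genarray.dims array)) }

let () =
  Printf.printf "float32: %d bytes, int16: %d bytes\n"
    (byte_size "float32") (byte_size "int16")
```

The design choice is that the type parameters never escape: the continuation consumes the array while its kind is in scope and returns a result of an ordinary type `'r`.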