gzuidhof / zarr.js

Javascript implementation of Zarr
https://guido.io/zarr.js
Apache License 2.0

Support for float16 #127

Closed keller-mark closed 2 years ago

keller-mark commented 2 years ago

Background

Without support for the f2 dtype in Zarr.js, I need to check all of my arrays on the Python side to ensure that they will work with Zarr.js downstream:

if arr.dtype.kind == 'f' and arr.dtype.itemsize == 2:
    arr = arr.astype('<f4')

Feature request

While JS does not have a Float16Array class, could Zarr.js load remote <f2 and >f2 arrays into Float32Arrays? In other words, is there something preventing adding the following lines to https://github.com/gzuidhof/zarr.js/blob/master/src/nestedArray/types.ts#L32? Perhaps this would only be supported in read-only mode (mode: "r").

const DTYPE_TYPEDARRAY_MAPPING = {
  // ...
+  '<f2': Float32Array,
  '<f4': Float32Array,
  '<f8': Float64Array,
  '>b': Int8Array,
  '>B': Uint8Array,
  '>u1': Uint8Array,
  '>i1': Int8Array,
  '>u2': Uint16Array,
  '>i2': Int16Array,
  '>u4': Uint32Array,
  '>i4': Int32Array,
+  '>f2': Float32Array,
  '>f4': Float32Array,
  '>f8': Float64Array
};
manzt commented 2 years ago

Thanks for your patience with my response. TL;DR - It isn't possible to view a contiguous piece of memory that is f2 as f4 and have anything usable.

Unfortunately, I don't think it is as simple as adding more mappings to DTYPE_TYPEDARRAY_MAPPING. A TypedArray works by providing an array-like view of an underlying ArrayBuffer. In zarr.js, each array "chunk" is decompressed into a raw ArrayBuffer and then the corresponding TypedArray is used to provide a view of that underlying binary data.

Each element in a Float32Array is 4 bytes of the underlying ArrayBuffer viewed as a 32-bit IEEE floating point number. It isn't until you try to access the data from this view (e.g., arr[0] or Array.from(arr)) that the values are coerced into JS Numbers. TypedArrays do a lot of the hard work for us because they provide this no-copy abstraction over the underlying binary data. Otherwise we'd need to manually parse the binary buffers ourselves into JS Array<number>.
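To make the "view" idea concrete, here is a minimal TypeScript sketch using only standard TypedArray APIs. The variable names are illustrative and nothing here is taken from the zarr.js source.

```typescript
// Eight raw bytes, standing in for a decompressed chunk.
const chunkBuffer = new ArrayBuffer(8);

// Two views over the same memory; neither constructor copies the bytes.
const asBytes = new Uint8Array(chunkBuffer);
const asFloats = new Float32Array(chunkBuffer);

// The element count follows from the element size: 8 bytes / 4 bytes each.
console.log(asBytes.length, asFloats.length); // 8 2

// A write through one view is visible through the other.
asFloats[0] = 1.0;
console.log(asBytes.some((b) => b !== 0)); // true
```

Because both views share one buffer, choosing a TypedArray class only changes how the existing bytes are interpreted. It never converts them.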

The reason you cannot simply view f2 as f4 is that each element requires a different number of bytes with a different bit layout (as defined by IEEE 754),

float 16 image

float 32 image

so viewing an f2 buffer as a Float32Array will either fail (the buffer's byte length must be a multiple of 4 for float32) or give you a Float32Array that is half the length, with values that aren't useful. This can be illustrated by taking a Float64Array view of the underlying buffer of a Float32Array:

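let f32 = new Float32Array([0, 1, 2, 3]); // TypedArray constructors require `new`
let f64 = new Float64Array(f32.buffer);   // same 16 bytes, reinterpreted as two 64-bit floats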
console.log(f64) // Float64Array [ 0.0078125, 32.00000762939453 ]
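For contrast, getting usable float32 values out of f2 data means decoding each 16-bit pattern and copying the result into a new array. The sketch below is only an illustration of that cost, written against the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits, 10 fraction bits). The helpers `halfBitsToNumber` and `decodeFloat16Chunk` are hypothetical names and are not part of zarr.js.

```typescript
// Decode one IEEE 754 half-precision value from its 16-bit pattern.
function halfBitsToNumber(bits: number): number {
  const sign = (bits & 0x8000) !== 0 ? -1 : 1;
  const exponent = (bits >> 10) & 0x1f; // 5 exponent bits, bias 15
  const fraction = bits & 0x03ff;       // 10 fraction bits

  if (exponent === 0) {
    // Zero and subnormals: no implicit leading 1.
    return sign * fraction * Math.pow(2, -24);
  }
  if (exponent === 0x1f) {
    // All-ones exponent encodes infinities and NaN.
    return fraction === 0 ? sign * Infinity : NaN;
  }
  return sign * (1 + fraction / 1024) * Math.pow(2, exponent - 15);
}

// Hypothetical helper (not a zarr.js API): copy an f2 chunk into a Float32Array.
// `littleEndian` should be true for "<f2" data and false for ">f2" data.
function decodeFloat16Chunk(chunk: ArrayBuffer, littleEndian = true): Float32Array {
  const view = new DataView(chunk);
  const out = new Float32Array(Math.floor(chunk.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = halfBitsToNumber(view.getUint16(i * 2, littleEndian));
  }
  return out;
}

// Usage: the little-endian f2 encodings of 1, -2, 0.5 and 65504.
const f2Chunk = new ArrayBuffer(8);
new Uint8Array(f2Chunk).set([0x00, 0x3c, 0x00, 0xc0, 0x00, 0x38, 0xff, 0x7b]);
console.log(Array.from(decodeFloat16Chunk(f2Chunk))); // [ 1, -2, 0.5, 65504 ]
```

Unlike a TypedArray view, this allocates a second buffer twice the size of the chunk and touches every element, so it gives up the no-copy property described above.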
keller-mark commented 2 years ago

Thank you for the explanation! I did not realize that the zarr.js chunks were being passed directly to the TypedArray via ArrayBuffer. I wonder if https://github.com/petamoriken/float16 could be used in this case (though I have never tried it, I just came across the repo), but of course that would add a dependency.

manzt commented 2 years ago

Published in v0.6.0!

https://guido.io/zarr.js/#/advanced/float16

TheJeran commented 3 months ago

Published in v0.6.0!

https://guido.io/zarr.js/#/advanced/float16

Is there any reason why this method shouldn't work? I am trying it and still getting the unsupported error. My imports look like this:

import { Float16Array } from "@petamoriken/float16";
// !Important! Make sure this global is set _before_ importing Zarr.js
globalThis.Float16Array = Float16Array;

import type { Float16ArrayConstructor } from "@petamoriken/float16";

declare global {
  var Float16Array: Float16ArrayConstructor;
}

import { HTTPStore, openArray } from "zarr";
import {slice as zarrSlice}  from "zarr";