rasterio / affine

Affine transformation matrices
BSD 3-Clause "New" or "Revised" License
156 stars 28 forks source link

Affine class no longer compatible with delayed operations in Dask 2022.8.1 #79

Closed fbunt closed 1 year ago

fbunt commented 1 year ago

A recent change (dask/dask#9361) in how dask handles arguments that are instances or subclasses of namedtuple prevents Affine objects from being passed to delayed operations such as dask.array.map_blocks. Dask now unpacks the namedtuples and then tries to repack them before providing them to the delayed function at compute time. This breaks for Affine objects since they unpack to the full 9 coefficients but Affine.__new__ only takes the first 6 coefficients. The result is

TypeError: __new__() takes 7 positional arguments but 10 were given

when dask graphs are computed.

My current solution is to pass the coefficients in place of the instance but it would be great to simply pass Affine objects again.

fbunt commented 1 year ago

Quick example:

import dask.array as da
import numpy as np
from affine import Affine

def dummy(x, aff):
    print(aff)
    return x

x = da.from_array(np.ones((4, 4)), chunks=(2, 2))
a = Affine(1.0, 0.0, 0.0, 0.0, -1.0, 4.0)
da.map_blocks(dummy, x, aff=a).compute()
sgillies commented 1 year ago

@fbunt thanks for reporting this! I'm not sure what to do about this at the moment. Being a namedtuple was supposed to be only an implementation detail for affine, but clearly that was not much of a strategy.

groutr commented 1 year ago

If namedtuple is only an implementation detail of affine, would using something like array.array (https://docs.python.org/3/library/array.html) be a good alternative?

sgillies commented 1 year ago

@groutr well, I was thinking to use attrs or a data class instead.

groutr commented 1 year ago

My understanding was that affine sought to be free of dependencies outside of Python's standard library. Data classes are a good idea. Affine doesn't seem to specify any minimum version of Python, but using data classes would mean Python >=3.7 (which probably isn't a big deal). Slotted data classes weren't introduced until Python 3.10, so each data class instance would have an instance dictionary. But they can be frozen and have named attributes so mimic the behavior of a namedtuple.

The suggestion to use array.array was because it could be used to enforce the type of all the values, casting as necessary. It also preserves some of the behavior of a named tuple such as slicing and unpacking. Additionally, array.array would play nicely with numpy's support of the buffer protocol (ie, creating an array view of a transform without having to make a copy, but ). Arrays have their drawbacks, mainly mutability. Care would have to be taken in regards to mutability as changing the numpy array changes the array).

Ultimately, you know affine's design goals and future far better than I do.

artttt commented 1 year ago

leaving a comment here to bump this issue as I just found the same bug.

without looking deeply or testing may I suggest making g,h,i optional parameters of new may solve this issue?

    def __new__(cls, a, b, c, d, e, f, g=0.0, h=0.0, i=1.0):
        """Create a new object
        Parameters
        ----------
        a, b, c, d, e, f, g, h, i : float (g, h, i are optional)
            Elements of an augmented affine transformation matrix.
        """
        mat3x3 = [x * 1.0 for x in [a, b, c, d, e, f, g, h, i]]
        return tuple.__new__(cls, mat3x3)
sgillies commented 1 year ago

@artttt that's a great suggestion! I'm trying it out in #92.

@fbunt any chance you could try the code in PR #92 out?