Open darribas opened 2 years ago
Probably a bug, lemme check!

Hmm... `da2WSP()` doesn't deal with standardization, so that option is ignored when `sparse=True`, since it is propagated down to the `da2W()` and `da2WSP()` functions using splatting (`**kwargs`). Should be a one-line fix, either in `da2WSP()` or directly in the `.from_xarray()` method... probably best in `da2WSP()`. In theory... we should have a centralized "standardizer" mixin class 😄
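To illustrate the mechanism (function names here are hypothetical stand-ins, not libpysal's actual internals): when an outer entry point forwards options via `**kwargs` to an inner builder that never reads a given key, the option is silently dropped:

```python
import numpy as np
from scipy import sparse

def _build_wsp(da, standardize=False, **kwargs):
    # Hypothetical inner builder: constructs binary weights but never
    # touches `standardize`, so the flag is silently ignored.
    return sparse.eye(3, dtype=np.int8, format="csr")

def from_xarray_sketch(da, **kwargs):
    # Hypothetical outer entry point: splats everything down unchanged.
    return _build_wsp(da, **kwargs)

w = from_xarray_sketch(None, standardize=True)  # request standardization...
print(w.dtype, w.sum())                         # ...weights stay binary int8
```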
> Probably a bug, lemme check!

whoops, should've left this for you @MgeeeeK, sorry :/ just saw in morning email & recalled the standardization options...

`int8` is hardcoded here:

I think that makes sense as a default as it's more memory efficient, but I thought that'd convert directly into other types (e.g. `float64`) if multiplied. Perhaps that is not occurring?

I think the ideal behaviour is that, by default, the binary weights are made of `int8`, but if we transform the weights (e.g. row-standardise, or even spatial lag) it'd accept the operations.
> whoops, should've left this for you @MgeeeeK, sorry :/ just saw in morning email & recalled the standardization options...

no problem! I think you are correct, sparse weights do not support transformations rn (afaiu).

> I think the ideal behaviour is that by default the binary weights are made of `int8`, but if we transform the weights (e.g. row-standardise or spatial lag even) it'd accept the operations.

that makes sense, will look into it
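For reference, scipy's sparse products do upcast following NumPy promotion rules; it's the transformations themselves (e.g. row-standardisation) that cannot stay in `int8` and need an explicit promotion. A quick sketch with plain `scipy.sparse` (not libpysal's code):

```python
import numpy as np
from scipy import sparse

# Binary weights stored as int8 (the memory-lean default discussed above)
w = sparse.csr_matrix(np.array([[0, 1, 1],
                                [1, 0, 0],
                                [1, 0, 0]], dtype=np.int8))
y = np.array([1.0, 2.0, 3.0])   # float64 attribute vector

# Products upcast automatically: int8 @ float64 -> float64 spatial lag
lag = w @ y
print(lag.dtype)                # float64

# Row-standardisation cannot stay integer: scaling each row by 1/row_sum
# promotes the weights to float64
row_sums = np.asarray(w.sum(axis=1)).ravel()
w_r = sparse.diags(1.0 / row_sums) @ w
print(w_r.dtype)                # float64
```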
A similar issue affects `lat2SW`:

https://github.com/pysal/libpysal/blob/master/libpysal/weights/util.py#L1248

Would it make sense to have the `dtype` optionally chosen by the user, with a default of `int8`, and/or an `'auto'` option that picks up the `dtype` of the `DataArray`? This might not be very efficient at scale but, as it turns out, it does make computations down the line easier (e.g. `numba` in the LISA complains if the `W` is expressed in `float32` and the `y` is expressed as `float64`). Should we maybe even default to `float64`, and if you know what you're doing you can set it to `int8` if you want things to run lean?
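A minimal sketch of what that `dtype` knob could look like (the helper name and signature are hypothetical, not existing libpysal API):

```python
import numpy as np

def resolve_dtype(dtype=None, data_dtype=np.float64):
    """Hypothetical helper: pick the dtype for the sparse weights.

    dtype      -- None (lean int8 default), 'auto', or an explicit dtype
    data_dtype -- dtype of the source DataArray, used when dtype='auto'
    """
    if dtype is None:
        return np.dtype(np.int8)       # memory-lean default
    if isinstance(dtype, str) and dtype == "auto":
        return np.dtype(data_dtype)    # mirror the DataArray's dtype
    return np.dtype(dtype)             # whatever the user asked for

print(resolve_dtype())                       # int8
print(resolve_dtype("auto", np.float32))     # float32
print(resolve_dtype(np.float64))             # float64
```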
I've given it a bit more thought and fleshed out some ideas in this notebook. There's a quick sketch of how we could go about aligning the `dtype`s of `y` and `w` when running things like `Moran_Local`. It relies on a function `aligner` that takes the following arguments:

'''
y
w
how : str/dtype
    [Optional. Default='y'] Alignment policy:
    - 'y': use `y.dtype`
    - 'w': use `dtype` in `w`
    - `dtype`: different `dtype` target
'''
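A minimal sketch of what `aligner` might look like under those arguments (my reading of the idea, not the notebook's actual code):

```python
import numpy as np
from scipy import sparse

def aligner(y, w, how="y"):
    """Align the dtypes of attribute vector `y` and sparse weights `w`.

    how -- 'y' (use y.dtype), 'w' (use w.dtype), or an explicit dtype.
    """
    if isinstance(how, str) and how == "y":
        target = y.dtype
    elif isinstance(how, str) and how == "w":
        target = w.dtype
    else:
        target = np.dtype(how)   # explicit dtype target
    return y.astype(target, copy=False), w.astype(target, copy=False)

y = np.array([1.0, 2.0, 3.0])                   # float64
w = sparse.eye(3, dtype=np.int8, format="csr")  # int8 weights
y2, w2 = aligner(y, w, how="y")
print(y2.dtype, w2.dtype)                       # float64 float64
```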
I'll copy here my current thinking (from the bottom of the notebook):

- `aligner` could be embedded in `Moran_Local` and all methods that rely on `numba`, but this would need to be added after any transformations, and on each of the classes that currently rely on accelerated randomisation.
- We could pick a sensible default (`y.dtype` or `float64`) and leave flexibility as an option if folks need it.
- If `w` is a large matrix, the approach taken in `_convert_w_dtype` might not be feasible/desirable. Do we have other ways?
- If `aligner` was called within something like `Moran_Local`, warnings would need to be raised so the user is aware. In fact, an option in `Moran_Local` could select what to do, with the option of raising an error if they're different, or converting `dtype`s automatically.

I think there are two aspects to this issue: one immediate, which we could fix along the lines suggested; and a longer-term one that I think should be addressed in how `W` objects are stored and manipulated, perhaps in a new iteration of the current `W`/`WSP` objects.
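To make the warn/raise/convert behaviour concrete, here's a sketch of such a policy knob (the function name, parameter name, and values are hypothetical):

```python
import warnings
import numpy as np

def align_policy(y_dtype, w_dtype, on_mismatch="convert"):
    """Hypothetical dtype policy for Moran_Local-style entry points.

    on_mismatch -- 'convert' (promote both, with a warning) or 'raise'.
    """
    y_dtype, w_dtype = np.dtype(y_dtype), np.dtype(w_dtype)
    if y_dtype == w_dtype:
        return y_dtype
    if on_mismatch == "raise":
        raise TypeError(f"dtype mismatch: y is {y_dtype}, w is {w_dtype}")
    # NumPy promotion, e.g. int8 + float64 -> float64
    target = np.promote_types(y_dtype, w_dtype)
    warnings.warn(f"dtypes differ; converting both to {target}")
    return target

print(align_policy(np.float64, np.int8))   # float64 (with a warning)
```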
The recommended `WSP` builder in the raster section of `weights` seems to return `WSP` objects that can only have `int8` values. This makes it fail, for example, on computations that require products, like LISA statistics.

From the raster demo, `w_queen` is a full `W` and seems to work as expected:

If we build it straight into `WSP`, which is much faster and more efficient (and the current default):

Any ideas why it does not switch to floats to allow for the weights to adjust?

cc' @MgeeeeK as they might have some suggestions since they worked on this intensively.