metagraph-dev / metagraph

Multi-target API for graph analytics with Dask
https://metagraph.readthedocs.io/en/latest/
Apache License 2.0

Translators don't know about enough numpy dtypes #15

Closed by eriknw 4 years ago

eriknw commented 4 years ago

For example,

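import metagraph as mg
import scipy.sparse as ss

# Translate a SciPy CSR matrix into the grblas-backed sparse matrix type
mg.translate(
    ss.csr_matrix([[1, 0], [0, 0]]),
    mg.resolver.types.SparseMatrix.GrblasMatrixType
)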

complains about not finding numpy.longlong. There may be other datatypes that we should be able to handle, but don't.
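
To illustrate how a lookup like this can miss numpy.longlong: on some platforms numpy.longlong is a different type object from numpy.int64, even though both describe a 64-bit signed integer, so a table keyed by type objects may not contain it. The following is only a sketch of one possible workaround, not metagraph's actual code. `SUPPORTED_NAMES` and `normalize_dtype` are hypothetical names introduced here, and the sketch assumes a platform where `long long` is 64 bits wide.

```python
import numpy as np

# Hypothetical set of dtype names a translator accepts (not metagraph's real table).
SUPPORTED_NAMES = {
    "bool",
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float32", "float64",
}


def normalize_dtype(dtype_like):
    """Map any dtype-like object to a width-based name such as 'int64'.

    Going through np.dtype(...).name collapses aliases like np.longlong
    onto the same name as the equivalent fixed-width type.
    """
    name = np.dtype(dtype_like).name
    if name not in SUPPORTED_NAMES:
        raise TypeError(f"unsupported dtype: {name}")
    return name


print(normalize_dtype(np.longlong))
print(normalize_dtype(np.int64))
```

Both calls should print the same name, so the lookup no longer depends on which alias the input array happens to carry.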

eriknw commented 4 years ago

Okay, looks like we have most of the dtypes we're likely to care about. See:

In [5]: set(np.typeDict.values())
Out[5]:
{numpy.bool_,
 numpy.bytes_,
 numpy.complex128,
 numpy.complex256,
 numpy.complex64,
 numpy.datetime64,
 numpy.float128,
 numpy.float16,
 numpy.float32,
 numpy.float64,
 numpy.int16,
 numpy.int32,
 numpy.int64,
 numpy.int8,
 numpy.longlong,
 numpy.object_,
 numpy.str_,
 numpy.timedelta64,
 numpy.uint16,
 numpy.uint32,
 numpy.uint64,
 numpy.uint8,
 numpy.ulonglong,
 numpy.void}

And there are a few miscellaneous dtypes:

In [6]: {val for val in vars(np).values() if np.issctype(val)} - set(np.typeDict.values())
Out[6]:
{bool,
 complex,
 float,
 int,
 numpy.character,
 numpy.complexfloating,
 numpy.flexible,
 numpy.floating,
 numpy.generic,
 numpy.inexact,
 numpy.integer,
 numpy.number,
 numpy.record,
 numpy.signedinteger,
 numpy.unsignedinteger,
 str}
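
Most entries in this second set are Python builtins or abstract scalar base classes (numpy.integer, numpy.floating, and so on) rather than concrete array dtypes. The abstract classes can still be used to group concrete dtypes into families. A minimal sketch using `np.issubdtype` follows; `dtype_family` is a hypothetical helper introduced for illustration. Note that `np.issctype`, used in the session above, was removed in NumPy 2.0, so the sketch avoids it.

```python
import numpy as np


def dtype_family(dtype_like):
    """Classify a dtype-like object using NumPy's abstract scalar classes."""
    dt = np.dtype(dtype_like)
    if np.issubdtype(dt, np.bool_):
        return "bool"
    if np.issubdtype(dt, np.unsignedinteger):
        return "unsigned integer"
    if np.issubdtype(dt, np.signedinteger):
        return "signed integer"
    if np.issubdtype(dt, np.floating):
        return "float"
    if np.issubdtype(dt, np.complexfloating):
        return "complex"
    return "other"


for candidate in (np.longlong, "uint8", bool, np.float16, "complex128", "U4"):
    print(candidate, "->", dtype_family(candidate))
```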

Relatedly, if we're going to have dtypes in metagraph, why not simply use numpy dtypes? Alternatively, if we'd like to use strings for now, then can we choose the set of strings that we like from np.typeDict?
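
If strings are used, one possible sanity check is that each chosen string is a name NumPy itself reports for the dtype it parses to, so the string and the NumPy dtype stay interchangeable. The sketch below assumes a hypothetical candidate list (`DTYPE_STRINGS`) chosen only for illustration. It relies on `np.dtype` alone because `np.typeDict` is not available in recent NumPy releases.

```python
import numpy as np

# Hypothetical candidate strings, chosen for illustration only.
DTYPE_STRINGS = (
    "bool",
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float32", "float64",
)


def failed_round_trips(names):
    """Return the names that NumPy rejects or maps to a different name."""
    bad = []
    for name in names:
        try:
            if np.dtype(name).name != name:
                bad.append(name)  # valid alias, but not the canonical name
        except TypeError:
            bad.append(name)  # NumPy does not recognize this string at all
    return bad


print(failed_round_trips(DTYPE_STRINGS))
print(failed_round_trips(["double", "not_a_dtype"]))
```

An alias such as "double" parses fine but is reported under a different name, which is the kind of mismatch a string-based scheme would need to rule out.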

jim22k commented 4 years ago

> why not simply use numpy dtypes?

We ideally want dtypes that are generic across all implementations -- GraphBLAS, numpy, cugraph, etc. Otherwise we have conversion issues when, for example, we try to store 'complex128' inside cugraph and it laughs at us.
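
As a rough sketch of what "generic across all implementations" could mean in practice, the shared dtypes can be computed as the intersection of what each backend accepts. Everything below is an assumption made for illustration: the backend names come from this thread, but the dtype sets are invented and are not the real support matrices of those libraries, and `BACKEND_DTYPES`, `common_dtypes`, and `can_translate` are hypothetical names.

```python
# Invented capability table for illustration; NOT the real support matrices.
BACKEND_DTYPES = {
    "numpy": {"bool", "int32", "int64", "float32", "float64", "complex128"},
    "graphblas": {"bool", "int32", "int64", "float32", "float64"},
    "cugraph": {"int32", "int64", "float32", "float64"},
}


def common_dtypes(backends):
    """Return the dtype names accepted by every listed backend."""
    return set.intersection(*(BACKEND_DTYPES[b] for b in backends))


def can_translate(dtype_name, src, dst):
    """Check whether a dtype is representable on both sides of a translation."""
    return dtype_name in BACKEND_DTYPES[src] and dtype_name in BACKEND_DTYPES[dst]


print(sorted(common_dtypes(["numpy", "graphblas", "cugraph"])))
print(can_translate("complex128", "numpy", "cugraph"))
```

With a table like this, an unsupported translation can be rejected up front instead of failing inside the target library.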

eriknw commented 4 years ago

I disagree. We may have conversion issues regardless of how we choose to represent our dtypes. We don't need to support all numpy dtypes for everything or even for anything.

eriknw commented 4 years ago

I can take care of this. @jim22k, will my doing so strongly interact with what you're doing?