Closed goodboy closed 8 years ago
A couple of notes after digging into this:

`shmarray` doesn't support unicode, as per the docs on `multiprocessing.RawArray`, so if we want to keep the shared numpy array stuff it would seem we have to use bytes.

The pandas "encoding" (type coercion?) problems stem from pydata/pandas#9712: when a CSV data store is written, it hilariously keeps the `b` bytes prefix. This causes problems with round-tripping (which is done implicitly when reading the entire contents of a `DataStorer`: in mem + on disk), since `pd.read_csv` then parses the `b` as part of the data point.
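
A minimal sketch of the round-trip issue described above and one possible workaround, assuming the bytes column can be decoded as UTF-8. The `roundtrip` helper, the column names, and the sample values are hypothetical and only for illustration; this is not the project's `DataStorer` code.

```python
import io

import pandas as pd


def roundtrip(frame):
    """Write a frame to CSV text and read it straight back (hypothetical helper)."""
    buf = io.StringIO()
    frame.to_csv(buf, index=False)
    buf.seek(0)
    return pd.read_csv(buf)


# Hypothetical data: a column holding bytes values, as a bytes-typed shared array would.
raw = pd.DataFrame({"symbol": [b"AAPL", b"MSFT"], "price": [1.5, 2.5]})

# Assumed workaround: decode bytes to str before writing, so that no bytes
# repr (the problem tracked in pydata/pandas#9712) ends up in the CSV text.
clean = raw.assign(symbol=raw["symbol"].str.decode("utf-8"))

print(roundtrip(raw)["symbol"].tolist())    # inspect what the raw bytes column comes back as
print(roundtrip(clean)["symbol"].tolist())  # decoded column survives the round trip
```

Decoding at the write boundary keeps the in-memory array as bytes while the on-disk CSV stays plain text; values would need re-encoding after `pd.read_csv` if they go back into a bytes array.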
Got it mostly there; there's just some outstanding oddness trying to get `pandas` to work with unicode strings. `pandas` doesn't seem to support `numpy` dtypes fully, an example being that I can't use the `'<U'` from the Array-protocol type strings. On top of that, `shmarray` doesn't seem to play well with passing built-in Python types to `np.dtype`. Gonna hack on it some more.
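
A rough sketch of why built-in types can be awkward as dtypes for shared memory, and of the bytes-based approach mentioned earlier. It uses only `multiprocessing.RawArray` and `np.frombuffer`, not `shmarray` itself; the `shared_bytes_array` helper, the `width` parameter, and the sample values are hypothetical. That the zero item size is what trips up `shmarray` is an assumption, not something confirmed here.

```python
import multiprocessing as mp

import numpy as np

# Built-in str/bytes map to flexible dtypes with no fixed width, so they
# carry no per-item size that a shared buffer could be allocated from.
print(np.dtype(str).kind, np.dtype(str).itemsize)
print(np.dtype(bytes).kind, np.dtype(bytes).itemsize)


def shared_bytes_array(values, width=8):
    """Copy strings into a shared buffer as fixed-width bytes (hypothetical helper)."""
    dtype = np.dtype(f"S{width}")
    # Allocate raw, unsynchronized shared memory: one unsigned byte per slot.
    raw = mp.RawArray("B", len(values) * dtype.itemsize)
    # View the same memory as a numpy array of fixed-width byte strings.
    arr = np.frombuffer(raw, dtype=dtype)
    # Values longer than `width` bytes are silently truncated by numpy.
    arr[:] = [v.encode("utf-8") for v in values]
    return raw, arr


raw, symbols = shared_bytes_array(["AAPL", "MSFT"])
decoded = [s.decode("utf-8") for s in symbols]
print(decoded)
```

An explicit width such as `'S8'` gives the allocation a concrete size, at the cost of having to encode on the way in and decode on the way out.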