thouis / numpy-trac-migration

numpy Trac to github issues migration

genfromtxt and unicode strings (Trac #2124) #5920

Open numpy-gitbot opened 11 years ago

numpy-gitbot commented 11 years ago

Original ticket http://projects.scipy.org/numpy/ticket/2124 on 2012-05-02 by trac user anntzer, assigned to unknown.

With bytes fields (in Python 3 terms), genfromtxt(dtype=None) sets the size of each field to the length of its longest value (npyio.py line 1596), but it doesn't do the same for unicode fields, which is a pity. See the example:

import io, numpy as np
s = io.BytesIO()
s.write(b"abc 1\ndef 2")
s.seek(0)
t = np.genfromtxt(s, dtype=None) # (or converters={0: bytes})
print(t, t.dtype) # -> [(b'a', 1) (b'b', 2)] [('f0', '|S1'), ('f1', '<i8')]
s.seek(0)
t = np.genfromtxt(s, dtype=None, converters={0: lambda s: s.decode("utf-8")})
print(t, t.dtype) # -> [(_, 1) (_, 2)] [('f0', '<U0'), ('f1', '<i8')]

I tried changing npyio.py around line 1600 to handle the unicode case as well, but it didn't work; from my limited understanding, the problem arises earlier, in the way StringConverter (in _iotools.py) is defined(?).
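
In the meantime, one possible workaround is to let genfromtxt size the fields as bytes and decode them afterwards. The following is only a sketch under that assumption, not a fix for genfromtxt itself; the helper name `bytes_fields_to_unicode` is hypothetical and not part of numpy.

```python
import io

import numpy as np


def bytes_fields_to_unicode(arr, encoding="utf-8"):
    """Hypothetical helper: copy a structured array, decoding each bytes
    ('S') field into a unicode ('U') field wide enough to hold it."""
    new_descr = []
    for name in arr.dtype.names:
        field_dtype = arr.dtype[name]
        if field_dtype.kind == "S":
            # A field of N bytes decodes to at most N characters.
            new_descr.append((name, "U%d" % field_dtype.itemsize))
        else:
            new_descr.append((name, field_dtype))

    out = np.empty(arr.shape, dtype=new_descr)
    for name in arr.dtype.names:
        if arr.dtype[name].kind == "S":
            out[name] = np.char.decode(arr[name], encoding)
        else:
            out[name] = arr[name]
    return out


if __name__ == "__main__":
    s = io.BytesIO(b"abc 1\ndef 2")
    t = np.genfromtxt(s, dtype=None)  # string field sized by genfromtxt
    u = bytes_fields_to_unicode(t)    # bytes fields (if any) become unicode
    print(u, u.dtype)
```

Fields that are not bytes are copied unchanged, so the helper is harmless if the string field already comes back as unicode.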