nitinkgp23 closed this 7 years ago.
This is weird. I would have assumed that pandas.Series([1, 2, 3], dtype=str) would convert each element of the input list to a string, just as it converts all elements to floats if you pass dtype=float to the Series constructor. But apparently this is not the case:
import pandas as pd
int_list = [1, 2, 3]
s1 = pd.Series(int_list)
s2 = pd.Series(int_list, dtype=float)
s3 = pd.Series(int_list, dtype=str)
s4 = pd.Series(int_list, dtype='U')
print('pandas.Series elements:')
print(' s1:', type(s1[0]))
print(' s2:', type(s2[0]))
print(' s3:', type(s3[0]))
print(' s4:', type(s4[0]))
This has the following output:
pandas.Series elements:
s1: <class 'numpy.int64'>
s2: <class 'numpy.float64'>
s3: <class 'int'>
s4: <class 'int'>
This is not the behavior I would expect, especially since I'm used to NumPy arrays, which do the conversion correctly:
import numpy as np
int_list = [1, 2, 3]
a1 = np.array(int_list)
a2 = np.array(int_list, dtype=float)
a3 = np.array(int_list, dtype=str)
a4 = np.array(int_list, dtype='U')
print('numpy.array elements:')
print(' a1:', type(a1[0]))
print(' a2:', type(a2[0]))
print(' a3:', type(a3[0]))
print(' a4:', type(a4[0]))
numpy.array elements:
a1: <class 'numpy.int64'>
a2: <class 'numpy.float64'>
a3: <class 'numpy.str_'>
a4: <class 'numpy.str_'>
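For comparison, whether a NumPy array actually holds strings can be checked through its dtype.kind, which is 'U' for fixed-width Unicode strings. A minimal sketch, where the helper name is_unicode_array is hypothetical:

```python
import numpy as np

def is_unicode_array(arr: np.ndarray) -> bool:
    """Return True if arr stores fixed-width Unicode strings (dtype kind 'U')."""
    return arr.dtype.kind == "U"

int_list = [1, 2, 3]
a1 = np.array(int_list)             # integer dtype, kind 'i'
a3 = np.array(int_list, dtype=str)  # each element converted to a string

print(is_unicode_array(a1))  # False
print(is_unicode_array(a3))  # True
print(a3.tolist())           # ['1', '2', '3']
```

Checking dtype.kind rather than the exact dtype avoids depending on the string width NumPy picks (for example '<U1' versus '<U21').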
As a workaround, you can use pd.Series(...).astype(str), which seems to do the string conversion correctly:
a = np.array(int_list).astype(str)
s = pd.Series(int_list).astype(str)
print('using astype:')
print(' s:', type(s[0]))
print(' a:', type(a[0]))
using astype:
s: <class 'str'>
a: <class 'numpy.str_'>
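The astype workaround can be wrapped in a small helper so that every call site converts after construction instead of relying on the constructor's dtype=str handling, which, as reported above, left the elements as int. A minimal sketch; the name to_str_series is hypothetical:

```python
import pandas as pd

def to_str_series(values) -> pd.Series:
    """Build a Series whose elements are Python str objects.

    The Series is created with its inferred dtype first and then converted
    with astype(str), instead of passing dtype=str to the constructor.
    """
    return pd.Series(values).astype(str)

s = to_str_series([1, 2, 3])
print(s.tolist())                          # ['1', '2', '3']
print(all(isinstance(x, str) for x in s))  # True
```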
Fixes #8. But, after doing this,
drms.to_datetime([1])
gives 1970-01-01 00:00:00.000000001 as output. Earlier, the code used to break. ~The correct output should be NaT.~
The output is correct because, when an integer is passed to pd.to_datetime(), the function interprets it as the number of nanoseconds elapsed since the Unix epoch (1970-01-01 00:00:00).
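The epoch-based interpretation can be made explicit with the unit parameter of pd.to_datetime. A minimal sketch using only pandas (it does not call drms):

```python
import pandas as pd

# By default, integers are read as nanoseconds since the Unix epoch.
default = pd.to_datetime([1])
# With unit="s", the same integer is read as seconds since the epoch.
seconds = pd.to_datetime([1], unit="s")

print(default[0])  # 1970-01-01 00:00:00.000000001
print(seconds[0])  # 1970-01-01 00:00:01
```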