sunpy / drms

Access HMI, AIA and MDI data with Python from public JSOC DRMS servers
https://docs.sunpy.org/projects/drms/en/stable/
BSD 2-Clause "Simplified" License

Allow for empty input lists #9

Closed nitinkgp23 closed 7 years ago

nitinkgp23 commented 7 years ago

Fixes #8. However, after this change, drms.to_datetime([1]) returns 1970-01-01 00:00:00.000000001 as output. Earlier, the code used to break. ~The correct output should be NaT~. The output is correct: when an integer value is passed to pd.to_datetime(), the function interprets it as the number of nanoseconds elapsed since the Unix epoch (1970-01-01 00:00:00).
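To make the integer handling concrete, here is a minimal sketch that uses only plain pandas (not drms.to_datetime itself). It assumes the default behavior of pd.to_datetime(), where integers are read as nanoseconds unless a unit is given; the variable names are illustrative only.

```python
import pandas as pd

# Integers are read as nanoseconds since 1970-01-01 unless a unit is given.
ts_default = pd.to_datetime([1])

# Passing unit='s' makes the same integer mean one second after the epoch.
ts_seconds = pd.to_datetime([1], unit='s')

# An empty list gives an empty result instead of raising an error.
ts_empty = pd.to_datetime([])

print(ts_default[0])
print(ts_seconds[0])
print(len(ts_empty))
```

The first value is one nanosecond after the epoch, which matches the 1970-01-01 00:00:00.000000001 seen above.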

kbg commented 7 years ago

This is weird. I would have assumed that pandas.Series([1, 2, 3], dtype=str) would convert each element of the input list to a string, like it converts all elements to floats if you pass dtype=float to the Series constructor. But this is apparently not true:

import pandas as pd

int_list = [1, 2, 3]
s1 = pd.Series(int_list)
s2 = pd.Series(int_list, dtype=float)
s3 = pd.Series(int_list, dtype=str)
s4 = pd.Series(int_list, dtype='U')

print('pandas.Series elements:')
print('  s1:', type(s1[0]))
print('  s2:', type(s2[0]))
print('  s3:', type(s3[0]))
print('  s4:', type(s4[0]))

has the following output:

pandas.Series elements:
  s1: <class 'numpy.int64'>
  s2: <class 'numpy.float64'>
  s3: <class 'int'>
  s4: <class 'int'>

This is not the behavior I would expect, especially because I'm used to NumPy arrays, which do the conversion correctly:

import numpy as np

int_list = [1, 2, 3]
a1 = np.array(int_list)
a2 = np.array(int_list, dtype=float)
a3 = np.array(int_list, dtype=str)
a4 = np.array(int_list, dtype='U')

print('numpy.array elements:')
print('  a1:', type(a1[0]))
print('  a2:', type(a2[0]))
print('  a3:', type(a3[0]))
print('  a4:', type(a4[0]))

has the following output:

numpy.array elements:
  a1: <class 'numpy.int64'>
  a2: <class 'numpy.float64'>
  a3: <class 'numpy.str_'>
  a4: <class 'numpy.str_'>

As a workaround, you could use Series().astype(str), which seems to do the string conversion correctly:

a = np.array(int_list).astype(str)
s = pd.Series(int_list).astype(str)

print('using astype:')
print('  s:', type(s[0]))
print('  a:', type(a[0]))

has the following output:

using astype:
  s: <class 'str'>
  a: <class 'numpy.str_'>
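As a rough sketch of how the astype(str) workaround could be wrapped up, the snippet below also checks the empty-list case this PR is about. The helper name as_str_series is hypothetical and is not part of drms or pandas. It only assumes that the .str accessor works on the converted Series.

```python
import pandas as pd


def as_str_series(values):
    """Hypothetical helper: return the input as a Series of strings."""
    # astype(str) converts after construction, so the result does not
    # depend on how the Series constructor treats dtype=str.
    return pd.Series(values).astype(str)


ints = as_str_series([1, 2, 3])
empty = as_str_series([])

print(list(ints))
print(len(empty))

# String methods can be applied to both results, including the empty one.
padded = ints.str.zfill(2)
cleaned = empty.str.replace('_', ' ', regex=False)
print(list(padded))
print(len(cleaned))
```

Converting with astype(str) after constructing the Series means any later .str operations see string values, whatever the input list contained.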