robdmc / pandashells

:panda_face: Bringing the python data stack to the shell prompt

p.regplot errors with dates on x-axis #20

Closed kmatt closed 9 years ago

kmatt commented 9 years ago

p.plot seems to work with an x-axis containing dates in strings (YYYY-MM-DD), but p.regplot does not, even if the column is cast to a datetime:

p.df -i csv --names dt count | p.df "df['dt'] = pd.to_datetime(df['dt'])" | p.regplot -x dt -y count

Traceback (most recent call last):
  File "//anaconda/envs/py34/bin/p.regplot", line 9, in <module>
    load_entry_point('pandashells==0.1.4', 'console_scripts', 'p.regplot')()
  File "//anaconda/envs/py34/lib/python3.4/site-packages/pandashells/bin/p_regplot.py", line 108, in main
    coeffs = np.polyfit(x, y, args.order[0])
  File "//anaconda/envs/py34/lib/python3.4/site-packages/numpy/lib/polynomial.py", line 543, in polyfit
    x = NX.asarray(x) + 0.0
TypeError: Can't convert 'float' object to str implicitly

kmatt commented 9 years ago

Apparently numpy.polyfit does not accept datetime values, but it will work with matplotlib's numeric dates. Perhaps to_pydatetime() is an option here?

robdmc commented 9 years ago

Hi. Thanks for your comment. Since p.regplot is focused on doing regression, I'm not sure I understand how you would regress against a datetime value. Wouldn't you need time expressed in an explicit unit (e.g. seconds, days, etc.)? In that case I would probably just transform the dates to the number of units (sec, min, hr, day) elapsed since some start time and do the regression from there. Does this make sense?
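The transformation described above might look roughly like this in plain pandas and numpy. This is a minimal sketch, not pandashells code: the sample data and the `days` column name are made up for illustration, and the first observation is assumed to be the start time.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV piped into p.df
df = pd.DataFrame({
    "dt": ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04"],
    "count": [10.0, 12.0, 14.0, 16.0],
})
df["dt"] = pd.to_datetime(df["dt"])

# Express time as elapsed days since the first observation (assumed start time)
df["days"] = (df["dt"] - df["dt"].min()) / pd.Timedelta(days=1)

# First-order fit; np.polyfit returns coefficients highest power first
slope, intercept = np.polyfit(df["days"].to_numpy(), df["count"].to_numpy(), 1)
print(slope, intercept)
```

Here the slope is in counts per day. Dividing by a different `pd.Timedelta` (for example `hours=1`) changes the unit of the regression.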

kmatt commented 9 years ago

It does - I was looking at the format of the x-axis with date labels. Converting the dates to seconds since epoch displayed the chart correctly, although without the actual date values on the axis.

robdmc commented 9 years ago

Yeah. I can't think of a good way to both do the regression and plot dates on the x-axis. If you can think of a way, I'd be all ears.
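One possible approach, offered only as a sketch outside of pandashells, is to fit on matplotlib's numeric dates and then tell the axis to render those numbers as dates. The sample data is hypothetical, and shifting the numeric dates to start at zero before fitting is an assumption made here to keep the fit well conditioned.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "dt": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04"]),
    "count": [10.0, 12.0, 14.0, 16.0],
})

# Matplotlib represents dates as floats (days), which np.polyfit accepts
x_num = mdates.date2num(df["dt"].to_numpy())
y = df["count"].to_numpy()

# Fit on days elapsed since the first point rather than on the raw date numbers
elapsed = x_num - x_num.min()
coeffs = np.polyfit(elapsed, y, 1)
fitted = np.polyval(coeffs, elapsed)

fig, ax = plt.subplots()
ax.plot(x_num, y, "o", label="data")
ax.plot(x_num, fitted, "-", label="fit")
ax.xaxis_date()       # interpret the numeric x values as dates for tick labels
fig.autofmt_xdate()   # rotate the date labels so they do not overlap
ax.legend()
fig.savefig("regplot_dates.png")
```

The regression itself still runs on plain numbers (days), so the unit question raised earlier in the thread is answered explicitly. Only the tick labels are dates.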