robdmc / pandashells

:panda_face: Bringing the python data stack to the shell prompt

Limit on select row? #24

Closed jungle-boogie closed 9 years ago

jungle-boogie commented 9 years ago

Hello,

Is there a limit of two (2) on the 'select by row' expressions passed to p.df?

p.df -h shows:

* Select by row
    p.example_data -d tips \
    | p.df 'df[df.sex=="Female"]' 'df[df.smoker=="Yes"]' -o table

I can do two just fine, but with three (3), it says:

Traceback (most recent call last):
  File "/usr/local/bin/p.df", line 9, in <module>
    load_entry_point('pandashells==0.1.4', 'console_scripts', 'p.df')()
  File "/usr/local/lib/python2.7/site-packages/pandashells/bin/p_df.py", line 223, in main
    df = process_command(args, cmd, df)
  File "/usr/local/lib/python2.7/site-packages/pandashells/bin/p_df.py", line 112, in process_command
    df = execute(cmd, scope_entries={'df': df}, retval_name='df')
  File "/usr/local/lib/python2.7/site-packages/pandashells/bin/p_df.py", line 62, in execute
    exec(cmd, scope)
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/ops.py", line 614, in wrapper
    res = na_op(values, other)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/ops.py", line 568, in na_op
    raise TypeError("invalid type comparison")
TypeError: invalid type comparison

My CLI input:
cat data.csv | p.df 'df[df.mid=="2600"]' 'df[df.date=="07/02/2015"]' 'df[df.type=="Credit Card Authorize"]' -o table

Is there something obvious that I'm doing wrong here?

Using: pandashells (0.1.4)

robdmc commented 9 years ago

Hi.

There is no limit. My guess is that something funky is going on with your date comparison.

jungle-boogie commented 9 years ago

Hi robdmc,

I'll test this more and, if I get the same results, paste some sample data.

Thanks

jungle-boogie commented 9 years ago

Hi,

> My guess is that something funky is going on with your date comparison.

I don't think it's the date at all...

Look carefully:

cat data.csv | p.df 'df[df.dba=="jungle"]' 'df[df.date=="07/03/2015"]' 'df[df.mid=='2601']' -o table

The digits must be in single quotes.

p.example_data -d tips | p.df 'df[df.sex=="Female"]' 'df[df.smoker=="Yes"]' 'df[df.total_bill=="9.60"]' -o table

Traceback (most recent call last):
  File "/usr/local/bin/p.df", line 9, in <module>
    load_entry_point('pandashells==0.1.4', 'console_scripts', 'p.df')()
  File "/usr/local/lib/python2.7/site-packages/pandashells/bin/p_df.py", line 223, in main
    df = process_command(args, cmd, df)
  File "/usr/local/lib/python2.7/site-packages/pandashells/bin/p_df.py", line 112, in process_command
    df = execute(cmd, scope_entries={'df': df}, retval_name='df')
  File "/usr/local/lib/python2.7/site-packages/pandashells/bin/p_df.py", line 62, in execute
    exec(cmd, scope)
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/ops.py", line 614, in wrapper
    res = na_op(values, other)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/ops.py", line 568, in na_op
    raise TypeError("invalid type comparison")
TypeError: invalid type comparison

p.example_data -d tips | p.df 'df[df.sex=="Female"]' 'df[df.smoker=="Yes"]' 'df[df.total_bill=='9.60']' -o table

total_bill  tip     sex  smoker  day    time  size
       9.6    4  Female     Yes  Sun  Dinner     2

So it does not matter if this is a whole number or some decimal number.

And with my real data and using a single quote, I get the desired results.

Thanks

robdmc commented 9 years ago

This statement does not work because 9.60 is a floating point number, and by enclosing it in double quotes, you are asking for a string.
p.example_data -d tips | p.df 'df[df.sex=="Female"]' 'df[df.smoker=="Yes"]' 'df[df.total_bill=="9.60"]' -o table
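
The type mismatch can be illustrated without pandas. The sketch below is a plain-Python stand-in, not the p.df code path: the list `bills` is a hypothetical sample, and plain Python quietly returns no match where the pandas version in the traceback raised `TypeError: invalid type comparison`.

```python
# Plain-Python stand-in for the row filter; pandas is deliberately not used.
# `bills` is a hypothetical sample of total_bill values, not the tips dataset.
bills = [16.99, 9.60, 10.34]

# Comparing a float to the string "9.60": the types differ, so nothing matches.
as_string = [b for b in bills if b == "9.60"]

# Comparing a float to the number 9.60: the value matches.
as_number = [b for b in bills if b == 9.60]

print(as_string)  # []
print(as_number)  # [9.6]
```

The same reasoning explains why quoting works for string columns such as sex or smoker but not for numeric columns such as total_bill.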


This statement works, but it does so by accident.
p.example_data -d tips | p.df 'df[df.sex=="Female"]' 'df[df.smoker=="Yes"]' 'df[df.total_bill=='9.60']' -o table

You are essentially concatenating the quoted string 'df[df.total_bill==', the unquoted 9.60, and the quoted string ']', which together evaluate to a valid expression. The more appropriate way of doing this would be
p.example_data -d tips | p.df 'df[df.sex=="Female"]' 'df[df.smoker=="Yes"]' 'df[df.total_bill==9.60]' -o table

Bash can get a little tricky with the way it uses quotes. I'd recommend looking into that. If this helps, the following two statements are equivalent.
p.example_data -d tips | p.df 'df[df.sex=="Female"]'
p.example_data -d tips | p.df "df[df.sex=='Female']"

However, the second one can get you into trouble because bash performs string interpolation inside double quotes (for example, expanding $variables) when you don't want it.
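
To see what p.df actually receives in each case, the quoting can be approximated with Python's standard `shlex` module, which follows POSIX-style quoting rules. This is only a sketch: `shlex` is not bash and performs no variable expansion, and `shell_argument` is a hypothetical helper introduced here for illustration.

```python
import shlex


def shell_argument(command):
    """Return the first argument after the program name, as a POSIX shell would pass it."""
    return shlex.split(command)[1]


# Double quotes inside single quotes survive, so Python sees a string literal.
quoted = shell_argument("""p.df 'df[df.total_bill=="9.60"]'""")

# The inner single quotes end and restart the shell string, so they vanish.
accidental = shell_argument("""p.df 'df[df.total_bill=='9.60']'""")

# No inner quotes at all: the same expression, written deliberately.
intended = shell_argument("""p.df 'df[df.total_bill==9.60]'""")

print(quoted)      # df[df.total_bill=="9.60"]
print(accidental)  # df[df.total_bill==9.60]
print(intended)    # df[df.total_bill==9.60]
```

The second and third forms hand p.df the identical expression, which is why the single-quoted version appeared to work.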

jungle-boogie commented 9 years ago

Hi robdmc,

> This statement does not work because 9.60 is a floating point number,

Yes, I had thought it was related to the floating point numbers.

Thank you for your detailed reply and your expert analysis on the proper way to handle floating point numbers.

I don't use bash but I'll keep your advice in mind.

Thank you for writing pandashells. I look forward to any updates you make to it and to the time it will save me.