wireservice / agate

A Python data analysis library that is optimized for humans instead of machines.
https://agate.readthedocs.io
MIT License

.normalize() doesn't work with .where() #691

Closed: critmcdonald closed this issue 3 years ago

critmcdonald commented 7 years ago

I have a case where I can't filter a table with .where() after it has been run through .normalize(). The call fails with IndexError: tuple index out of range.

The problem is repeatable, but it does not happen with every table: I've confirmed that I can filter a simple normalized table. This could be user error, though I don't think it is, and I would be happy to be proven wrong.

I have a repo that reproduces the problem in a notebook. But I also have an example where I can filter a normalized table.

gutte commented 6 years ago

I also get IndexError: tuple index out of range when performing certain operations on a table that has been normalized using .normalize(). It seems to be something about how the column names are indexed.

Example:

    import agate

    table = agate.Table.from_csv('testinput.csv')
    norm_table = table.normalize('student', ['course1', 'course2', 'course3', 'course4'])
    selection = norm_table.where(lambda row: row['property'] == 'course1')  # THIS GIVES AN ERROR

where testinput.csv contains:

    student,course1,course2,course3,course4
    A,1,2,3,4
    B,3,4,1,2
    C,2,1,3,4
    D,1,4,3,2
    E,3,4,1,2

My reason for thinking that it's not a user issue is that the problem can be worked around by temporarily writing the table to a file and reading it back in, which "fixes" the indexing:

    norm_table.to_csv('temp.csv')
    norm_table = agate.Table.from_csv('temp.csv')
    selection = norm_table.where(lambda row: row['property'] == 'course1')  # NO ERROR!

jpmckinney commented 3 years ago

I think the issue is that normalize adds new rows, but it doesn't fill in row names for the new rows, so you end up getting an index error.
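The failure mode described above can be sketched in plain Python. This is an illustration only, not agate's actual code: `filter_with_row_names`, `original_names`, and the sample rows are hypothetical stand-ins. It assumes a filter that looks up each kept row's name by position in a tuple that still has one entry per original row.

```python
# Hypothetical stand-in for a positional row-name lookup during filtering.
# This is NOT agate's implementation, just a sketch of the suspected mechanism.
def filter_with_row_names(rows, row_names, test):
    """Keep rows that pass `test`, carrying each row's name along by position."""
    kept_rows = []
    kept_names = []
    for i, row in enumerate(rows):
        if test(row):
            kept_rows.append(row)
            kept_names.append(row_names[i])  # fails once i >= len(row_names)
    return kept_rows, kept_names


def is_course1(row):
    return row['property'] == 'course1'


# Two original rows, each normalized into two rows (one per property),
# while the names tuple still has only one entry per original row.
original_names = ('A', 'B')
normalized_rows = [
    {'student': 'A', 'property': 'course1', 'value': 1},
    {'student': 'A', 'property': 'course2', 'value': 2},
    {'student': 'B', 'property': 'course1', 'value': 3},
    {'student': 'B', 'property': 'course2', 'value': 4},
]

try:
    filter_with_row_names(normalized_rows, original_names, is_course1)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

# With one name per normalized row, the same filter succeeds.
full_names = tuple(
    '%s-%s' % (row['student'], row['property']) for row in normalized_rows
)
kept_rows, kept_names = filter_with_row_names(normalized_rows, full_names, is_course1)
```

Under these assumptions, the first call fails as soon as a matching row sits at a position beyond the end of the names tuple. That would also be consistent with the CSV round trip avoiding the error, since a table read back from a file carries no stale row names.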

jpmckinney commented 3 years ago

@gutte Thank you for sharing the easy-to-replicate code!