rogerlew / pyvttbl

Automatically exported from code.google.com/p/pyvttbl

issues with multi-dimensional pivot row and column names #4

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When a multi-dimensional pivot is performed on a dataframe, the rnames 
attribute of the resulting pyvttbl can be longer than the pyvttbl's data. The 
resulting pyvttbl will not write out or print, and the row labels may be 
mismatched with its data.

Please note this error was discovered on a much more complex data set, but the 
example provided is a simple case for illustration purposes.

What steps will reproduce the problem?
1. Load a table into a dataframe in which not every combination of two 
fields has data. Example:

>>> df = DataFrame()
>>> df.insert({'id':0,'Name':'name1','Year':2010,'member':'Y'})
>>> df.insert({'id':1,'Name':'name1','Year':2011,'member':'N'})
>>> df.insert({'id':2,'Name':'name2','Year':2011,'member':'Y'})
>>> print df

member   Name    id   Year 
==========================
Y        name1    0   2010 
N        name1    1   2011 
Y        name2    2   2011

2. pivot the dataframe using two row labels or column labels - example:

>>> my_pivot = df.pivot('id', rows=['Name','Year'], cols=['member'], aggregate='count')

What is the expected output? What do you see instead?

The resulting pivot table will not print or write because it has more row names 
than data rows. Example:

>>> print my_pivot

Traceback (most recent call last):
  File "<pyshell#76>", line 1, in <module>
    print my_pivot
  File "C:\Python27\lib\site-packages\pyvttbl-0.3.6.7-py2.7.egg\pyvttbl\pyvttbl.py", line 2355, in __str__
    self[i] +
IndexError: list index out of range

>>> len(my_pivot.rnames)
4
>>> len(my_pivot)
3

>>> for name in my_pivot.rnames:
    print name

[('Name', u'name1'), ('Year', 2010)]
[('Name', u'name1'), ('Year', 2011)]
[('Name', u'name2'), ('Year', 2010)]
[('Name', u'name2'), ('Year', 2011)]

>>> for row in my_pivot:
    print row

[0, 1]
[1, 0]
[0, 1]

In the above example, the row name [('Name', u'name2'), ('Year', 2010)] is 
unnecessary and does not reflect the original data.
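
The four row names above are what you would get from taking every combination of the distinct Name and Year values, while only three of those combinations occur in the data. The sketch below reproduces that count mismatch in plain Python. It is an illustration of the symptom only, assuming the labels come from a Cartesian product; it does not use pyvttbl and is not a reading of its source.

```python
from itertools import product

# Records mirror the three rows inserted in the example above.
records = [
    {'id': 0, 'Name': 'name1', 'Year': 2010, 'member': 'Y'},
    {'id': 1, 'Name': 'name1', 'Year': 2011, 'member': 'N'},
    {'id': 2, 'Name': 'name2', 'Year': 2011, 'member': 'Y'},
]
row_fields = ['Name', 'Year']

# Every combination of the distinct values of each row field
# (assumed here to be how the extra label arises).
levels = [sorted(set(r[f] for r in records)) for f in row_fields]
all_labels = list(product(*levels))

# Only the combinations that actually occur in the data.
observed_labels = sorted(set(tuple(r[f] for f in row_fields) for r in records))

# Labels that exist in the product but have no backing data row.
missing = [label for label in all_labels if label not in observed_labels]

print('%d possible, %d observed, missing: %r'
      % (len(all_labels), len(observed_labels), missing))
```

The single missing label corresponds to the ('name2', 2010) row name that has no data row behind it.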

What version of the product are you using? On what operating system?
pyvttbl version 0.3.6.7
Windows XP
Python 2.7.2

Please provide any additional information below.

Thanks for your help!

We really like this python module!

Original issue reported on code.google.com by ryan.b.b...@gmail.com on 10 May 2012 at 4:15

GoogleCodeExporter commented 9 years ago
I have the same problem.

Original comment by jarede.s...@gmail.com on 15 May 2012 at 1:21

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 49d3b1bcf880.

Original comment by roger...@gmail.com on 17 May 2012 at 11:38

GoogleCodeExporter commented 9 years ago
Hi folks,

Thanks so much for the detailed description. This should now be fixed, both for 
when rnames exceeds the actual number of rows and for when cnames exceeds the 
actual number of columns.

Roger

Original comment by roger...@gmail.com on 17 May 2012 at 11:54
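
For readers on an affected version, one way to keep labels and data in step is to derive the row labels from the combinations that are actually observed. The following is a minimal sketch under that assumption. `count_pivot` is a hypothetical helper written for this illustration; it is not part of the pyvttbl API and is not the change made in revision 49d3b1bcf880.

```python
from collections import OrderedDict

# Records mirror the three rows from the original report.
records = [
    {'id': 0, 'Name': 'name1', 'Year': 2010, 'member': 'Y'},
    {'id': 1, 'Name': 'name1', 'Year': 2011, 'member': 'N'},
    {'id': 2, 'Name': 'name2', 'Year': 2011, 'member': 'Y'},
]

def count_pivot(records, rows, col):
    """Count records per (row label, column value).

    Hypothetical helper: row labels are created only for combinations
    that occur in the data, so labels and data rows stay the same length.
    """
    cnames = sorted(set(r[col] for r in records))
    counts = OrderedDict()
    for r in records:
        label = tuple((f, r[f]) for f in rows)
        # Start each new label with a zero count for every column value.
        counts.setdefault(label, dict.fromkeys(cnames, 0))[r[col]] += 1
    rnames = list(counts)
    data = [[counts[label][c] for c in cnames] for label in rnames]
    return rnames, cnames, data

rnames, cnames, data = count_pivot(records, ['Name', 'Year'], 'member')

# One label per data row, by construction.
assert len(rnames) == len(data)
for label, row in zip(rnames, data):
    print('%r %r' % (label, row))
```

Because each label is created only when a record with that combination is seen, the ('name2', 2010) label never appears and the three data rows match the ones shown in the report.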