Closed Yuyoo closed 6 years ago
Hi @Yuyoo, thanks for highlighting this issue. Could you please provide code to reproduce the problem? In Python 3, the following code returns the expected "duplicate columns" error for me:
import pandas as pd
from tableone import TableOne

# load sample data into a pandas dataframe
url = "https://raw.githubusercontent.com/tompollard/tableone/master/data/pn2012_demo.csv"
data = pd.read_csv(url)
# create duplicate columns
data = data.rename(index=str, columns={"MechVent": "Height", "Weight": "Height",
                                       "SysABP": "Age", "ICU": "Age"})
# create table
overall_table = TableOne(data)
raises the expected error:
---------------------------------------------------------------------------
InputError Traceback (most recent call last)
<ipython-input-8-332ab7cb68f8> in <module>()
1 # create an instance of TableOne with the input arguments
2 # firstly, with no grouping variable
----> 3 overall_table = TableOne(data)
~/projects/tableone/tableone.py in __init__(self, data, columns, categorical, groupby, nonnormal, pval, pval_adjust, isnull, ddof, labels, sort, limit, remarks)
96 dups = data[columns].columns.get_duplicates()
97 if dups:
---> 98 raise InputError('Input contains duplicate columns: {}'.format(dups))
99
100 # if categorical not specified, try to identify categorical
InputError: Input contains duplicate columns: ['Age', 'Height']
Your suggested fix returns an error:
columns = data.columns.get_values()
data[columns].columns.get_duplicates().values.size
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-18-adf1753aef69> in <module>()
----> 1 data[columns].columns.get_duplicates().values.size
AttributeError: 'list' object has no attribute 'values'
The bug occurs as follows:
TableOne(data)
C:\Users\Yuyoo\Anaconda3\lib\site-packages\tableone.py:96: FutureWarning: 'get_duplicates' is deprecated and will be removed in a future release. You can use idx[idx.duplicated()].unique() instead
  dups = data[columns].columns.get_duplicates()
Traceback (most recent call last):
  File "D:/Tianchi/meinian2/code/table1_test.py", line 8, in <module>
    print(TableOne(data))
  File "C:\Users\Yuyoo\Anaconda3\lib\site-packages\tableone.py", line 97, in __init__
    if dups:
  File "C:\Users\Yuyoo\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2002, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Sorry, I didn't test my method in py2. In py2, data[columns].columns.get_duplicates() returns a list, and a 'list' object has no attribute 'values'. I think you can change it to "len(data[columns].columns.get_duplicates())", which works in both py2 and py3.
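To illustrate the suggestion above, here is a minimal sketch of why a len()-based check behaves the same whether the duplicates come back as a plain list or as a pandas Index. The helper name has_duplicates is hypothetical and not part of tableone; the example only assumes that pandas is installed.

```python
import pandas as pd


def has_duplicates(dups):
    """Return True if `dups` (a list or a pandas Index) holds any entries.

    Hypothetical helper for illustration; it is not part of tableone.
    """
    # len() is defined for both plain lists and pandas Index objects,
    # so it avoids calling bool() on an Index, which raises ValueError.
    return len(dups) > 0


empty_index = pd.Index([], dtype="object")

# Reproduce the ambiguity: truth-testing an Index raises ValueError.
try:
    bool(empty_index)
    truth_test_failed = False
except ValueError:
    truth_test_failed = True

print(truth_test_failed)
print(has_duplicates([]), has_duplicates(["Age"]))
print(has_duplicates(empty_index), has_duplicates(pd.Index(["Age", "Height"])))
```

The same check could be inlined as "if len(dups) > 0:" in place of "if dups:".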
Okay, got it, thanks @Yuyoo. I now get this error after upgrading to pandas '0.23.0' (from '0.22.0').
Yeah, running pip install --upgrade tableone also upgrades pandas by default. I didn't get the error when I used the old version of tableone.
Yeah, bad timing because we just published a paper about the package! We'll get the issues fixed as soon as possible. This particular bug is fixed with:
# check for duplicate columns
dups = data[columns].columns[data[columns].columns.duplicated()].unique()
if not dups.empty:
    raise InputError('Input contains duplicate columns: {}'.format(dups))
We'll work on the other issues shortly. Thanks again for raising this :)
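For anyone who wants to try the duplicate check outside of tableone, the following is a small self-contained sketch of the same idea. The function name find_duplicate_columns and the toy dataframes are assumptions made for illustration; only Index.duplicated() and Index.unique() come from pandas itself.

```python
import pandas as pd


def find_duplicate_columns(data):
    """Return the duplicated column labels of `data` as a list.

    Hypothetical helper mirroring the check above; not the tableone API.
    """
    cols = data.columns
    # duplicated() flags every repeat of a label after its first occurrence;
    # unique() collapses labels that are repeated more than once.
    return cols[cols.duplicated()].unique().tolist()


# Toy data: "Age" appears twice, the other labels once.
with_dups = pd.DataFrame([[1, 2, 3, 4]], columns=["Age", "Height", "Age", "Weight"])
without_dups = pd.DataFrame([[1, 2]], columns=["Age", "Height"])

print(find_duplicate_columns(with_dups))
print(find_duplicate_columns(without_dups))
```

Returning a plain list keeps the caller's "if dups:" test unambiguous, since an empty list is simply falsy.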
Haha, it's no problem, everything will be OK. You have done a good job, and the package makes our research much more convenient. Best wishes to you!
The following line also raises an error in pandas 0.23:
grouped_data = pd.crosstab(data[self._groupby],data[v])
ValueError: Duplicated level name: "death", assigned to level 1, is already used for level 0.
The error is raised when the _groupby column matches v (in the case above, groupby='death' and v='death').
Odd, because it looks like this was fixed as a bug in Pandas at some point in the past: https://github.com/pandas-dev/pandas/issues/13279
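One possible way to sidestep the name clash, sketched here under assumptions, is to give the two axes distinct names through the rownames and colnames arguments of pd.crosstab. This is not necessarily how tableone resolved the issue, and the toy data and the names "group" and "value" are made up for illustration.

```python
import pandas as pd

# Toy data standing in for the real dataset; "death" plays the role of
# both the groupby column and the variable v.
data = pd.DataFrame({"death": [0, 1, 1, 0]})

# Passing explicit, distinct axis names means the row and column levels
# no longer share the label "death".
grouped_data = pd.crosstab(
    data["death"],
    data["death"],
    rownames=["group"],   # hypothetical name for the row axis
    colnames=["value"],   # hypothetical name for the column axis
)

print(grouped_data)
```

Whether this is needed depends on the pandas version in use, since the linked issue suggests the behaviour has changed more than once.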
Fixed in version 0.5.7. Thanks again :)
Hi, Tom and Alistair. Long time no see since the Datathon in Beijing in 2017. How have you been? I found a bug in tableone.py in the latest version, 0.5.6. Because condition checks behave differently in py2 and py3, there is a bug at line 96 of tableone.py that causes an error when calling TableOne(data). At line 96, "data[columns].columns.get_duplicates()" returns "Index([], dtype='object')". In py3, Index([], dtype='object') cannot be evaluated as False, and it throws a ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I have tested it in py2, and it works well there. I suggest fixing it by changing "data[columns].columns.get_duplicates()" to "data[columns].columns.get_duplicates().values.size", or you could solve it another way.