reneshbedre / bioinfokit

Bioinformatics data analysis and visualization toolkit
MIT License

Tukey HSD function throws a TypeError, even with example code #69

Open kuoirene opened 9 months ago

kuoirene commented 9 months ago

Hello!

I followed the example code and have the elements below in my script. However, the call to tukey_hsd keeps failing with TypeError: Could not convert ['AAAAA'] to numeric (full traceback below). For some reason the message contains the treatment name repeated once per observation, and I'm unsure what the problem is. To check whether I had missed something in my own version, I also ran the example code unchanged, and it throws the same error. I've provided both the code and the error traceback and hope @reneshbedre can help with this! Your code and tutorial have otherwise been amazing and super helpful. Thank you!

import pandas as pd
df = pd.read_csv("https://reneshbedre.github.io/assets/posts/anova/onewayanova.txt", sep="\t")
df_melt = pd.melt(df.reset_index(), id_vars=['index'], value_vars=['A', 'B', 'C', 'D'])
df_melt.columns = ['index', 'treatments', 'value']

from bioinfokit.analys import stat
res = stat()
res.tukey_hsd(df=df_melt, res_var='value', xfac_var='treatments', anova_model='value ~ C(treatments)')
res.tukey_summary

Traceback (most recent call last):
    res.tukey_hsd(df=df_melt, res_var='value', xfac_var='treatments', anova_model='value ~ C(treatments)')
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/bioinfokit/analys.py", line 882, in tukey_hsd
    mult_group, mult_group_count, sample_size_r = analys_general.get_list_from_df(df, xfac_var, res_var, 'get_dict')
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/bioinfokit/analys.py", line 421, in get_list_from_df
    mult_group[ele] = df[df[xfac_var] == ele].mean().loc[res_var]
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/frame.py", line 11335, in mean
    result = super().mean(axis, skipna, numeric_only, **kwargs)
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/generic.py", line 11984, in mean
    return self._stat_function(
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/generic.py", line 11941, in _stat_function
    return self._reduce(
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/frame.py", line 11204, in _reduce
    res = df._mgr.reduce(blk_func)
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/internals/managers.py", line 1459, in reduce
    nbs = blk.reduce(func)
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/internals/blocks.py", line 377, in reduce
    result = func(self.values)
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/frame.py", line 11136, in blk_func
    return op(values, axis=axis, skipna=skipna, **kwds)
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/nanops.py", line 147, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/nanops.py", line 404, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/nanops.py", line 720, in nanmean
    the_sum = _ensure_numeric(the_sum)
  File "/Users/Irene/Library/Python/3.9/lib/python/site-packages/pandas/core/nanops.py", line 1678, in _ensure_numeric
    raise TypeError(f"Could not convert {x} to numeric")
TypeError: Could not convert ['AAAAA'] to numeric

EmilieDel commented 8 months ago

Downgrading pandas to a version below 2.0 helped! (Personally, I used pandas==1.5.3.)
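For context on why the pandas version matters: as far as I understand, pandas 2.0 changed the default of numeric_only in DataFrame.mean() to False, so the string column holding the treatment labels is no longer silently skipped, and the mean() call in the traceback tries to average strings. The sketch below uses made-up toy data (not the dataset from this issue), and the helper mean_fails_on_strings is a hypothetical name for illustration only. It shows the behavior and one way to get the group mean without downgrading.

```python
import pandas as pd

# Toy long-format data with hypothetical values, shaped like df_melt in the report
df_melt = pd.DataFrame({
    "index": [0, 1, 0, 1],
    "treatments": ["A", "A", "B", "B"],
    "value": [10.0, 20.0, 30.0, 50.0],
})

# Same row selection as in the traceback: all rows of one treatment
group_a = df_melt[df_melt["treatments"] == "A"]


def mean_fails_on_strings(frame):
    """Return True if frame.mean() raises TypeError (hypothetical helper)."""
    try:
        frame.mean()
    except TypeError:
        return True
    return False


# Expected to differ between pandas 1.5.x and pandas >= 2.0
print(pd.__version__, mean_fails_on_strings(group_a))

# Restricting the reduction to numeric columns avoids the string column
mean_a = group_a.mean(numeric_only=True).loc["value"]
print(mean_a)
```

Passing numeric_only=True explicitly should behave the same way on pandas 1.5.x and 2.x, which is why it is a common alternative to pinning an older version.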

amphioxus commented 6 months ago

Applying the changes to analys.py that @victormvy suggests in the pull request (https://github.com/reneshbedre/bioinfokit/pull/68) fixed this issue for me.
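I have not checked what the pull request changes line by line, so the following is only a sketch of one plausible approach and not the actual patch. It computes the per-group means of the response column directly, instead of calling mean() on the whole filtered DataFrame as line 421 in the traceback does. The function name group_means and the toy data are assumptions for illustration.

```python
import pandas as pd


def group_means(df, xfac_var, res_var):
    """Mean of res_var for each level of xfac_var (hypothetical helper).

    Only the response column is reduced, so non-numeric columns
    such as the treatment labels never reach mean().
    """
    return df.groupby(xfac_var)[res_var].mean().to_dict()


# Toy data with hypothetical values
toy = pd.DataFrame({
    "index": [0, 1, 0, 1],
    "treatments": ["A", "A", "B", "B"],
    "value": [10.0, 20.0, 30.0, 50.0],
})

means = group_means(toy, xfac_var="treatments", res_var="value")
print(means)
```

Selecting the response column before reducing works regardless of the numeric_only default, so it does not depend on the installed pandas version.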