thombashi / pytablewriter

pytablewriter is a Python library to write a table in various formats: AsciiDoc / CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
https://pytablewriter.rtfd.io/
MIT License

Feature request: return pandas DataFrame #21

Closed: hugovk closed this issue 4 years ago

hugovk commented 4 years ago

Right now, PandasDataFrameWriter returns a string defining the source code to get a pandas DataFrame.

To turn that string into a DataFrame for further processing, we need to exec it:

>>> import numpy as np
>>> import pandas as pd
>>> import pytablewriter
>>> writer = pytablewriter.PandasDataFrameWriter()
>>> writer.table_name = "example_table"
>>> writer.headers = ["int", "float", "str", "bool", "mix", "time"]
>>> writer.value_matrix = [
...     [0,   0.1,      "hoge", True,   0,      "2017-01-01 03:04:05+0900"],
...     [2,   "-2.23",  "foo",  False,  None,   "2017-12-23 45:01:23+0900"],
...     [3,   0,        "bar",  "true",  "inf", "2017-03-03 33:44:55+0900"],
...     [-10, -9.9,     "",     "FALSE", "nan", "2017-01-01 00:00:00+0900"],
... ]
>>>
>>> output = writer.dumps()
>>> output
'example_table = pd.DataFrame([\n    [0, 0.1, "hoge", True, 0, "2017-01-01 03:04:05+0900"],\n    [2, -2.23, "foo", False, None, "2017-12-23 45:01:23+0900"],\n    [3, 0, "bar", True, np.inf, "2017-03-03 33:44:55+0900"],\n    [-10, -9.9, "", False, np.nan, "2017-01-01 00:00:00+0900"],\n], columns=["int", "float", "str", "bool", "mix", "time"])\n'
>>> type(output)
<class 'str'>
>>> exec(output)
>>> example_table
   int  float   str   bool  mix                      time
0    0   0.10  hoge   True  0.0  2017-01-01 03:04:05+0900
1    2  -2.23   foo  False  NaN  2017-12-23 45:01:23+0900
2    3   0.00   bar   True  inf  2017-03-03 33:44:55+0900
3  -10  -9.90        False  NaN  2017-01-01 00:00:00+0900
>>> type(example_table)
<class 'pandas.core.frame.DataFrame'>
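
If the exec route is used anyway, one way to contain it is to run the generated source in a dedicated namespace dict and pull the table out by name, so nothing leaks into the caller's globals. The following is a minimal sketch of that idea. The helper name `load_table_from_source` is hypothetical and not part of pytablewriter, and the source string is a stdlib-only stand-in for the writer's output so the sketch runs without pandas.

```python
import math


def load_table_from_source(source, table_name, names):
    """Execute generated source in its own namespace and return one variable.

    `names` maps the identifiers the source refers to (for the session above,
    that would be "pd" and "np") to the objects they should resolve to.
    """
    namespace = dict(names)  # copy, so the caller's dict is not modified
    exec(source, namespace)
    return namespace[table_name]


# Stand-in for writer.dumps(): same shape of statement, but it refers to
# `math` instead of `pd`/`np` so that no third-party package is needed.
source = "example_table = [[3, math.inf], [-10, -9.9]]\n"
table = load_table_from_source(source, "example_table", {"math": math})
print(table)
```

With the session above, `load_table_from_source(output, "example_table", {"pd": pd, "np": np})` should return the DataFrame. Note that this only limits which names the generated code sees and where its variables land; it does not make exec safe for untrusted input.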

To avoid needing to exec an arbitrary string, it would be nice to have an actual pandas DataFrame returned directly, instead of a string.

This would allow further processing of the DataFrame, for example to plot charts (https://github.com/hugovk/pypistats/pull/74).

Would it be possible to add support for this?


Similarly, it would also be good to be able to get an actual NumPy array from NumpyTableWriter, and not only a string defining one.

Thank you!

thombashi commented 4 years ago

Thank you for your feedback.

Any table writer instance can produce a pandas.DataFrame by calling the as_dataframe method on its tabledata property. You can then get a numpy.ndarray from that DataFrame:

import pytablewriter

writer = pytablewriter.PandasDataFrameWriter()
writer.table_name = "example_table"
writer.headers = ["int", "float", "str", "bool", "mix", "time"]
writer.value_matrix = [
    [0,   0.1,      "hoge", True,   0,      "2017-01-01 03:04:05+0900"],
    [2,   "-2.23",  "foo",  False,  None,   "2017-12-23 45:01:23+0900"],
    [3,   0,        "bar",  "true",  "inf", "2017-03-03 33:44:55+0900"],
    [-10, -9.9,     "",     "FALSE", "nan", "2017-01-01 00:00:00+0900"],
]
print(type(writer.tabledata.as_dataframe()))
print(type(writer.tabledata.as_dataframe().values))

Output:

<class 'pandas.core.frame.DataFrame'>
<class 'numpy.ndarray'>
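
As a side note on the NumPy conversion, the pandas documentation recommends `DataFrame.to_numpy()` over `.values`, and a frame with mixed column types converts to an array of dtype object. The sketch below illustrates both points. It assumes pandas is installed, and it builds a small DataFrame by hand as a stand-in for `writer.tabledata.as_dataframe()` so that it runs without pytablewriter.

```python
import pandas as pd

# Stand-in for writer.tabledata.as_dataframe(); built by hand so the
# sketch does not depend on pytablewriter being installed.
df = pd.DataFrame(
    [
        [0, 0.1, "hoge"],
        [2, -2.23, "foo"],
        [3, 0.0, "bar"],
    ],
    columns=["int", "float", "str"],
)

# to_numpy() is the accessor the pandas documentation recommends over .values.
mixed = df.to_numpy()

# Selecting only the numeric columns avoids the fallback to dtype=object.
numeric = df[["int", "float"]].to_numpy()

print(mixed.dtype)    # object, because the columns have mixed types
print(numeric.dtype)  # float64, the common type of the two numeric columns
```

Selecting columns before converting is one option; whether it is worth doing depends on what the array is used for afterwards.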

Does it satisfy your use case?

hugovk commented 4 years ago

That looks just what I was after, thank you!