h2oai / datatable

A Python package for manipulating 2-dimensional tabular data structures
https://datatable.readthedocs.io
Mozilla Public License 2.0

How to transform List of Dict to Datatable #2507

Closed by ghost 4 years ago

ghost commented 4 years ago

Hi, so far I wasn't successful in finding a way to use datatable instead of pandas to transform a list of dict to a datatable/frame. The pandas code is the following:

import pandas as pd
data = [{'x': 'a', 'y': 'a'}, {'x': 'b', 'y': 'b'}]
df = pd.DataFrame(data)

How can this be achieved with datatable?

Thank you very much!

st-pasha commented 4 years ago

It works the same way in datatable:

>>> import datatable as dt
>>> data = [ {'x':'a', 'y':'a'}, {'x':'b', 'y':'b'} ]
>>> dt.Frame(data)
   | x   y 
-- + --  --
 0 | a   a 
 1 | b   b 

[2 rows x 2 columns]

Or maybe I misunderstood your question?
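
For readers following along, here is a minimal pure-Python sketch of the reshaping that the example above implies: each dict is one row, and the keys become column names. The helper name `rows_to_columns` is hypothetical, and filling missing keys with `None` is an assumption made for this illustration. It is not datatable's implementation.

```python
def rows_to_columns(rows):
    """Turn a list of row dicts into a dict of column lists.

    Hypothetical helper for illustration only. Keys missing from a row
    are filled with None; column order follows first appearance.
    """
    columns = {}
    for i, row in enumerate(rows):
        for key in row:
            if key not in columns:
                # New column: backfill the rows seen so far with None.
                columns[key] = [None] * i
        for key, values in columns.items():
            values.append(row.get(key))
    return columns


data = [{'x': 'a', 'y': 'a'}, {'x': 'b', 'y': 'b'}]
print(rows_to_columns(data))  # {'x': ['a', 'b'], 'y': ['a', 'b']}
```

The result has one list per column, which is the same information that the two-row, two-column frame printed above holds.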

ghost commented 4 years ago

Thank you very much for your fast reply. Sadly this is not exactly what I am looking for.

data = [{'x': 'a', 'y': 'a'}, {'x': 'b', 'y': 'b'}]
df_pd = pd.DataFrame(data)
df_dt = dt.Frame(data)

I executed the three lines of code with the following result:

[Screenshot, 2020-06-28 at 21:10:00]

Only the pandas solution leads to a 'real' dataframe.

[Screenshot, 2020-06-28 at 21:11:38]

And I was wondering if I could accomplish the same with datatable. Thank you!

aschmu commented 4 years ago

You have to convert it to a data frame yourself!

ghost commented 4 years ago

So that means it is not possible to achieve this with datatable? At least not in Python? I am more practiced with data.table for R and would rather use datatable than pandas because of the speed gains. But if I can't view the tables like pandas dataframes, I might have to stick with pandas.

oleksiyskononenko commented 4 years ago

@Peter-Pasta In your example the line df_dt = dt.Frame(data) did create a new datatable frame successfully, so you can start using df_dt right away. The only problem here is that in your environment the data preview for datatable doesn't work properly.

We have several scenarios for the data preview: text mode for Python console, html mode for Jupyter notebook, etc. In your case it seems that datatable uses the text mode, though your environment is not a normal console. What is the software you are using on the posted screenshots?
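
To illustrate the idea of separate preview modes, here is a toy sketch of how a Python object can offer both a text preview and an HTML preview. `TinyTable` is a hypothetical class written for this example and is not how datatable implements its previews. The only real convention used is that Jupyter looks for a `_repr_html_` method, while a plain console falls back to `__repr__`.

```python
from html import escape


class TinyTable:
    """Hypothetical toy table, for illustration only (not datatable's Frame)."""

    def __init__(self, columns):
        self.columns = columns  # dict: column name -> list of values

    def _rows(self):
        # Iterate row by row over the column lists.
        return zip(*self.columns.values())

    def __repr__(self):
        # Text mode: used by the plain Python console and by print().
        lines = ["  ".join(self.columns)]
        lines += ["  ".join(str(v) for v in row) for row in self._rows()]
        return "\n".join(lines)

    def _repr_html_(self):
        # HTML mode: Jupyter calls this method, if present, to render the object.
        head = "".join(f"<th>{escape(str(name))}</th>" for name in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self._rows()
        )
        return f"<table><tr>{head}</tr>{body}</table>"


t = TinyTable({"x": ["a", "b"], "y": ["a", "b"]})
print(t)                # text preview
print(t._repr_html_())  # HTML preview, as a notebook would request it
```

A tool that knows neither the object's type nor its HTML hook can only show the text form, which would be consistent with the behaviour described in this comment.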

ghost commented 4 years ago

@oleksiyskononenko Thank you very much for your reply. I am starting to understand. I am using PyCharm Professional (Version 2020.1) in Scientific Mode.

This is a screenshot from the JetBrains website:

[Screenshot, 2020-06-29 at 09:46:56]

Being able to view my dataframes is incredibly helpful for me.

st-pasha commented 4 years ago

@Peter-Pasta As you correctly noted, this is functionality provided by PyCharm and their SciView plugin. They support pandas DataFrames and numpy arrays only. Perhaps they would be interested in extending their support to other data frames too? I would recommend that you raise this issue on their support forum.

We would be happy to cooperate and take any steps forward in order to help with this feature.
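
In the meantime, one possible workaround is to convert a Frame to pandas only when you want to inspect it in a pandas-aware viewer. The sketch below relies on `Frame.to_pandas()`, which datatable provides. The helper name `as_viewable` is hypothetical, and the usage lines assume that both datatable and pandas are installed.

```python
def as_viewable(obj):
    """Return a pandas object for viewers that only understand pandas.

    `as_viewable` is a hypothetical helper name. Objects that expose a
    `to_pandas()` method (datatable Frames do) are converted; anything
    else is returned unchanged.
    """
    to_pandas = getattr(obj, "to_pandas", None)
    return to_pandas() if callable(to_pandas) else obj


# Usage, assuming datatable and pandas are installed:
#   import datatable as dt
#   df_dt = dt.Frame([{'x': 'a', 'y': 'a'}, {'x': 'b', 'y': 'b'}])
#   df_view = as_viewable(df_dt)  # pandas DataFrame, for inspection only
```

The conversion creates a separate pandas object, so it is probably best kept to inspection steps while the actual processing stays on the datatable Frame.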

ghost commented 4 years ago

@st-pasha Thank you very much. I really appreciate your time and hard work.

ghost commented 3 years ago

Hi, it's been a while but I wanted to let you know that this issue is now resolved. Datatables can now be viewed in PyCharm correctly.

st-pasha commented 3 years ago

Thanks for letting us know, Peter.

ghost commented 3 years ago

> @Peter-Pasta As you correctly noted, this is functionality provided by PyCharm and their SciView plugin. They support pandas DataFrames and numpy arrays only. Perhaps they would be interested in extending their support to other data frames too? I would recommend that you raise this issue on their support forum.
>
> We would be happy to cooperate and take any steps forward in order to help with this feature.

Hi once again. As I mentioned last time, it's possible to view datatable Frames in the PyCharm IDE. However, the SciView feature, which provides additional functionality for pandas and NumPy, does not support them yet. I would like to make you aware that an issue for this was recently opened on their support forum: https://youtrack.jetbrains.com/issue/PY-47475. I thought you might be interested in checking it out.