weld-project / weld

High-performance runtime for data analytics applications
https://www.weld.rs
BSD 3-Clause "New" or "Revised" License

Is it possible to perform Pandas profiling on Weld #460

Closed shashwatwork closed 5 years ago

shashwatwork commented 5 years ago

Hi,

I would like to run pandas profiling on a particular data frame. Can this be done through Weld?

Thanks

sppalkia commented 5 years ago

Hmm, do you mean performance profiling? You can measure how long evaluating a lazy computation takes (e.g., by surrounding a call to evaluate() with time.time()), but unfortunately there isn't an easy way to see how long each operation in a graph takes: the operations will likely be fused into a single loop, so it's hard to tease apart where time should be attributed.
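
A minimal sketch of that end-to-end timing approach follows. It assumes only that the lazy object exposes an evaluate() method, as mentioned above; the LazySum class and the timed() helper are hypothetical stand-ins for illustration and are not part of Weld or Grizzly. It uses time.perf_counter() in place of time.time() because it is a monotonic, higher-resolution clock; either works for a rough measurement.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


class LazySum:
    """Hypothetical stand-in for a lazy object; not a Weld or Grizzly class."""

    def __init__(self, values):
        self.values = values

    def evaluate(self):
        # All of the work happens here, so only the total time is observable.
        return sum(v * 2 for v in self.values)


lazy = LazySum(range(1000))
result, elapsed = timed(lazy.evaluate)
print(f"evaluate() took {elapsed:.6f} s")
```

This reports only the total time for the whole evaluation, which matches the limitation described above: once operations are fused, there is no per-operation breakdown to read off.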

shashwatwork commented 5 years ago

No, I'm asking about how to perform data profiling (https://pypi.org/project/pandas-profiling/) on top of DataFrameWeld under grizzly or some other classes.

For example (done on top of a pandas DataFrame):

p = pandas_profiling.ProfileReport(df, check_correlation=False)
p.to_file("profiling01.html")

Here, pandas profiling accepts only a pandas DataFrame as input. Is it possible to pass a Weld data frame instead? If so, please let me know how.

I also tried weld_df.df, which converts the Weld object to a pandas data frame before profiling. But the run time is not reduced; it takes somewhat longer than normal profiling.

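[image: profiling] https://user-images.githubusercontent.com/22785727/61211175-8099d980-a71c-11e9-998d-fe8ae8c514c8.PNG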

Thanks

sppalkia commented 5 years ago

Aha, yes. Unfortunately we don't support that feature right now. Looking at what it produces, Grizzly and Baloo (https://github.com/weld-project/baloo) have functions that produce some of these statistics, but not through that interface.

shashwatwork commented 5 years ago

Thanks, cool!