Closed Liquidmasl closed 3 months ago
I somehow can't reproduce my success case anymore with real data. I don't understand what changed, but I can't save to Parquet anymore after all.
raylet.out tail:
This seems to be the issue: https://github.com/modin-project/modin/issues/7361
Modin version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest released version of Modin.
[ ] I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
Issue Description
Writing this a second time because the first time my computer bluescreened (not the first time while troubleshooting this issue)
I am running Modin with Ray on my laptop, no cluster.
I am loading a file in batches, creating DataFrames from the batches, and concatenating them together. Then I try to save the result using .to_parquet().
Depending on how and what I import, I get different results:
Import only modin:
Concatenating the dataframes takes a very long time and needs a lot of RAM. Calling .to_parquet() fills up my RAM and page file and then fails with a killed raylet.
Importing ray manually:
Concatenating works just as before, but trying to save to Parquet fills RAM and page file (violently) until my computer freezes or bluescreens. The freeze is so extreme that the only thing I can still do is cut the power.
Importing ray manually AND calling ray.init():
This sadly initializes 2 Ray instances, but it leads to WAY faster df concatenation, and saving with to_parquet() also works (most of the time). Both operations also barely fill up my RAM (the page file is chilling). I did manage to bluescreen here once as well, though that might be unrelated. It creates a .parquet folder with 20 Parquet files in it, about 1.5 GB each.
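For anyone trying to reproduce this, the third configuration can be sketched roughly as below. This is a minimal sketch and not the original reproducer: the helper names batch_bounds and concat_and_save, the column name x, and the synthetic data are all hypothetical stand-ins for the real file and batch loader. It also assumes that Modin reuses a Ray runtime that is already running when ray.init() is called before the first Modin operation, which I have not verified for every Modin version.

```python
def batch_bounds(n_rows, batch_size):
    """Yield (start, stop) row ranges that cover n_rows in steps of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


def concat_and_save(n_rows, batch_size, out_path):
    """Build one DataFrame per batch, concatenate them, and write Parquet.

    Hypothetical stand-in for the real workload; the data is synthetic.
    """
    # Imports are local so batch_bounds stays usable without Ray/Modin installed.
    import ray

    # Start Ray explicitly before Modin is used (assumption: Modin then
    # attaches to this runtime instead of starting its own).
    if not ray.is_initialized():
        ray.init()

    import modin.config as cfg
    cfg.Engine.put("ray")  # select the Ray engine explicitly
    import modin.pandas as pd

    frames = [
        pd.DataFrame({"x": range(start, stop)})
        for start, stop in batch_bounds(n_rows, batch_size)
    ]
    df = pd.concat(frames, ignore_index=True)
    df.to_parquet(out_path)
    return len(df)
```

Calling something like concat_and_save(1_000_000, 100_000, "out.parquet") would exercise the same concat-then-save path on a small scale. Dropping the explicit ray.init() call gives the "import only modin" variant for comparison.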
What's going on here? I only found this out because I needed the Ray dashboard to debug, so I called init manually, and then the issue was gone.
Also, on a different note: I can't actually load the Parquet folder that I just saved. It runs out of memory instantly, even though the same data in a single .parquet file loads flawlessly in about a second. What's up with that? Is it related to https://github.com/modin-project/modin/issues/7020#issuecomment-2263358385 ?
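To narrow down the loading problem, it may help to look at the part files individually before reading the whole folder. The following is only a diagnostic sketch: list_parts and rows_per_part are hypothetical helper names, and it assumes the part files sit directly inside the saved folder and end in .parquet. The row counts are read from the Parquet footer via pyarrow, so no column data is loaded.

```python
from pathlib import Path


def list_parts(dataset_dir):
    """Return (file name, size in bytes) for each .parquet part file, sorted by name."""
    root = Path(dataset_dir)
    return sorted(
        (p.name, p.stat().st_size)
        for p in root.glob("*.parquet")
        if p.is_file()
    )


def rows_per_part(dataset_dir):
    """Return {file name: row count} using only the Parquet footer metadata."""
    # Local import so list_parts works even without pyarrow installed.
    import pyarrow.parquet as pq

    return {
        name: pq.ParquetFile(str(Path(dataset_dir) / name)).metadata.num_rows
        for name, _ in list_parts(dataset_dir)
    }
```

If every part file looks sane on its own, the problem is more likely in how the folder is read back than in what was written.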
Expected Behavior
I would expect the default import to lead to good results. As it seems, something is not adding up.
Error Logs
Installed Versions