Closed by richardliaw 3 years ago
This issue might be independent of this library. Here's a smaller repro -
import ray
import ray.util
import modin.pandas as pd

# Connect to the remote cluster through Ray client.
ray.util.connect("<service_ip>:50051")

colnames = ["label"] + ["feature-%02d" % i for i in range(1, 29)]
df = pd.read_csv("s3://<>/HIGGS.csv", names=colnames)

@ray.remote
def add_rows(modin_df):
    # Add a "sum" column row by row from the first three features.
    for i, row in modin_df.iterrows():
        modin_df.at[i, "sum"] = row["feature-01"] + row["feature-02"] + row["feature-03"]
    return modin_df

df_2 = ray.get(add_rows.remote(df))
Should this be an issue on the main Ray Repo?
Hm yeah, it seems to be related to Ray client and Modin, not necessarily to xgboost_ray (though I could imagine some problems here, too). Can you open an issue at the main Ray repo and link to this one here? Let's keep this issue open, too, just to track general compatibility with Modin.
Sure, created https://github.com/ray-project/ray/issues/14857 to track it.
Hey @krfricke and @richardliaw, I'm running into issues when passing a Modin dataframe through Ray client. This is what the code looks like -
It works when I execute this chunk as a remote function. The error is related to Modin not being able to detect the Ray client runtime when called by xgboost_ray. Modin works fine otherwise when loading the CSV itself.
This is what the stack trace looks like -
_Originally posted by @Bhavya6187 in https://github.com/ray-project/xgboost_ray/issues/32#issuecomment-804430451_