xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.readthedocs.io
Apache License 2.0

BUG: How to read local csv file #763

Closed: YishuiLi closed this issue 9 months ago

YishuiLi commented 9 months ago

Describe the bug

I created a cluster with one supervisor and two workers on CentOS 7, then connected to this cluster from Windows 10 and tried to read a local CSV file with xorbits.pandas.read_csv, but got a FileNotFoundError exception. How can I read a local CSV file?

To Reproduce

  1. Python 3.8.18

  2. Xorbits 0.7.1

  3. numpy and pandas:
     a. pandas 2.0.3
     b. numpy 1.24.4

  4. Full stack of the error: [two screenshots]

  5. Minimized code to reproduce the error.


    import xorbits
    import xorbits.pandas as pd

    xorbits.init('http://192.168.2.130:7005')
    df = pd.read_csv(r"F:\dataset\iris.csv", header=0, sep=",", encoding="utf-8")
    xorbits.run(df)

qinxuye commented 9 months ago

The error is a FileNotFoundError; can you check whether the file exists?

YishuiLi commented 9 months ago

The file exists on my local computer, but not on the cluster. How can I read this local CSV file? Does the file have to be transferred to every node?
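A minimal illustration of why the file is not found, assuming the path string from the reproduction code is sent unchanged to the CentOS workers and opened on their own filesystems. It uses only the standard-library `pathlib` pure-path classes to show how the same string is read on Windows versus on a POSIX worker:

```python
from pathlib import PurePosixPath, PureWindowsPath

# The path string exactly as passed to read_csv in the reproduction code.
client_path = r"F:\dataset\iris.csv"

# On the Windows client, the string is an absolute path on drive F:.
win = PureWindowsPath(client_path)
print(win.drive, win.is_absolute())  # F: True

# On a CentOS (POSIX) worker there are no drive letters and "\" is not a
# separator, so the whole string is one relative file name that would be
# looked up in the worker's current working directory.
posix = PurePosixPath(client_path)
print(posix.parts, posix.is_absolute())
```

So even copying the file to each worker would not help unless the path the workers receive actually resolves on their disks.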

qinxuye commented 9 months ago

You need to store the file in storage the cluster can access, such as HDFS or an object store like OSS, so that every node in the cluster can see the file.
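A small client-side check can catch this before a job is submitted. The sketch below is hypothetical: the helper name `is_cluster_visible` and the scheme set `SHARED_SCHEMES` are assumptions, not part of Xorbits, and which schemes a cluster can actually read depends on the filesystem backends installed and configured there. It only inspects the URL scheme with the standard-library `urllib.parse.urlparse`:

```python
from urllib.parse import urlparse

# Hypothetical: schemes assumed to be reachable from every node.
# Adjust to whatever shared storage your cluster is configured for.
SHARED_SCHEMES = {"hdfs", "oss", "s3"}


def is_cluster_visible(path: str) -> bool:
    """Return True if `path` points at shared storage by URL scheme.

    Plain local paths (no scheme) and Windows drive paths return False,
    because each worker would resolve them against its own local disk.
    """
    # urlparse reads a Windows drive letter such as "F:" as the one-letter
    # scheme "f"; that is not in SHARED_SCHEMES, so it counts as local.
    scheme = urlparse(path).scheme.lower()
    return scheme in SHARED_SCHEMES


print(is_cluster_visible(r"F:\dataset\iris.csv"))  # False
print(is_cluster_visible("hdfs://namenode:8020/dataset/iris.csv"))  # True
```

The HDFS host and port in the example are placeholders. Once the file has been uploaded, the shared path (rather than the local `F:\...` path) is what would be passed to `xorbits.pandas.read_csv`.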

YishuiLi commented 9 months ago

Got it. Thanks!