ray-project / xgboost_ray

Distributed XGBoost on Ray
Apache License 2.0

ValueError: feature_names mismatch #177

Open chaokunyang opened 3 years ago

chaokunyang commented 3 years ago

Reproduction code:

import pandas as pd  # needed for pd.concat below; missing from the original snippet
import ray
from xgboost_ray import RayDMatrix, RayParams, RayXGBClassifier
from sklearn.datasets import load_breast_cancer
train_x, train_y = load_breast_cancer(return_X_y=True, as_frame=True)
# Build a Ray dataset from two identical pandas blocks (features plus the "target" column).
ds = ray.data.from_pandas_refs([ray.put(pd.concat([train_x, train_y], axis=1)),
                                ray.put(pd.concat([train_x, train_y], axis=1))])
ray_params = RayParams(num_actors=2, cpus_per_actor=1)
clf = RayXGBClassifier(
    ray_params=ray_params,
    random_state=42,
    use_label_encoder=False,
    num_class=2,
)
clf.fit(RayDMatrix(ds, "target"), y=None, ray_params=ray_params)
print(f"classifier {clf}")
# Predict on the same Ray dataset that was used for training.
pred = clf.predict(RayDMatrix(ds, "target"))
print("predicted values: ", pred)
# Predict on the original pandas DataFrame.
pred = clf.predict(train_x)
print("predicted values: ", pred)

Result:

            for i, value in enumerate(values):
                if isinstance(value, RayError):
                    if isinstance(value, ray.exceptions.ObjectLostError):
                        worker.core_worker.dump_object_store_memory_usage()
                    if isinstance(value, RayTaskError):
>                       raise value.as_instanceof_cause()
E                       ray.exceptions.RayTaskError(ValueError): ray::_RemoteRayXGBoostActor.predict() (pid=37555, ip=127.0.0.1, repr=<xgboost_ray.main._RemoteRayXGBoostActor object at 0x7fda833e9fd0>)
E                         File "/Users/chaokunyang/opt/anaconda3/envs/mars-py3.8-dev/lib/python3.8/site-packages/xgboost_ray/main.py", line 673, in predict
E                           predictions = model.predict(local_data, **kwargs)
E                         File "/Users/chaokunyang/opt/anaconda3/envs/mars-py3.8-dev/lib/python3.8/site-packages/xgboost/core.py", line 1485, in predict
E                           self._validate_features(data)
E                         File "/Users/chaokunyang/opt/anaconda3/envs/mars-py3.8-dev/lib/python3.8/site-packages/xgboost/core.py", line 2060, in _validate_features
E                           raise ValueError(msg.format(self.feature_names,
E                       ValueError: feature_names mismatch: ['area error', 'compactness error', 'concave points error', 'concavity error', 'fractal dimension error', 'mean area', 'mean compactness', 'mean concave points', 'mean concavity', 'mean fractal dimension', 'mean perimeter', 'mean radius', 'mean smoothness', 'mean symmetry', 'mean texture', 'perimeter error', 'radius error', 'smoothness error', 'symmetry error', 'texture error', 'worst area', 'worst compactness', 'worst concave points', 'worst concavity', 'worst fractal dimension', 'worst perimeter', 'worst radius', 'worst smoothness', 'worst symmetry', 'worst texture'] ['mean radius', 'mean texture', 'mean perimeter', 'mean area', 'mean smoothness', 'mean compactness', 'mean concavity', 'mean concave points', 'mean symmetry', 'mean fractal dimension', 'radius error', 'texture error', 'perimeter error', 'area error', 'smoothness error', 'compactness error', 'concavity error', 'concave points error', 'symmetry error', 'fractal dimension error', 'worst radius', 'worst texture', 'worst perimeter', 'worst area', 'worst smoothness', 'worst compactness', 'worst concavity', 'worst concave points', 'worst symmetry', 'worst fractal dimension']
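
Note on reading the message: XGBoost formats this error with the booster's feature names first and the names of the data passed to predict second. Here both lists hold the same 30 names; the first is in alphabetical order, the second in the dataset's original column order. The sketch below checks for that situation on a four-name subset of the lists above. The helper name `diagnose` is hypothetical and not part of XGBoost or xgboost_ray.

```python
# Subset of the two lists from the error message (four names, for brevity).
expected = ["area error", "mean area", "mean radius", "worst area"]  # booster order
received = ["mean radius", "mean area", "area error", "worst area"]  # input data order


def diagnose(expected, received):
    """Classify a feature-name mismatch (hypothetical helper)."""
    if expected == received:
        return "match"
    if sorted(expected) == sorted(received):
        return "same names, different order"
    return "different names"


print(diagnose(expected, received))
```

If the names differ only in order, one possible workaround (an untested assumption, not a confirmed fix for this issue) is to reorder the DataFrame columns to the booster's order before predicting, for example `train_x[clf.get_booster().feature_names]`.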

krfricke commented 2 years ago

Hi @chaokunyang, I could not reproduce this error (the script passes for me). Which versions of Ray, XGBoost-Ray, and XGBoost are you using?
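
One way to collect the requested versions is sketched below, using only the standard library (Python 3.8+). The distribution names `ray`, `xgboost`, and `xgboost_ray` are assumptions about how the packages were installed, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages=("ray", "xgboost", "xgboost_ray")):
    """Return {distribution name: version string or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            versions[name] = "not installed"
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version}")
```

Pasting this output into the issue alongside the traceback should make it easier to tell whether the mismatch is specific to a particular release.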