TonyCongqianWang closed this issue 1 week ago.
Thanks for the alert. We will work on improving header-less support for pandas DataFrames. In the meantime, if your dataset has no column names, you can feed it to YDF as a single multi-dimensional feature using NumPy. Here is an example:
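```python
import numpy as np
import ydf
# 100 examples with 5 unnamed numerical columns, stored in one 2-D array, plus a random boolean label.
X = np.random.uniform(size=(100, 5))
y = np.random.uniform(size=100) >= 0.5
# The 2-D array is passed as a single multi-dimensional feature named "features".
model = ydf.RandomForestLearner(label="label").train({"features": X, "label": y})
model.input_features()
```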
Using `to_numpy`, you can train YDF models on header-less pandas DataFrames by turning them into NumPy arrays.
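```python
import pandas as pd
import ydf
# Header-less DataFrame converted to a 2-D NumPy array of features.
X = pd.DataFrame([[1, 2, 3], [4, 5, 6]]).to_numpy()
# Take the single column of the label DataFrame as a 1-D array.
y = pd.DataFrame([1, 2]).to_numpy()[:, 0]
model = ydf.RandomForestLearner(label="label").train({"features": X, "label": y})
model.input_features()
```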
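The same conversion can be wrapped in a small helper when the label lives in one of the unnamed columns. This is a minimal sketch, assuming the label column is identified by its position; the function name `to_ydf_dict` is hypothetical, and the `"features"`/`"label"` keys simply mirror the examples above rather than anything required by YDF. The label is taken with `iloc` so it keeps its own dtype instead of being upcast together with the feature columns.

```python
import pandas as pd


def to_ydf_dict(df: pd.DataFrame, label_pos: int = 0) -> dict:
    """Hypothetical helper: split a header-less DataFrame into NumPy arrays.

    The column at position `label_pos` becomes "label"; all other columns
    are packed, in their original order, into one 2-D "features" array.
    """
    label = df.iloc[:, label_pos].to_numpy()
    features = df.drop(columns=df.columns[label_pos]).to_numpy()
    return {"features": features, "label": label}


df = pd.DataFrame([[0, 1.0, 2.0], [1, 3.0, 4.0], [0, 5.0, 6.0]])
data = to_ydf_dict(df, label_pos=0)
print(data["features"].shape)  # (3, 2)
print(data["label"].tolist())  # [0, 1, 0]
```

The resulting dict can then be passed to `train` exactly as above, e.g. `ydf.RandomForestLearner(label="label").train(data)`.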
Thanks for the quick reply! The solution I used was to rename the features with

```python
mapping = {0: "y", 1: "feature_0", 2: "feature_1"}  # ... one entry per column
df = df.rename(columns=mapping)  # columns= renames column labels, not the index
```

which also worked fine.
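Rather than typing the mapping by hand, it can be generated from the integer column names pandas assigns. A minimal sketch, assuming (as in the rename above) that column 0 holds the label `y` and every other column `i` becomes `feature_{i-1}`; the sample DataFrame is made up. The mapping must be passed as `columns=`: a bare positional mapper in `DataFrame.rename` targets the index by default.

```python
import pandas as pd

# Made-up header-less data: pandas names the columns 0, 1, 2, ...
df = pd.DataFrame([[1, 0.5, 7.0], [0, 0.1, 3.0]])

# Column 0 becomes the label "y"; column i (i >= 1) becomes "feature_{i-1}".
mapping = {col: ("y" if col == 0 else f"feature_{col - 1}") for col in df.columns}
df = df.rename(columns=mapping)

print(list(df.columns))  # ['y', 'feature_0', 'feature_1']
```

The model can then be trained with `label="y"`.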
Solved in the 0.5.0 release.
I used pandas to import a CSV with no header. All header names are autogenerated and numerical. Using `label="0"` results in

`ValueError: Column '0' is required but was not found in the data. Available columns: [0, 1, 2 ...`

while using `label=0` results in `ValueError: Constructing the learner requires a non-empty label.` The problem also occurs when column names are numerical instead of strings.
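The first error is consistent with how pandas names columns when `read_csv` is called with `header=None`: the names are integers, so the string `"0"` never matches. The sketch below, using a made-up inline CSV, shows the mismatch and one possible workaround, converting every column name to a string so that `label="0"` can match. This is an illustration on the pandas side only; it does not explain the `label=0` error, which comes from YDF itself.

```python
import io

import pandas as pd

# Made-up header-less CSV: the first column is the label.
csv_text = "1,0.5,7.0\n0,0.1,3.0\n1,0.9,2.0\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# With header=None, pandas uses integer column names 0, 1, 2, ...
print(0 in df.columns, "0" in df.columns)  # True False

# Convert every column name to a string so that label="0" can match.
df.columns = df.columns.astype(str)
print(list(df.columns))  # ['0', '1', '2']
```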