import os
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
import joblib
# Define CSV file paths
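csv_files = []  # Fill in your CSV file paths; pd.concat raises ValueError if this list is empty
# Read and concatenate CSV files (ignore_index=True gives the combined frame a fresh 0..n-1 index)
df = pd.concat([pd.read_csv(file) for file in csv_files], ignore_index=True)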
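# Preprocess data
# Drop rows without a label, then fill missing feature values with column means
# (numeric_only=True skips non-numeric columns; pandas 2.x raises TypeError otherwise)
df = df.dropna(subset=['trade_decision'])
df = df.fillna(df.mean(numeric_only=True))
# Define features and target BEFORE scaling, so the class labels are not standardized
X = df.drop(columns=['trade_decision'])
y = df['trade_decision']
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize numerical features (assumes all feature columns are numeric);
# fit the scaler on the training set only, to avoid leaking test-set statistics
scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)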
# Define model
model = RandomForestClassifier(n_estimators=100, random_state=42)
# Train model
model.fit(X_train, y_train)
# Create the directory if it does not exist
os.makedirs('/tmp/codecup2024-demo-python/', exist_ok=True)
# Save training data to CSV
X_train.to_csv('/tmp/codecup2024-demo-python/X_train.csv', index=False)
y_train.to_csv('/tmp/codecup2024-demo-python/y_train.csv', index=False)  # Series.to_csv works directly; the header is the Series name
# Save model
joblib.dump(model, '/tmp/codecup2024-demo-python/model.pkl')
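To check whether the save step itself works, independent of the real CSV files, the same pipeline can be run on a small synthetic DataFrame. This is a minimal sketch: the feature names `feature_a` and `feature_b`, the random data, and the use of a fresh temporary output directory are assumptions for illustration. Only `trade_decision` and the file names come from the original code.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the concatenated CSV files
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
})
df.iloc[::10, 0] = np.nan  # inject missing values into feature_a
df["trade_decision"] = (df["feature_b"] > 0).astype(int)  # binary label

# Same steps as the corrected script: fill, split, then scale the features only
df = df.fillna(df.mean(numeric_only=True))
X = df.drop(columns=["trade_decision"])
y = df["trade_decision"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Write to a fresh temporary directory instead of the hard-coded /tmp path
out_dir = tempfile.mkdtemp()
X_train.to_csv(os.path.join(out_dir, "X_train.csv"), index=False)
y_train.to_csv(os.path.join(out_dir, "y_train.csv"), index=False)
joblib.dump(model, os.path.join(out_dir, "model.pkl"))

print(sorted(os.listdir(out_dir)))  # ['X_train.csv', 'model.pkl', 'y_train.csv']
print(model.score(X_test, y_test))
```

If this runs but the real script does not save anything, the difference most likely lies in the input data (empty `csv_files`, non-numeric columns, or label values) rather than in the `to_csv` calls.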
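Your code reads the CSV files, preprocesses the data, trains a model, and then tries to save the training data as CSV files. The problem is that the training data never ends up saved as CSV. Let's go through the code and improve it.
Here are the main fixes and improvements, along with some debugging suggestions: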
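1. Make sure the directory exists: use os.makedirs(..., exist_ok=True) to create /tmp/codecup2024-demo-python/ before writing to it.
2. Saving y_train: y_train is a Series. Series also has a to_csv method, so it can be saved directly. Converting it with y_train.to_frame() first is optional and only makes the single column explicit.
3. Check path permissions: make sure /tmp/codecup2024-demo-python/ is writable in your environment. On Windows, switch to a local directory such as C:/tmp/codecup2024-demo-python/.
4. Make sure the script actually reaches the save step. The save calls run only if every earlier step succeeds. csv_files must not be empty (pd.concat raises ValueError on an empty list). df.mean() needs numeric_only=True when the data has non-numeric columns (pandas 2.x raises TypeError otherwise). The target trade_decision must not be standardized, because scaling the labels generally turns them into non-integer floats, and RandomForestClassifier rejects those with an "Unknown label type" error.

I hope these fixes help you solve the problem. If you run the code in a particular environment, you may need to adjust the path or the permission settings.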
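The directory, permission, and Series points can be combined into a small helper. This is a hedged sketch: the helper name `save_training_data` and the demo data are hypothetical. Using `tempfile.gettempdir()` is one possible way to avoid hard-coding a platform-specific path. It usually resolves to `/tmp` on Linux and to the user's temp folder on Windows.

```python
import os
import tempfile

import pandas as pd


def save_training_data(X_train: pd.DataFrame, y_train: pd.Series, out_dir: str) -> list[str]:
    """Hypothetical helper: write X_train and y_train as CSV files into out_dir."""
    # Create the directory (and any missing parents); no error if it already exists
    os.makedirs(out_dir, exist_ok=True)
    # Quick writability check; os.access is only a heuristic, especially on Windows
    if not os.access(out_dir, os.W_OK):
        raise PermissionError(f"Output directory is not writable: {out_dir}")
    x_path = os.path.join(out_dir, "X_train.csv")
    y_path = os.path.join(out_dir, "y_train.csv")
    X_train.to_csv(x_path, index=False)
    # A Series can be written directly; to_frame() just makes the single column explicit
    y_train.to_frame().to_csv(y_path, index=False)
    return [x_path, y_path]


# Portable output location instead of a hard-coded /tmp or C:/tmp path
out_dir = os.path.join(tempfile.gettempdir(), "codecup2024-demo-python")
X_demo = pd.DataFrame({"feature_a": [0.1, -0.3, 1.2]})
y_demo = pd.Series([1, 0, 1], name="trade_decision")
paths = save_training_data(X_demo, y_demo, out_dir)
print(pd.read_csv(paths[1]))
```

Reading the file back right after writing it, as in the last line, is a simple way to confirm that the save actually happened and that the column header is what you expect.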