Closed Sadiamitu closed 4 months ago
Hey @Sadiamitu, thanks for reporting. This problem is fixed in our latest version available in Anaconda (5.5.1). Please update by closing Spyder, opening the Anaconda Prompt, and running the commands listed in our documentation.
Description
What steps will reproduce the problem?
A dialog box saying "Spyder has encountered an internal problem!" appears whenever I try to run the following code. I tried it on another computer as well, and it shows the same dialog box. The code runs, but the figures are not generated in the side window. I am not sure what is causing the problem.
import pandas as pd
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
Load the data
train_df = pd.read_csv('MIR_train.csv')
test_df = pd.read_csv('MIR_test.csv')
full_df = pd.concat([train_df, test_df])
Specify the properties and those to log-transform
prop = ['eoc_tot_c', 'n_tot_ncs', 'ph_h2o', 'clay_tot_psa', 'silt_tot_psa', 'sand_tot_psa',
        'caco3', 'cecd_nh4', 'ca_nh4d', 'mg_nh4d', 'k_nh4d', 'p_mehlich3']
prop_to_log_transform = ['eoc_tot_c', 'n_tot_ncs', 'p_mehlich3', 'cecd_nh4',
                         'caco3', 'ca_nh4d', 'mg_nh4d', 'k_nh4d']
Initial log transformation
for property in prop_to_log_transform:
    train_df[property] = np.log1p(train_df[property])
    test_df[property] = np.log1p(test_df[property])
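Since the loop above stores the properties on a log(1 + x) scale, anything predicted from them is on that scale too. The following is a minimal sketch, with made-up values, of how `np.expm1` undoes `np.log1p`; it is an illustration, not part of the reported script.

```python
import numpy as np

# Hypothetical property values in their original units (zeros are allowed).
values = np.array([0.0, 0.5, 2.0, 10.0])

# Forward transform, as in the loop above: log(1 + x).
logged = np.log1p(values)

# Back-transform, e.g. for predictions made on the log scale: exp(y) - 1.
restored = np.expm1(logged)

print(np.allclose(restored, values))
```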
Apply Savitzky-Golay filter to the spectra data
spectra_columns = train_df.columns[55:]  # Adjust the index appropriately
X_train_filtered = savgol_filter(train_df[spectra_columns], window_length=11, polyorder=2, axis=0)
X_test_filtered = savgol_filter(test_df[spectra_columns], window_length=11, polyorder=2, axis=0)

X_train_filtered = pd.DataFrame(X_train_filtered, columns=spectra_columns)
X_test_filtered = pd.DataFrame(X_test_filtered, columns=spectra_columns)
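One thing that may be worth checking in the filtering step is the `axis` argument. With `axis=0`, `savgol_filter` smooths down each column, that is, across samples. If the rows are samples and the columns are wavenumbers (an assumption about the layout of the CSV files, which are not shown here), `axis=1` smooths along each spectrum instead. A small sketch on synthetic data:

```python
import numpy as np
from scipy.signal import savgol_filter

# Hypothetical spectra: 3 samples (rows) x 50 wavenumber points (columns).
# Each clean spectrum is a quadratic curve so the effect is easy to verify.
x = np.arange(50, dtype=float)
clean = np.vstack([
    0.2 + 0.010 * x - 0.0002 * x**2,
    0.5 - 0.005 * x + 0.0001 * x**2,
    0.1 + 0.020 * x - 0.0003 * x**2,
])

rng = np.random.default_rng(0)
noisy = clean + rng.normal(scale=0.05, size=clean.shape)

# axis=1 runs the filter along the wavenumber axis, within each sample.
smoothed = savgol_filter(noisy, window_length=11, polyorder=2, axis=1)

print("mean abs error before:", np.abs(noisy - clean).mean())
print("mean abs error after: ", np.abs(smoothed - clean).mean())
```

A second-order filter reproduces a quadratic curve exactly, so the remaining error here comes only from the filtered noise.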
Define the evaluation function
def perform_plsr_and_evaluate(X_train, y_train, X_test, y_test, property_name, log_transformed=False):
    pls = PLSRegression(n_components=10)
    pls.fit(X_train, y_train)
    y_pred = pls.predict(X_test).flatten()
from sklearn.model_selection import cross_val_predict

def pls(X_train, Y_train, X_test, n_components=50):
    rmse_scores = []
    for n_comp in range(1, n_components + 1):
        my_plsr = PLSRegression(n_components=n_comp, scale=True)
        y_pred = cross_val_predict(my_plsr, X_train, Y_train, cv=10)
        rmse = np.sqrt(mean_squared_error(Y_train, y_pred))
        rmse_scores.append(rmse)
    best_ncomp = rmse_scores.index(min(rmse_scores)) + 1
Process each property individually, ensuring data cleaning is specific to each property
results = {}
stats = {}
for property in prop:
    print(f"Processing: {property}")
    # Unpack spectra_columns so that subset is a flat list of column labels
    current_train_df = train_df.dropna(subset=[property, *spectra_columns])
    current_test_df = test_df.dropna(subset=[property, *spectra_columns])
    full_current_df = full_df.dropna(subset=[property, *spectra_columns])
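On the `subset` argument of `dropna`: writing `[property, spectra_columns]` nests the whole `Index` as a single list element, while `[property, *spectra_columns]` lists the column labels one by one. A tiny sketch with a made-up frame (the column names `w1` and `w2` are placeholders for spectral columns):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one property column and two "spectral" columns.
df = pd.DataFrame({
    "ph_h2o": [6.5, np.nan, 7.1, 5.9],
    "w1": [0.10, 0.20, np.nan, 0.40],
    "w2": [0.30, 0.10, 0.20, 0.50],
})
spectra_columns = df.columns[1:]

# The * unpacks the Index into individual labels.
subset = ["ph_h2o", *spectra_columns]
clean = df.dropna(subset=subset)

print(subset)
print(clean.index.tolist())
```

Rows with a missing value in any of the listed columns are dropped, so only the rows that are complete for this property and its spectra remain.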
Summary tables
results_df = pd.DataFrame(results).T
stats_df = pd.DataFrame(stats).T

print("Performance Metrics:")
print(results_df)
print("\nData Stats:")
print(stats_df)
Creating a DataFrame to summarize the results
summary_table = pd.DataFrame(results).T  # Transpose to have properties as rows
summary_table.index.name = 'Property'
summary_table.columns = ['R²', 'RMSE', 'Bias', 'RPD']
print(summary_table)
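The part of the script that fills `results` is not shown, so the exact definitions behind the four columns are unknown. The sketch below uses commonly seen definitions as an assumption: bias as the mean of predicted minus observed, and RPD as the sample standard deviation of the observed values divided by RMSE. The function name `regression_metrics` is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, Bias and RPD under the definitions assumed above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_pred - y_true
    rmse = np.sqrt(np.mean(residuals**2))
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y_true - y_true.mean())**2)
    return {
        "R²": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "Bias": residuals.mean(),                  # assumed: predicted - observed
        "RPD": np.std(y_true, ddof=1) / rmse,      # assumed: sample SD / RMSE
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(metrics)
```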
Traceback
Versions
Dependencies