uni-courses / snncompare

Runs networkx graphs representing spiking neural networks of LIF-neurons on lava-nc or networkx.
GNU Affero General Public License v3.0

Compute the statistical significance of the adaptation mechanism results. #206

Closed a-t-0 closed 1 year ago

a-t-0 commented 1 year ago

Since the results are binary (pass/fail), a logistic regression is used to compute the p-value of the adaptation mechanisms' effect.
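Concretely, logistic regression models the pass probability as P(pass) = 1 / (1 + exp(-(b0 + b1 * x))), where x is the adaptation level; the p-value reported for b1 tests the null hypothesis b1 = 0, i.e. that adaptation has no effect on the pass probability.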

# Source: https://stats.stackexchange.com/questions/5935/anova-on-binomial-data
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Data
no_fertilizer = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # Scores for no fertilizer
fertilizer_2kg = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # Scores for 2kg fertilizer
fertilizer_4kg = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # Scores for 4kg fertilizer
fertilizer_6kg = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0]  # Scores for 6kg fertilizer

# Combine the scores and create the target variable
scores = np.concatenate((no_fertilizer, fertilizer_2kg, fertilizer_4kg, fertilizer_6kg))
fertilizer_amounts = [0] * len(no_fertilizer) + [2] * len(fertilizer_2kg) + [4] * len(fertilizer_4kg) + [6] * len(fertilizer_6kg)

# Create a pandas DataFrame
data = pd.DataFrame({'Scores': scores, 'Fertilizer': fertilizer_amounts})

# Add a constant column for the intercept
data = sm.add_constant(data)

# Perform logistic regression
logit = sm.Logit(data['Scores'], data[['const', 'Fertilizer']])
result = logit.fit()

# Print the results
print(result.summary())
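To pull out just the significance number programmatically, a small follow-up sketch (reusing the result object fitted above; 'Fertilizer' matches the column name used there):

# The Wald p-value on the Fertilizer coefficient tests whether the
# adaptation level significantly affects the pass probability.
p_value = result.pvalues['Fertilizer']
print(f"p-value for the Fertilizer coefficient: {p_value:.4f}")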

For the sparse redundancy, where more redundancy is not necessarily better, one can pool the results of all redundancy levels into a single group and then run the logistic regression on the pooled data:

# Source: https://stats.stackexchange.com/questions/5935/anova-on-binomial-data
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Data
no_fertilizer = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # Scores for no fertilizer
fertilizer_2kg = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # Pooled scores across all non-zero fertilizer amounts
# Combine the scores and create the target variable
scores = np.concatenate((no_fertilizer, fertilizer_2kg))
fertilizer_amounts = [0] * len(no_fertilizer) + [2] * len(fertilizer_2kg) 

# Create a pandas DataFrame
data = pd.DataFrame({'Scores': scores, 'Fertilizer': fertilizer_amounts})

# Add a constant column for the intercept
data = sm.add_constant(data)

# Perform logistic regression
logit = sm.Logit(data['Scores'], data[['const', 'Fertilizer']])
result = logit.fit()

# Print the results
print(result.summary())
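With only two pooled groups, an overall significance check can also be read from the likelihood-ratio test that statsmodels computes against the intercept-only model (a minimal sketch, reusing the fitted result from the script above):

# Likelihood-ratio test of the fitted model vs. the intercept-only model.
print(f"LLR p-value: {result.llr_pvalue:.4f}")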
a-t-0 commented 1 year ago

Assumptions of logistic regression:

The script below repeats the regression and includes checks of its assumptions:

import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

# Data
no_fertilizer = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # Scores for no fertilizer
fertilizer_2kg = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # Scores for 2kg fertilizer
fertilizer_4kg = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # Scores for 4kg fertilizer
fertilizer_6kg = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0]  # Scores for 6kg fertilizer

# Combine the scores and create the target variable
scores = np.concatenate((no_fertilizer, fertilizer_2kg, fertilizer_4kg, fertilizer_6kg))
fertilizer_amounts = [0] * len(no_fertilizer) + [2] * len(fertilizer_2kg) + [4] * len(fertilizer_4kg) + [6] * len(fertilizer_6kg)

# Create a pandas DataFrame
data = pd.DataFrame({'Scores': scores, 'Fertilizer': fertilizer_amounts})

# Add a constant column for the intercept
data = sm.add_constant(data)

# Perform logistic regression
logit = sm.Logit(data['Scores'], data[['const', 'Fertilizer']])
result = logit.fit()

# Print the results
print(result.summary())

# Residual Analysis: A scatter plot of the residuals against the predicted
# values is created to check for any patterns or trends. The red dashed line
# represents the ideal situation where residuals are randomly distributed
# around zero.
predicted_values = result.predict()
residuals = data['Scores'] - predicted_values

# Plotting Residuals
sns.scatterplot(x=predicted_values, y=residuals)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.show()
input(
    "Were the residuals randomly scattered around zero? "
    + "If yes, the assumption holds; if not, it is violated: "
    + "the residuals should show no pattern.")

# Checking for multicollinearity
# Multicollinearity: Multicollinearity concerns correlations among the
# independent variables; with a single predictor this check is trivially
# satisfied, but the correlation matrix is printed for reference. A
# correlation close to 1 or -1 indicates a high correlation between the
# variables.
correlation_matrix = data[['Scores', 'Fertilizer']].corr()
print('\nCorrelation Matrix:')
print(correlation_matrix)
input("Are the off-diagonal entries of the correlation matrix close to 1 or -1? "
    + "If yes, the multicollinearity assumption is violated.")

# Checking for outliers
# Outlier Analysis: A box plot of the residuals against the fertilizer amounts
# is created.
sns.boxplot(x=data['Fertilizer'], y=residuals)
plt.xlabel('Fertilizer')
plt.ylabel('Residuals')
plt.title('Residuals vs. Fertilizer')
plt.show()
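As an additional, optional outlier check, a minimal sketch (assuming the predicted_values and residuals variables from the script above): Pearson residuals scale the raw residuals by the binomial standard deviation, so observations with |r| > 2 are worth inspecting.

# Pearson residuals for a binary outcome: raw residual divided by the
# binomial standard deviation sqrt(p * (1 - p)).
pearson_resid = residuals / np.sqrt(predicted_values * (1 - predicted_values))
print('Potential outliers (|Pearson residual| > 2):')
print(data[np.abs(pearson_resid) > 2])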
a-t-0 commented 1 year ago

TODO: load the actual duration.

Some graphs are loaded from file, and the radiation stage requires the actual_duration key to be present in the graph attribute dict (visible via print(unradiated_graph.network.graph.graph)) when you run:

python -m src.snncompare -e qt0 -j1 -j2 -j4 -j5

So load the actual simulation duration from file when loading those graphs.
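A minimal sketch of the kind of guard that could go into the graph-loading path (the ensure_actual_duration helper name is hypothetical, not part of snncompare; it only assumes the networkx convention that graph-level attributes live in graph.graph):

import networkx as nx

def ensure_actual_duration(graph: nx.DiGraph, filepath: str) -> None:
    # Hypothetical guard: the radiation stage needs actual_duration,
    # so fail loudly if a graph loaded from file does not carry it.
    if 'actual_duration' not in graph.graph:
        raise KeyError(
            f"Graph loaded from {filepath} lacks 'actual_duration'; "
            "store the simulated duration when saving the graph."
        )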
