princyi / password-protected-zip-file-

This Python script creates a password-protected ZIP file using the pyzipper library. It allows you to specify the files to include in the ZIP and set a password for encryption. The resulting ZIP file requires the provided password to access its contents, providing an additional layer of security.

Exercise: Supervised vs Unsupervised #6

Open princyi opened 2 months ago

princyi commented 2 months ago

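I will complete two machine learning tasks: a supervised one (predicting building energy efficiency) and an unsupervised one (clustering vehicles).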

Steps to work with the AWS gateway

1. Click Launch Cloud Gateway at the bottom of the navigation menu.
2. From the AWS console, search for SageMaker. In the left-side menu, click Notebooks, then click Notebook instances.
3. Click Create notebook instance.
4. When the instance is ready, click Open Jupyter.
5. In the notebook, click New and select conda_python3.
6. Copy the code below on this page and paste it into a notebook cell, then run the cell. Do this for both parts:
   - Supervised learning
   - Unsupervised learning
7. Delete the notebook instance after reviewing the solution on the next page.

You'll generate synthetic data for both exercises and execute them in an AWS SageMaker Jupyter Notebook instance.

Part 1: Predicting Building Energy Efficiency (Supervised Learning)

Scenario: You are working for an architecture firm, and your task is to build a model that predicts the energy efficiency rating of buildings based on features such as wall area, roof area, and overall height.

Supervised learning code (Python), to predict the energy efficiency of buildings:

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

warnings.filterwarnings('ignore')

# Generate synthetic dataset for building features and energy efficiency ratings
np.random.seed(0)
data_size = 500
data = {
    'WallArea': np.random.randint(200, 400, data_size),
    'RoofArea': np.random.randint(100, 200, data_size),
    'OverallHeight': np.random.uniform(3, 10, data_size),
    'GlazingArea': np.random.uniform(0, 1, data_size),
    'EnergyEfficiency': np.random.uniform(10, 50, data_size)  # Energy efficiency rating
}
df = pd.DataFrame(data)
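One thing to keep in mind with the dataset above: EnergyEfficiency is drawn independently of the four features, so there is no real relationship for a model to learn. The sketch below is one possible alternative, not part of the original exercise. It builds the rating as a linear combination of the features plus noise. The function name make_building_data and every coefficient are hypothetical, chosen only for illustration.

```python
import numpy as np


def make_building_data(n=500, seed=0):
    """Return a dict of synthetic building features with a feature-dependent target.

    The coefficients are illustrative assumptions, not real building physics.
    """
    rng = np.random.default_rng(seed)
    wall = rng.integers(200, 400, n)
    roof = rng.integers(100, 200, n)
    height = rng.uniform(3, 10, n)
    glazing = rng.uniform(0, 1, n)
    noise = rng.normal(0, 2, n)  # unexplained variation in the rating

    # Hypothetical linear relationship between features and the rating
    efficiency = 10 + 0.05 * wall + 0.08 * roof + 1.5 * height - 8 * glazing + noise

    return {
        'WallArea': wall,
        'RoofArea': roof,
        'OverallHeight': height,
        'GlazingArea': glazing,
        'EnergyEfficiency': efficiency,
    }


data = make_building_data()
```

The returned dict can be passed to pd.DataFrame(data) in place of the original dictionary; the rest of the exercise code then runs unchanged.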

# Data preprocessing: separate the features (X) from the target (y)
X = df.drop('EnergyEfficiency', axis=1)
y = df['EnergyEfficiency']

# Visualize the relationships between features and the target variable (Energy Efficiency)
sns.pairplot(
    df,
    x_vars=['WallArea', 'RoofArea', 'OverallHeight', 'GlazingArea'],
    y_vars='EnergyEfficiency',
    height=4,
    aspect=1,
    kind='scatter'
)
plt.show()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest model (random_state fixed so repeated runs give the same result)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")
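A single MSE value is hard to judge in isolation. One common reference point is a baseline that always predicts the mean of the training targets; a useful model should beat it. For a target drawn uniformly from [10, 50], the variance is (50 - 10)^2 / 12 ≈ 133.3, so the baseline MSE should land close to that. The helper below is a minimal sketch using only NumPy. The name baseline_mse and the stand-in data are hypothetical.

```python
import numpy as np


def baseline_mse(y_train, y_test):
    """MSE of a constant model that always predicts the training mean."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    return float(np.mean((y_test - y_train.mean()) ** 2))


# Stand-in targets drawn the same way as the exercise's EnergyEfficiency column
rng = np.random.default_rng(0)
y_demo = rng.uniform(10, 50, 500)
y_train_demo, y_test_demo = y_demo[:400], y_demo[400:]

base = baseline_mse(y_train_demo, y_test_demo)
print(f"Baseline MSE:  {base:.1f}")
print(f"Baseline RMSE: {np.sqrt(base):.1f}")
```

In the notebook, baseline_mse(y_train, y_test) can be printed next to the model's mse. Because the synthetic target here is independent of the features, the random forest should not be expected to do meaningfully better than this baseline.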

# Plot the true values vs the predicted values
plt.figure(figsize=(10, 6))
plt.scatter(y_test, predictions)
plt.xlabel("True Values")
plt.ylabel("Predictions")
plt.title("True Values vs Predicted Values")
# Dashed diagonal marks where a perfect prediction would fall
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--')
plt.show()

End of this code

Part 2: Vehicle Clustering (Unsupervised Learning)

Scenario: You are working for an automotive company, and your task is to cluster vehicles into groups based on features such as weight, engine size, and horsepower.

Unsupervised learning code (Python), to cluster vehicles based on their specifications:

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
from sklearn.cluster import KMeans

warnings.filterwarnings('ignore')

# Generate synthetic dataset for vehicles
np.random.seed(0)
data_size = 300
data = {
    'Weight': np.random.randint(1000, 3000, data_size),
    'EngineSize': np.random.uniform(1.0, 4.0, data_size),
    'Horsepower': np.random.randint(50, 300, data_size)
}
df = pd.DataFrame(data)

# No labels are needed for unsupervised learning
X = df

# Perform KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)
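KMeans groups points by Euclidean distance, so a feature with a large numeric range (here Weight, in the thousands) can dominate features with small ranges (EngineSize, between 1 and 4). A common remedy is to standardize the features first. The sketch below is an optional variation, not part of the original exercise. It uses scikit-learn's StandardScaler and make_pipeline, regenerates the same kind of synthetic data so that it runs in isolation, and sets n_init=10 explicitly as an assumption to keep behavior consistent across scikit-learn versions. The names vehicles, scaled_kmeans, and scaled_labels are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic vehicles with the same columns and ranges as the exercise
rng = np.random.default_rng(0)
n = 300
vehicles = pd.DataFrame({
    'Weight': rng.integers(1000, 3000, n),
    'EngineSize': rng.uniform(1.0, 4.0, n),
    'Horsepower': rng.integers(50, 300, n),
})

# Standardize each column to zero mean and unit variance, then cluster
scaled_kmeans = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=42),
)
scaled_labels = scaled_kmeans.fit_predict(vehicles)

# Number of vehicles assigned to each cluster
print(np.bincount(scaled_labels))
```

Passing scaled_labels to the c= argument of plt.scatter colors the plot by these clusters instead of by kmeans.labels_.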

# Plot the clusters
plt.scatter(df['Weight'], df['Horsepower'], c=kmeans.labels_)
plt.xlabel('Weight')
plt.ylabel('Horsepower')
plt.title('Vehicle Clusters')
plt.show()

End of code