Open isaiahoh opened 2 weeks ago
with open(filename) as my_file:
with open('filename.csv') as my_file:
import csv

with open('marriage_data.csv') as file:
    reader = csv.reader(file)
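As a minimal sketch of what csv.reader gives you once the file is open, the example below writes a tiny CSV to a temporary folder and reads it back. The rows and the file name sample.csv are made up for illustration; they are not the contents of marriage_data.csv.

```python
import csv
import os
import tempfile

# Hypothetical sample rows, written to a temporary file so the sketch runs anywhere.
sample_rows = [["Year", "Marriages"], ["2020", "120"], ["2021", "135"]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")

    # newline="" is what the csv module documentation recommends for CSV files.
    with open(path, "w", newline="") as out_file:
        csv.writer(out_file).writerows(sample_rows)

    with open(path, newline="") as in_file:
        reader = csv.reader(in_file)
        header = next(reader)  # first row holds the column names
        rows = list(reader)    # remaining rows, each a list of strings

print(header)
print(rows)
```

Note that csv.reader returns every value as a string, so numbers need converting (for example with int()) before you do arithmetic on them.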
- However, for this project we will use Matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
- Load the CSV file: You can load your CSV file using Pandas' read_csv() method.
# Load the CSV file
data = pd.read_csv('filename.csv')
# Display the first few rows of the dataframe
print(data.head())
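If you do not have a CSV file at hand yet, the same calls can be tried on an in-memory string. This is only a sketch: the Date and Sales values are invented, and io.StringIO stands in for the real 'filename.csv'.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'filename.csv'.
csv_text = "Date,Sales\n2024-01-01,100\n2024-01-02,150\n2024-01-03,125\n"

# parse_dates asks pandas to turn the Date column into real datetimes,
# which lets Matplotlib space the x-axis by time later on.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(data.head())   # first rows of the DataFrame
print(data.dtypes)   # column types, useful for spotting unparsed columns
```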
* Visualise it with some basic methods:
# Plot the data
plt.plot(data['Date'], data['Sales'])
# Add labels and title
plt.xlabel('Date')
plt.ylabel('Sales')
plt.title('Sales Over Time')
# Show the plot
plt.xticks(rotation=45)  # Rotate date labels for better readability
plt.show()
Customization:
# Create a plot with customizations
plt.plot(data['Date'], data['Sales'], color='red', marker='o', linestyle='--')
# Add grid lines
plt.grid(True)
# Customise axes
plt.xlabel('Date', fontsize=12)
plt.ylabel('Sales', fontsize=12)
plt.title('Sales Over Time with Custom Style', fontsize=15)
# Display the plot
plt.xticks(rotation=45)
plt.show()
Save the plot as an image: If you want to save the plot as an image, use the savefig() function.
plt.plot(data['Date'], data['Sales'])
plt.xlabel('Date')
plt.ylabel('Sales')
plt.title('Sales Over Time')
# Save the plot as a PNG file
plt.savefig('sales_plot.png')
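One thing to watch: if you also call plt.show(), save the figure first, since the figure is usually closed once the window is dismissed and a later savefig() can produce a blank image. Here is a small sketch of saving without opening a window at all. The month labels and sales numbers are made up, and the temporary output folder is just a choice for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical data for illustration only.
dates = ["Jan", "Feb", "Mar"]
sales = [100, 150, 125]

fig, ax = plt.subplots()
ax.plot(dates, sales, marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
ax.set_title("Sales Over Time")
fig.tight_layout()  # keep labels from being cut off in the saved file

# Save into a temporary folder; dpi controls the image resolution.
out_path = os.path.join(tempfile.mkdtemp(), "sales_plot.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)

saved_ok = os.path.exists(out_path) and os.path.getsize(out_path) > 0
print(saved_ok)
```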
Using the example CSV file below (saved as house_data.csv, the name the later code loads), we will explore how to select columns and how to predict the price of a house from the data.
House_Size,Price,Bedrooms
1000,200000,3
1500,300000,4
2000,400000,3
2500,500000,4
3000,600000,5
selected_columns = data[['House_Size', 'Price']]
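The double brackets matter here. As a rough sketch (reading the example rows from a string instead of house_data.csv so it runs on its own), a list of column names gives back a smaller DataFrame, while a single name gives back a Series:

```python
import io

import pandas as pd

# The example rows from above, held in a string instead of a file.
csv_text = """House_Size,Price,Bedrooms
1000,200000,3
1500,300000,4
2000,400000,3
2500,500000,4
3000,600000,5
"""
data = pd.read_csv(io.StringIO(csv_text))

selected_columns = data[["House_Size", "Price"]]  # list of names -> DataFrame
price_only = data["Price"]                        # single name -> Series

print(type(selected_columns).__name__)
print(type(price_only).__name__)
print(selected_columns.shape)  # (rows, columns)
```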
- We use Matplotlib’s plt.plot() method to create a plot of House_Size vs Price.
plt.plot(selected_columns['House_Size'], selected_columns['Price'], marker='o', linestyle='-', color='blue')
- The plt.show() command displays the figure.
plt.show()
- Now, try to put them all together.
import pandas as pd
import matplotlib.pyplot as plt
# Step 1: Load the CSV file into a Pandas DataFrame
data = pd.read_csv('house_data.csv')
# Step 2: Select the columns 'House_Size' and 'Price' for plotting
selected_columns = data[['House_Size', 'Price']]
# Print selected columns (optional, to verify the selection)
print(selected_columns)
# Step 3: Plot the data using Matplotlib
plt.plot(selected_columns['House_Size'], selected_columns['Price'], marker='o', linestyle='-', color='blue')
# Add labels and title
plt.xlabel('House Size (sqft)')
plt.ylabel('Price ($)')
plt.title('House Size vs Price')
# Display the plot
plt.show()
- With the code above, this is what you should get for the selected columns (the headers are the CSV column names; the units only appear on the plot's axis labels):
   House_Size   Price
0        1000  200000
1        1500  300000
2        2000  400000
3        2500  500000
4        3000  600000
Here is a step-by-step breakdown of the price prediction:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import numpy as np
# Step 1: Load the CSV file into a Pandas DataFrame
data = pd.read_csv('house_data.csv')
# Step 2: Extract the 'House_Size' and 'Price' columns
# (the names must match the CSV header exactly)
house_size = data['House_Size'].values.reshape(-1, 1)  # Reshaping for sklearn
price = data['Price'].values
# Step 3: Visualize the data using a scatter plot
plt.scatter(house_size, price, color='blue')
plt.xlabel('House Size (sqft)')
plt.ylabel('Price ($)')
plt.title('House Size vs Price')
plt.show()
# Step 4: Fit a linear regression model to predict the price of a house based on its size
model = LinearRegression()
model.fit(house_size, price)
# Predict the price of a house for a given size, e.g., 2200 sqft
house_size_to_predict = np.array([[2200]])
predicted_price = model.predict(house_size_to_predict)
print(f'The predicted price for a house of size 2200 sqft is: ${predicted_price[0]:,.2f}')
- This is what you should get from the code above:
The predicted price for a house of size 2200 sqft is: $440,000.00
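To see where $440,000.00 comes from, the same straight-line fit can be reproduced by hand. The sketch below uses the textbook least-squares formulas for one input variable in plain Python; it is an illustration of the arithmetic on the five example rows, not a look at scikit-learn's internals.

```python
# The five rows from the example CSV.
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200000, 300000, 400000, 500000, 600000]

n = len(sizes)
mean_size = sum(sizes) / n
mean_price = sum(prices) / n

# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
numerator = sum((x - mean_size) * (y - mean_price) for x, y in zip(sizes, prices))
denominator = sum((x - mean_size) ** 2 for x in sizes)
slope = numerator / denominator

# The fitted line passes through the point (mean_size, mean_price).
intercept = mean_price - slope * mean_size

predicted = slope * 2200 + intercept
print(f"slope: {slope}, intercept: {intercept}")
print(f"Predicted price for 2200 sqft: ${predicted:,.2f}")
```

In this data set every house costs exactly $200 per square foot, so the line fits perfectly and 2200 sqft comes out at 2200 × 200 = 440,000. Real data will not line up this neatly.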