Statistical Computing Projects
This repository collects data analysis and machine learning projects completed for statistical computing courses. Each project covers a different mix of data manipulation, analysis, visualization, and modeling, using R and SQL.
Projects
1. Pairs Trading Strategy
- Implemented a pairs trading strategy for stock market analysis
- Analyzed stock price data for Visa and Mastercard from 2010 to 2020
- Calculated optimal trading positions and evaluated strategy performance
- Extended analysis to other stock pairs with varying correlations
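
For readers unfamiliar with pairs trading, the following is a minimal sketch of one common formulation, not the code in this repository (which is written in R). It assumes the strategy tracks the ratio of the two stock prices, opens a position when the ratio moves more than k standard deviations from its mean, and closes it when the ratio returns to the mean. The function name find_positions and the parameter k are hypothetical.

```python
from statistics import mean, stdev


def find_positions(ratio, k=1.0):
    """Return (open_index, close_index) pairs for a ratio-based strategy.

    Assumption: a position opens when the price ratio leaves the band
    mean +/- k * sd and closes when the ratio next reaches the mean again,
    or on the last day if it never does.
    """
    m, s = mean(ratio), stdev(ratio)
    upper, lower = m + k * s, m - k * s
    positions = []
    i, n = 0, len(ratio)
    while i < n - 1:  # a position opened on the last day could not be closed
        if ratio[i] > upper or ratio[i] < lower:
            opened_high = ratio[i] > upper
            j = i + 1
            # Stay in the position while the ratio is still on the same side of the mean.
            while j < n and ((ratio[j] > m) if opened_high else (ratio[j] < m)):
                j += 1
            close = min(j, n - 1)
            positions.append((i, close))
            i = close + 1
        else:
            i += 1
    return positions


# Toy ratio series (illustrative values, not real stock prices).
toy_ratio = [1.0, 1.0, 1.0, 2.0, 1.5, 1.0, 1.0, 0.0, 0.5, 1.0, 1.0]
toy_positions = find_positions(toy_ratio, k=1.0)
```

Larger values of k open fewer positions but wait for larger deviations, which is the trade-off an "optimal" k has to balance.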
2. Global Demographics Analysis
- Analyzed global demographic data from the 2014 CIA Factbook
- Explored trends in population, mortality, GDP, and language across countries
- Created visualizations including world maps of infant mortality rates
- Implemented k-means clustering for country classification
- Investigated linguistic trends related to colonialism
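
As a rough illustration of the clustering step, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the project's R code, and the function name kmeans, the naive "first k points" initialization, and the toy two-feature points are all assumptions made for the example.

```python
def kmeans(points, k, iters=100):
    """Cluster points (tuples of numbers) into k groups with Lloyd's algorithm.

    Assumption: centers are initialized to the first k points, which keeps
    the sketch deterministic; real analyses usually use random restarts.
    """
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Toy standardized features (illustrative values, not Factbook data).
toy_points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
toy_labels, toy_centers = kmeans(toy_points, k=2)
```

Because k-means uses Euclidean distance, features on very different scales (for example population and mortality rate) would normally be standardized first.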
3. Spam Email Classification
- Developed a spam email detection algorithm using Naive Bayes classification
- Processed and analyzed a corpus of over 9,000 spam and non-spam emails
- Implemented text processing techniques for feature extraction
- Evaluated classifier performance using various threshold values
- Extended analysis to include subject lines and sender domains
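
The classification approach can be sketched as follows. This is a minimal Bernoulli Naive Bayes example in Python, not the repository's R implementation. It assumes each email has already been reduced to a set of words, uses add-0.5 smoothing (an assumption), and labels a message as spam when its log-likelihood ratio exceeds a threshold. The names train, log_likelihood_ratio, and classify are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (set_of_words, is_spam) pairs.

    Returns {word: (P(word present | spam), P(word present | ham))},
    smoothed so that no probability is exactly 0 or 1.
    """
    n_spam = sum(1 for _, is_spam in messages if is_spam)
    n_ham = len(messages) - n_spam
    spam_counts, ham_counts = Counter(), Counter()
    for words, is_spam in messages:
        (spam_counts if is_spam else ham_counts).update(set(words))
    vocab = set(spam_counts) | set(ham_counts)
    return {
        w: ((spam_counts[w] + 0.5) / (n_spam + 1), (ham_counts[w] + 0.5) / (n_ham + 1))
        for w in vocab
    }


def log_likelihood_ratio(words, probs):
    """Sum the log odds contributed by every vocabulary word, present or absent."""
    words = set(words)
    total = 0.0
    for w, (p_spam, p_ham) in probs.items():
        if w in words:
            total += math.log(p_spam / p_ham)
        else:
            total += math.log((1 - p_spam) / (1 - p_ham))
    return total


def classify(words, probs, threshold=0.0):
    """Return True (spam) when the log-likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(words, probs) > threshold


# Toy training set (illustrative, not drawn from the project's corpus).
toy_training = [
    ({"free", "money"}, True),
    ({"free", "offer"}, True),
    ({"meeting", "notes"}, False),
    ({"project", "meeting"}, False),
]
toy_probs = train(toy_training)
```

Raising the threshold makes the classifier more conservative: fewer legitimate emails are flagged as spam, at the cost of letting more spam through.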
4. MLB Baseball Statistics Analysis
- Analyzed MLB salary data from 1985 to 2016 using SQL and R
- Investigated trends in player salaries over time and across leagues
- Created visualizations of salary distributions and team spending
- Explored relationships between team salaries and World Series success
- Analyzed patterns in home runs and All-Star player representation
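
To give a sense of the kind of SQL aggregation involved, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name salaries, its columns, and the toy rows are hypothetical and do not reflect the project's actual schema or data.

```python
import sqlite3

# In-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salaries (year INTEGER, league TEXT, team TEXT, salary REAL)"
)

# Toy rows for illustration only; these are not real MLB salaries.
toy_rows = [
    (1985, "AL", "A", 100.0),
    (1985, "AL", "B", 300.0),
    (1985, "NL", "C", 200.0),
    (2016, "AL", "A", 1000.0),
]
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?, ?)", toy_rows)


def average_salary_by_year_and_league(connection):
    """Return (year, league, average salary) rows, ordered by year then league."""
    query = """
        SELECT year, league, AVG(salary)
        FROM salaries
        GROUP BY year, league
        ORDER BY year, league
    """
    return connection.execute(query).fetchall()


averages = average_salary_by_year_and_league(conn)
```

The grouped result can then be passed to a plotting library to show salary trends over time for each league.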
Tools and Technologies
- R
- SQL
- Data visualization libraries such as ggplot2
- Statistical modeling and machine learning techniques
- Text processing and natural language processing
Each project folder contains the code, analysis, and findings for that project. See the files in each folder for details on methods and results.