stat231-f20 / Blog-Stratton-Oakmont

Repository for PUG Blog Project – Stratton Oakmont
https://stat231-f20.github.io/Blog-Stratton-Oakmont/

Update 1 #1

Open katcorr opened 3 years ago

katcorr commented 3 years ago

(copied from PDF submitted to repo)

  1. Our final project will take a different direction from our previous Shiny App project. Instead of investigating the stock market, we will now explore the topic of sustainability and environmental concern among the top companies in the United States. Our project will aim to address the following questions: (1) What is the relationship between revenue, revenue growth, and sustainability? (2) What is the relationship between sustainability and geography in the U.S.? (3) Which sectors have the most sustainable companies? For ‘Top Companies,’ we will focus our investigation on companies listed in the Fortune 500, and will scrape company name, yearly profit, and yearly revenue data from the Fortune website. There are several organizations and datasets that provide sustainability and ESG ‘ratings’ for companies in the Fortune 500 list. These ratings and metrics provide insight into how companies make efforts to achieve environmentally friendly and sustainable practices, and they will be a key component of our sustainability measurements and visualizations. Investor's Business Daily provides a list of the 50 best ESG companies, and we plan to scrape this data for our interactive ‘Sustainability Map.’ We would also like to get access to Bloomberg, if possible, as the site provides ESG ratings for every publicly traded company. We plan to ask you about this on Tuesday, although Yahoo Finance also provides ESG data for all publicly traded companies, so we may decide to use this resource if we are unable to get access to Bloomberg.

  2. Our blog will include an interactive map that shows the average ESG score for each state in the US. The average will be computed by averaging the ESG ratings of the companies headquartered in each respective state (a rough sketch of this computation appears after this list). The blog will also include visuals that display the relationship between ESG ratings and company revenue, revenue growth, sector, size, and age (time since founding) for Fortune 500 companies. These visualizations will include a series of point plots to investigate the linear relationships between these variables. Lastly, our blog will include a written report that summarizes the findings of our data analysis and conveys the implications of sustainability for current and future company revenues and revenue growth.

  3. Checkpoints / Group Deadlines:
     - October 29: Create initial proposal for blog.
     - November 4: Wrangle and tidy datasets for ESG ratings and company attributes (revenue, headquarters, etc.); make sure our data is ready to be used in visualizations.
     - November 7: Create visualizations and interactive map in RStudio.
     - November 14: Write blog and implement visualizations and interactive map into the blog page.
     - November 16: Finalize presentation; make any necessary changes and/or revisions to our code and blog.
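
For the state-level map in item 2, a minimal sketch of the averaging step is below. It assumes a hypothetical tibble `esg_companies` with columns `company`, `state` (full name of the headquarters state), and a numeric `esg_score`, and uses a static ggplot2 choropleth as a stand-in for whatever interactive mapping approach ends up in the blog:

```r
library(dplyr)
library(ggplot2)
# map_data("state") below requires the maps package to be installed

# esg_companies: hypothetical tibble with one row per company:
#   company, state (full HQ state name), esg_score (numeric)
state_avg <- esg_companies %>%
  mutate(region = tolower(state)) %>%
  group_by(region) %>%
  summarize(avg_esg = mean(esg_score, na.rm = TRUE), .groups = "drop")

us_states <- map_data("state")

us_states %>%
  left_join(state_avg, by = "region") %>%
  ggplot(aes(x = long, y = lat, group = group, fill = avg_esg)) +
  geom_polygon(color = "white", linewidth = 0.2) +
  coord_quickmap() +
  labs(fill = "Average ESG score",
       title = "Average ESG score by state of company headquarters")
```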

katcorr commented 3 years ago

@mcooper22 @aristic277

Excellent ideas, team! I love this new direction for your blog post.

This is awesome -- I really look forward to your post!

Update 1: 10/10

aristic277 commented 3 years ago

Update 2:

To answer the above questions: We've decided that we will not be using Bloomberg to collect ESG scores, but will instead implement a scraping bot (built with Selenium) to collect ESG scores from MSCI. At the moment, we plan for our interactive map to have just zoom and pan capability, so that users can navigate the map and focus on areas of their choice; we may add more features if we have extra time at the end of our checkpoint schedule. The location of each company will be defined by its headquarters, so a company will not contribute to multiple locations. Lastly, we may include either a summary table or a boxplot, depending on which visual better conveys the distribution of ESG scores and their relationship with our other variables of interest.
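
As a rough illustration of what such a bot might look like, here is a minimal RSelenium sketch (the R binding for Selenium; the team's bot could just as well be written in Python). The MSCI URL pattern and the CSS selector are hypothetical placeholders, not the actual page structure:

```r
library(RSelenium)

# Start a Selenium-driven browser session
driver <- rsDriver(browser = "firefox", port = 4555L, verbose = FALSE)
remDr <- driver$client

# Hypothetical MSCI ESG ratings page for a single ticker
remDr$navigate("https://www.msci.com/esg-ratings/issuer/EXAMPLE-TICKER")
Sys.sleep(3)  # give the dynamically rendered page time to load

# Hypothetical CSS selector for the rating element
rating_elem <- remDr$findElement(using = "css selector",
                                 value = ".ratingdata-company-rating")
rating <- rating_elem$getElementText()[[1]]

remDr$close()
driver$server$stop()
```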

To update our progress: We currently have the two datasets that we intend to use for our blog and visualizations. One dataset contains ESG scores and financial data for the companies in the S&P 100; the data was collected using a scraping bot (built with Selenium) that retrieved ESG scores from MSCI and financial data (revenue, yearly revenue growth) from Yahoo Finance. The other dataset is a ranked list of the 100 most sustainable companies in the world, which includes ESG ratings and headquarters locations -- we intend to use this dataset for our 'ESG Map.' These datasets have all of the variables that we're interested in, but we did run into some issues during data collection and had to make a few adjustments. The biggest change was collecting ESG scores for S&P 100 companies rather than for companies in the Fortune 500 list. We found that Fortune 500 data (revenues and revenue growth) was more difficult to scrape than S&P company data; there are many Wikipedia pages that contain tables of S&P company data, and these tables are much easier to scrape than the 'tables' on the Fortune 500 site (R did not recognize any tables on the Fortune 500 page, which made scraping very difficult, if not impossible). Additionally, we realized that some of the companies in the Fortune 500 are not publicly traded, which meant that we could not retrieve their ESG scores from MSCI. By comparison, all of the companies in the S&P 100 are publicly traded, so this dataset is much more practical for our questions of interest.
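
A minimal rvest sketch of the Wikipedia-table approach described above, assuming the S&P 100 constituents page and its column names (both are assumptions; the exact table position and headers may differ):

```r
library(rvest)
library(dplyr)

# Wikipedia page listing S&P 100 constituents (URL assumed)
url <- "https://en.wikipedia.org/wiki/S%26P_100"

tables <- read_html(url) %>%
  html_elements("table.wikitable") %>%
  html_table()

# Pick out the constituents table by looking for a ticker column
# ("Symbol" is an assumed column name)
is_constituents <- sapply(tables, function(t) "Symbol" %in% names(t))
sp100 <- tables[[which(is_constituents)[1]]]

glimpse(sp100)
```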

Our group is currently on track, and we will be meeting this weekend to revisit our datasets and to begin creating the visualizations for our blog.

katcorr commented 3 years ago

Thorough update, and glad to hear you're on track with your schedule! Your switch to the S&P 100 instead of the Fortune 500 makes sense given the issues with scraping the Fortune 500 website.

Can you push the scraping and wrangling code to the repo?

Update 2: 5/5

mcooper22 commented 3 years ago

Update 3:

Data Wrangling --> We have finished wrangling our datasets and committed them. We decided to go with the top 100 companies in the US by revenue because they had the most publicly accessible information. We had to cut about twenty of them because they aren't publicly traded, so we couldn't get ESG scores, but I think this is going to be the best data to use! We also finalized our global sustainable companies dataset and have resolved the emoji encoding issues.

Visualizations --> We have finished creating our ESG vs. revenue and ESG vs. sector visualizations and are beginning the US map and world map tomorrow, which should be quick given our datasets. After that, we'll be finished!

Blog --> Today we began formatting our blog online, and we are going to spend the rest of the week implementing our visualizations and writing context for our graphs, maps, and subject.

Overall, we're making great time and will be done with the project as scheduled.
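
For the ESG-versus-revenue point plots mentioned above, a minimal ggplot2 sketch is below; `sp_esg` is a hypothetical tibble with columns `company`, `sector`, `revenue`, and `esg_score` (ESG ratings treated as a numeric score) standing in for the wrangled S&P dataset:

```r
library(ggplot2)
library(scales)

# sp_esg: hypothetical tibble with columns company, sector, revenue, esg_score
ggplot(sp_esg, aes(x = revenue, y = esg_score, color = sector)) +
  geom_point(alpha = 0.8) +
  # single overall linear fit across all sectors
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, color = "grey40") +
  scale_x_log10(labels = label_dollar()) +
  labs(x = "Annual revenue (log scale)",
       y = "ESG score",
       title = "ESG score vs. revenue, S&P 100 companies")
```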

katcorr commented 3 years ago

Great progress!

Update 3: 5/5