Updates (Our Plan) - Githubissues

zostrow2001 commented 3 years ago

1) For our end-of-semester project, we plan to extend our mid-semester project. We will again focus on IPOs and the distribution of their characteristics, such as the year they went public, the location of the company, and the sector. This time, however, we will put more emphasis on homing in on specific IPOs, especially their stock prices. Instead of showing the most recent stock prices for 10 chosen IPOs, we will follow a larger number of IPO stocks, and we will follow them for a longer period of time, with the goal of visualizing their prices over the past month. This data will be obtained by web scraping Yahoo Finance. A challenge in the mid-semester project was that the web scraping had to be hard-coded; this time we hope to avoid hard-coding so that it is easier to obtain more data. We also hope to change how we present the distribution of IPO characteristics. We hope to use spatial data to show where these IPOs started and how frequently each location appears. We also hope to allow filtering, so that the map shows only the sectors that are selected. Finally, the scatterplots, and the variables used in them, will be changed in the hope of showing more interesting patterns in IPOs.

2) At the moment, we believe that the addition of a Shiny application will be the best way to represent the different characteristics and stock trend lines of the different IPOs. Possible tabs include the spatial data, then a histogram of other characteristics, then a trend line for the stock prices of the different IPOs. A possible inclusion could be creating a model that attempts to forecast future stock prices of the specific stocks. Nevertheless, we are relatively open to new ways to format our blog such that it represents our analysis of the dataset in the best way possible.

3) Deadlines:
11/6: Gathered datasets, chosen certain stocks, finalized setup of blog
11/10: Cleaned datasets, data wrangled, finished at least the spatial data or one of the visualizations
11/14: Finished other visualizations, maybe attempting to forecast stock prices/adding new parts as necessary. Decide whether to present a recording or present live
11/16: Finish setting up presentation, be prepared to present
11/17: Presentation ready
11/20: Finalize publishing of blog

katcorr commented 3 years ago

This could be a nice extension of your Shiny project! A few questions/thoughts:

Can you clarify what the location of an IPO is defined as? Where the company has its headquarters? How will you get this data? (Or is it already in the dataset you had from the Shiny project?)
What do you mean by a "histogram of other characteristics"? What characteristics? Remember that histograms are used to display the distribution of a single quantitative variable.
"A possible inclusion could be creating a model that attempts to forecast future stock prices of the specific stocks." What would the predictor variables be? Unless you have some experience with time series modeling, I might recommend dropping this part of the proposal.
"Nevertheless, we are relatively open to new ways to format our blog such that it represents our analysis of the dataset in the best way possible." This is quite vague -- I recommend determining now what specific main deliverables you're working toward so your work can be focused and efficient

Please respond briefly to my questions above by 11/3, and reach out if you have any questions!

Update 1: 10/10

luwilliam20 commented 3 years ago

Hi Professor,

Some clarification on the proposal: the location of an IPO should really be the state that the company was started in. All of the companies are listed on the New York Stock Exchange, so it makes more sense for us to talk about where the business is headquartered than about which exchange it is on. That data is already in the dataset we used for our initial project. For your second question, I believe we were referring to the bar graph of all the different categorical variables that was on the first page of our Shiny app. The characteristics include sector, industry, revenue, net income, and market cap. On the forecasting model, Zach and I definitely don't have that kind of time series experience, but we'll think more about it. If the skills needed to produce this time series model are way out of our reach, then we'll drop this part of our proposal. Finally, with regard to your last question, we talked over the weekend and decided that our main deliverables will be the spatial map of stocks and the individual stock trend lines for the companies we are interested in (scraped from Yahoo Finance).

Let us know if you have any more questions!

zostrow2001 commented 3 years ago

Hi Professor!

For our second update, we have made some progress on our blog project. In fact, we are actually ahead of schedule. We have been able to collect our datasets, which include the IPO dataset and the data collected from web scraping Yahoo Finance. In addition, we have decided to choose a random collection of 200 stocks whose trends we will track. Since the web scraping is unlikely to work for all 200, the final project will more likely present around 150 stocks. We have also finalized the setup of our blog. It will be mostly text explaining our data, the collection process, and the conclusions, then a link to a Shiny app with a spatial map and stock trends, and a bar graph of all the different categorical variables. We have also begun the data wrangling process for our project.
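
As a rough sketch of how a random draw of 200 stocks can tolerate scraping failures, the base R code below samples tickers and keeps only the ones that return data. Everything in it is an assumption for illustration: `fetch_prices()` is a hypothetical stand-in for the real Yahoo Finance scraper, and the ticker names and the failure rule are made up.

```r
# Hypothetical stand-in for the real scraper (assumption): it fails for
# some tickers so the error handling below has something to catch.
fetch_prices <- function(ticker) {
  if (endsWith(ticker, "7")) stop("no data found for ", ticker)
  data.frame(ticker = ticker, close = 10)
}

set.seed(231)                          # make the random draw reproducible
all_tickers <- paste0("TCK", 1:1000)   # placeholder ticker symbols
chosen <- sample(all_tickers, 200)     # 200 tickers, without replacement

# Try every ticker; return NULL instead of stopping when one fails.
results <- lapply(chosen, function(tk) {
  tryCatch(fetch_prices(tk), error = function(e) NULL)
})

# Drop the failures and stack the successes into one data frame.
succeeded <- Filter(Negate(is.null), results)
prices <- do.call(rbind, succeeded)
```

Wrapping each request in `tryCatch()` means one bad ticker does not stop the whole loop, which is how a sample of 200 can end up as a smaller set of usable stocks.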

katcorr commented 3 years ago

Thorough update. Good plan for moving forward. For the blog, "it will be mostly text explaining our data, the collection process, and ..." -- This sounds good, but don't forget to start with an introduction/motivation to your project topic too (i.e., don't start with explaining the data).

Update 2: 5/5

zostrow2001 commented 3 years ago

Hi Professor!

For our third update, we are still on schedule. We have finished our data wrangling. There are two CSV files which hold the two datasets that we will be using for our visualizations. To make the time series plot easier to create, however, we might apply the pivot_longer function to our random 200 stocks dataset, which would make it easier to plot two trend lines in one graph. In addition, for now, we have decided not to use the code for the automatic refreshing of the page. Instead, as of now, we will plot based on the past month of stock prices. If we have time later, however, we may choose to include the data.
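
The reshaping step mentioned above might look something like the sketch below. It assumes the stock data is stored wide, with one column per ticker; the tickers `AAA` and `BBB`, the dates, and the prices are all made up for illustration and are not from the real dataset.

```r
library(tidyr)
library(ggplot2)

# Made-up wide data (assumption): one row per date, one column per ticker.
wide <- data.frame(
  date = as.Date("2020-11-02") + 0:2,
  AAA  = c(10.0, 10.5, 10.2),
  BBB  = c(20.0, 19.5, 21.0)
)

# Reshape to one row per (date, ticker) pair.
long <- pivot_longer(wide, cols = -date,
                     names_to = "ticker", values_to = "close")

# With long data, a single geom_line() draws one line per ticker.
p <- ggplot(long, aes(x = date, y = close, color = ticker)) +
  geom_line()
```

Once the data is long, adding a third or fourth trend line needs no extra plotting code, only more rows.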

With regard to our visualizations, we are still in the process of making our spatial map and our stock time-series graph. However, the bar graphs for the other characteristics seem to be complete. We still need to fix the way we filter data by sector, however, so that the code is efficient and not repetitive. Overall, we continue to be on course and hope to stay that way.
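
One possible way to avoid repeating the sector filter is to put it in a single helper, sketched below with dplyr. The data frame, its column names, and the `filter_by_sector()` helper are hypothetical; the real IPO dataset may be organized differently.

```r
library(dplyr)

# Made-up rows (assumption); the real IPO dataset's columns may differ.
ipos <- data.frame(
  company = c("A", "B", "C", "D"),
  sector  = c("Technology", "Health Care", "Technology", "Finance"),
  state   = c("CA", "MA", "NY", "NY")
)

# One helper for every sector choice; an empty selection means "show all".
filter_by_sector <- function(data, sectors = character(0)) {
  if (length(sectors) == 0) return(data)
  filter(data, .data$sector %in% sectors)
}

tech <- filter_by_sector(ipos, "Technology")
```

In a Shiny app, the `sectors` argument could come from a multi-select input, so the map and the bar graphs would share the same filtering code instead of each having its own copy.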

katcorr commented 3 years ago

Sounds good!

Update 3: 5/5

stat231-f20 / Blog-MoneyMovers

Updates (Our Plan) #1