Task 3 - Githubissues

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<p style=\"text-align:center\">\n", " <a href=\"https://skills.network/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork865-2023-01-01\">\n", " <img src=\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png\" width=\"200\" alt=\"Skills Network Logo\" />\n", " \n", "

\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# SpaceX Falcon 9 First Stage Landing Prediction\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Web Scraping Falcon 9 and Falcon Heavy Launch Records from Wikipedia\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Estimated time needed: 40 minutes\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this lab, you will use web scraping to collect historical Falcon 9 launch records from the Wikipedia page titled List of Falcon 9 and Falcon Heavy launches:\n", "\n", "https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Falcon 9 first stage will land successfully\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Several examples of an unsuccessful landing are shown here:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More specifically, the launch records are stored in an HTML table shown below:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objectives\n", "Web scrape Falcon 9 launch records with BeautifulSoup:\n", "- Extract the Falcon 9 launch records HTML table from Wikipedia\n", "- Parse the table and convert it into a pandas DataFrame\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's import the required packages for this lab.\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: beautifulsoup4 in /home/jupyterlab/conda/envs/python/lib/python3.7/site-packages (4.11.1)\n", "Requirement already satisfied: soupsieve>1.2 in /home/jupyterlab/conda/envs/python/lib/python3.7/site-packages (from beautifulsoup4) (2.3.2.post1)\n", "Requirement already satisfied: requests in /home/jupyterlab/conda/envs/python/lib/python3.7/site-packages (2.29.0)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /home/jupyterlab/conda/envs/python/lib/python3.7/site-packages (from requests) (3.1.0)\n", "Requirement already satisfied: idna<4,>=2.5 in /home/jupyterlab/conda/envs/python/lib/python3.7/site-packages (from requests) (3.4)\n", "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/jupyterlab/conda/envs/python/lib/python3.7/site-packages (from requests) (1.26.15)\n", "Requirement already satisfied: certifi>=2017.4.17 in /home/jupyterlab/conda/envs/python/lib/python3.7/site-packages (from requests) (2023.5.7)\n" ] } ], "source": [ "!pip3 install beautifulsoup4\n", "!pip3 install requests" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [], "source": [ "import sys\n", "\n", "import requests\n", "from bs4 import BeautifulSoup\n", "import re\n", "import unicodedata\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also provide some helper functions to help you process the scraped HTML table.\n" ] }, { 
"cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [], "source": [ "def date_time(table_cells):\n", "    \"\"\"\n", "    Return the date and time from an HTML table cell.\n", "    Input: a table data cell element\n", "    \"\"\"\n", "    return [text.strip() for text in list(table_cells.strings)][0:2]\n", "\n", "def booster_version(table_cells):\n", "    \"\"\"\n", "    Return the booster version from an HTML table cell.\n", "    Input: a table data cell element\n", "    \"\"\"\n", "    # Keep every other string (skipping the interleaved ones) and drop the last kept item\n", "    out = ''.join([text for i, text in enumerate(table_cells.strings) if i % 2 == 0][0:-1])\n", "    return out\n", "\n", "def landing_status(table_cells):\n", "    \"\"\"\n", "    Return the landing status from an HTML table cell.\n", "    Input: a table data cell element\n", "    \"\"\"\n", "    out = [i for i in table_cells.strings][0]\n", "    return out\n", "\n", "\n", "def get_mass(table_cells):\n", "    \"\"\"\n", "    Return the payload mass text up to and including \"kg\", or 0 if no mass is given.\n", "    Input: a table data cell element\n", "    \"\"\"\n", "    mass = unicodedata.normalize(\"NFKD\", table_cells.text).strip()\n", "    kg_index = mass.find(\"kg\")\n", "    # find() returns -1 when \"kg\" is absent, so check it before slicing\n", "    if mass and kg_index != -1:\n", "        new_mass = mass[0:kg_index + 2]\n", "    else:\n", "        new_mass = 0\n", "    return new_mass\n", "\n", "\n", "def extract_column_from_header(row):\n", "    \"\"\"\n", "    Return the column name from an HTML table header cell.\n", "    Input: a table header cell element\n", "    \"\"\"\n", "    # Remove line breaks, links, and footnote markers from the header cell\n", "    if row.br:\n", "        row.br.extract()\n", "    if row.a:\n", "        row.a.extract()\n", "    if row.sup:\n", "        row.sup.extract()\n", "\n", "    column_name = ' '.join(row.contents)\n", "\n", "    # Filter out purely numeric names; these return None\n", "    if not column_name.strip().isdigit():\n", "        column_name = column_name.strip()\n", "        return column_name\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To keep the lab tasks consistent, you will be asked to scrape the data from a snapshot of the List of Falcon 9 and Falcon Heavy launches Wikipedia page updated on\n", 
"9th June 2021\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [], "source": [ "static_url = \"https://en.wikipedia.org/w/index.php?title=List_of_Falcon_9_and_Falcon_Heavy_launches&oldid=1027686922\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, request the HTML page from the above URL and get a response object\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TASK 1: Request the Falcon 9 Launch Wiki page from its URL\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's send an HTTP GET request for the Falcon 9 Launch HTML page and keep the HTTP response.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://en.wikipedia.org/w/index.php?title=List_of_Falcon_9_and_Falcon_Heavy_launches&oldid=1027686922\n" ] } ], "source": [ "# Use the requests.get() method with the provided static_url\n", "response = requests.get(static_url)\n", "\n", "# Raise an exception if the request failed (4xx/5xx status)\n", "response.raise_for_status()\n", "\n", "print(response.url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a BeautifulSoup object from the HTML response\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Use BeautifulSoup() to create a BeautifulSoup object from the response text content\n", "soup = BeautifulSoup(response.text, \"html.parser\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print the page title to verify that the BeautifulSoup object was created properly\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "List of Falcon 9 and Falcon Heavy launches - Wikipedia" ] }, "execution_count": 9, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "# Create the BeautifulSoup object with an explicit HTML parser\n", "soup = BeautifulSoup(response.text, \"html.parser\")\n", "# Use soup.title attribute\n", "soup.title" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TASK 2: Extract all column/variable names from the HTML table header\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we want to collect all relevant column names from the HTML table header\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try to find all tables on the wiki page first. If you need to refresh your memory about BeautifulSoup, please check the external reference link towards the end of this lab\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Use the find_all function in the BeautifulSoup object, with element type table\n", "# Assign the result to a list called html_tables\n", "html_tables = soup.find_all('table')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The launch records start at the third table, which is our target table.\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<table class=\"wikitable plainrowheaders collapsible\" style=\"width: 100%;\">\n", "\n", "<th scope=\"col\">Flight No.\n", "\n", "<th scope=\"col\">Date and
time (<a href=\"/wiki/Coordinated_Universal_Time\" title=\"Coordinated Universal Time\">UTC)\n", "\n", "<th scope=\"col\"><a href=\"/wiki/List_of_Falcon_9_first-stage_boosters\" title=\"List of Falcon 9 first-stage boosters\">Version,
Booster <sup class=\"reference\" id=\"cite_ref-booster_11-0\"><a href=\"#cite_note-booster-11\">[b]\n", "\n", "<th scope=\"col\">Launch site\n", "\n", "<th scope=\"col\">Payload<sup class=\"reference\" id=\"cite_ref-Dragon_12-0\"><a href=\"#cite_note-Dragon-12\">[c]\n", "\n", "<th scope=\"col\">Payload mass\n", "\n", "<th scope=\"col\">Orbit\n", "\n", "<th scope=\"col\">Customer\n", "\n", "<th scope=\"col\">Launch
outcome\n", "\n", "<th scope=\"col\"><a href=\"/wiki/Falcon_9_first-stage_landing_tests\" title=\"Falcon 9 first-stage landing tests\">Booster
landing\n", "\n", "\n", "<th rowspan=\"2\" scope=\"row\" style=\"text-align:center;\">1\n", "\n", "4 June 2010,
18:45\n", "\n", "<a href=\"/wiki/Falcon_9_v1.0\" title=\"Falcon 9 v1.0\">F9 v1.0<sup class=\"reference\" id=\"cite_ref-MuskMay2012_13-0\"><a href=\"#cite_note-MuskMay2012-13\">[7]
B0003.1<sup class=\"reference\" id=\"cite_ref-block_numbers_14-0\"><a href=\"#cite_note-block_numbers-14\">[8]\n", "\n", "<a href=\"/wiki/Cape_Canaveral_Space_Force_Station\" title=\"Cape Canaveral Space Force Station\">CCAFS,
<a href=\"/wiki/Cape_Canaveral_Space_Launch_Complex_40\" title=\"Cape Canaveral Space Launch Complex 40\">SLC-40\n", "\n", "<a href=\"/wiki/Dragon_Spacecraft_Qualification_Unit\" title=\"Dragon Spacecraft Qualification Unit\">Dragon Spacecraft Qualification Unit\n", "\n", "\n", "\n", "<a href=\"/wiki/Low_Earth_orbit\" title=\"Low Earth orbit\">LEO\n", "\n", "<a href=\"/wiki/SpaceX\" title=\"SpaceX\">SpaceX\n", "\n", "<td class=\"table-success\" style=\"background: #9EFF9E; vertical-align: middle; text-align: center;\">Success\n", "\n", "<td class=\"table-failure\" style=\"background: #FFC7C7; vertical-align: middle; text-align: center;\">Failure<sup class=\"reference\" id=\"cite_ref-ns20110930_15-0\"><a href=\"#cite_note-ns20110930-15\">[9]<sup class=\"reference\" id=\"cite_ref-16\"><a href=\"#cite_note-16\">[10]
(parachute)\n", "\n", "\n", "<td colspan=\"9\">First flight of Falcon 9 v1.0.<sup class=\"reference\" id=\"cite_ref-sfn20100604_17-0\"><a href=\"#cite_note-sfn20100604-17\">[11] Used a boilerplate version of Dragon capsule which was not designed to separate from the second stage.(<a href=\"#First_flight_of_Falcon_9\">more details below) Attempted to recover the first stage by parachuting it into the ocean, but it burned up on reentry, before the parachutes even deployed.<sup class=\"reference\" id=\"cite_ref-parachute_18-0\"><a href=\"#cite_note-parachute-18\">[12]\n", "\n", "\n", "<th rowspan=\"2\" scope=\"row\" style=\"text-align:center;\">2\n", "\n", "8 December 2010,
15:43<sup class=\"reference\" id=\"cite_ref-spaceflightnow_Clark_Launch_Report_19-0\"><a href=\"#cite_note-spaceflightnow_Clark_Launch_Report-19\">[13]\n", "\n", "<a href=\"/wiki/Falcon_9_v1.0\" title=\"Falcon 9 v1.0\">F9 v1.0<sup class=\"reference\" id=\"cite_ref-MuskMay2012_13-1\"><a href=\"#cite_note-MuskMay2012-13\">[7]
B0004.1<sup class=\"reference\" id=\"cite_ref-block_numbers_14-1\"><a href=\"#cite_note-block_numbers-14\">[8]\n", "\n", "<a href=\"/wiki/Cape_Canaveral_Space_Force_Station\" title=\"Cape Canaveral Space Force Station\">CCAFS,
<a href=\"/wiki/Cape_Canaveral_Space_Launch_Complex_40\" title=\"Cape Canaveral Space Launch Complex 40\">SLC-40\n", "\n", "<a href=\"/wiki/SpaceX_Dragon\" title=\"SpaceX Dragon\">Dragon <a class=\"mw-redirect\" href=\"/wiki/COTS_Demo_Flight_1\" title=\"COTS Demo Flight 1\">demo flight C1
(Dragon C101)\n", "\n", "\n", "\n", "<a href=\"/wiki/Low_Earth_orbit\" title=\"Low Earth orbit\">LEO (<a href=\"/wiki/International_Space_Station\" title=\"International Space Station\">ISS)\n", "\n", "<style data-mw-deduplicate=\"TemplateStyles:r1126788409\">.mw-parser-output .plainlist ol,.mw-parser-output .plainlist ul{line-height:inherit;list-style:none;margin:0;padding:0}.mw-parser-output .plainlist ol li,.mw-parser-output .plainlist ul li{margin-bottom:0}<div class=\"plainlist\">\n", "

<a href=\"/wiki/NASA\" title=\"NASA\">NASA (<a href=\"/wiki/Commercial_Orbital_Transportation_Services\" title=\"Commercial Orbital Transportation Services\">COTS)
<a href=\"/wiki/National_Reconnaissance_Office\" title=\"National Reconnaissance Office\">NRO

\n", "

samice99 / Testrepos

Task 3 #1
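
The `get_mass` helper in the notebook normalizes the cell text and keeps everything up to and including "kg". The same logic can be sketched on a plain string, which makes it easy to try without fetching the page. The function name `parse_mass_kg` and the sample strings are illustrative assumptions, not part of the lab, and this version returns `None` rather than `0` when no mass is present.

```python
import unicodedata
from typing import Optional


def parse_mass_kg(cell_text: str) -> Optional[str]:
    """Return the leading '<number> kg' part of a payload-mass cell, or None.

    Hypothetical helper for illustration; it mirrors the idea behind get_mass
    but works on a plain string instead of a BeautifulSoup element.
    """
    # NFKD normalization turns non-breaking spaces (U+00A0) into plain spaces.
    text = unicodedata.normalize("NFKD", cell_text).strip()
    kg_index = text.find("kg")
    # str.find returns -1 when the substring is absent, so guard before slicing.
    if kg_index == -1:
        return None
    return text[: kg_index + 2]


# Sample inputs are made up for the sketch.
print(parse_mass_kg("4,700\u00a0kg (10,400\u00a0lb)"))
print(parse_mass_kg("Unknown"))
```

Checking the result of `find` before slicing matters: with `-1`, the slice `text[0:1]` would silently return only the first character of the cell.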