vamseeachanta / energydata

MIT License

spike | Extract tabular data from website #6

Closed vamseeachanta closed 5 months ago

vamseeachanta commented 8 months ago

From a website with tabular data, get the data as a pandas DataFrame using code.

JayachandraJangiti commented 8 months ago

@vamseeachanta

Good morning Sir,

I tried to solve our problem of extracting a table from a website with BeautifulSoup and urllib, but I later found that this is difficult to do dynamically (that is, getting output based on an input). I learned that Selenium is better suited to this, and that we need to use Selenium and BeautifulSoup together. I am trying Selenium now but am getting some errors, so I am studying some Selenium resources to get a clear idea and resolve them.

JayachandraJangiti commented 7 months ago

@vamseeachanta

Good Afternoon Sir,

I have written Python code to extract content from the website https://www.data.bsee.gov/Well/APD/Default.aspx based on the value entered in the input box.

I used Selenium and BeautifulSoup together to extract the table content after entering the value into the input box.

Code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

TABLE_ID = 'ASPxFormLayout2_ASPxGridView1_DXMainTable'  # Modify with the actual table identifier

def extract_table_content(url, input_data):
    """Submit input_data on the page at url and return the table as a list of rows."""
    driver = webdriver.Chrome()
    try:
        driver.get(url)

        input_box = driver.find_element(By.XPATH, '//input[@name="ASPxFormLayout1$ASPxTextBoxAPI"]')
        submit_button = driver.find_element(By.XPATH, "(//div[@id='ASPxFormLayout1_ASPxButtonSubmitQ_CD'])[1]")

        # Typing the value into the input box and clicking the submit button
        input_box.send_keys(str(input_data))
        submit_button.click()

        # Waiting up to 50 seconds for the table to be present.
        # implicitly_wait() only affects later find_element calls, so an explicit wait is used.
        # If the table already exists before submission, a stricter condition may be needed.
        WebDriverWait(driver, 50).until(
            EC.presence_of_element_located((By.ID, TABLE_ID))
        )

        # Getting the page source after submission
        page_source = driver.page_source
    finally:
        # Closing the browser even if a step above fails
        driver.quit()

    # Parsing the HTML content and finding the table element
    soup = BeautifulSoup(page_source, 'html.parser')
    table = soup.find(id=TABLE_ID)

    table_content = []
    if table:
        for row in table.find_all('tr'):
            cells = row.find_all(['th', 'td'])
            table_content.append([cell.get_text(strip=True) for cell in cells])

    print(table_content)
    return table_content

extract_table_content("https://www.data.bsee.gov/Well/APD/Default.aspx", "608174149400")

Packages that must be installed to run the code:

selenium bs4

Running the code:

As soon as you run the code, a Chrome window opens, the value you passed to the code is entered in the input box, and the Submit Query button on the website is clicked. After that, the window closes.

The output is returned as a list.

I passed the value 608174149400 to the website https://www.data.bsee.gov/Well/APD/Default.aspx.

The corresponding output is:

[['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['#', '#', '', 'API Well Number', 'API Well Number', '', 'Permit Type', 'Permit Type', '', 'APD Received Date', 'APD Received Date', '', 'APD Approved Date', 'APD Approved Date', '', 'Company Number', 'Company Number', '', 'Company Name', 'Company Name', '', 'Bottom Area Code', 'Bottom Area Code', '', 'Bottom Block Number', 'Bottom Block Number', '', 'Bottom Lease Number', 'Bottom Lease Number', '', 'Surface Area Code', 'Surface Area Code', '', 'Surface Block Number', 'Surface Block Number', '', 'Surface Lease Number', 'Surface Lease Number', '', 'Surface NS Dist', 'Surface NS Dist', '', 'Surface NS Code', 'Surface NS Code', '', 'Surface EW Dist', 'Surface EW Dist', '', 'Surface EW Code', 'Surface EW Code', '', 'Well Type', 'Well Type', '', 'Well Name', 'Well Name', '', 'Well Name Suffix', 'Well Name Suffix', '', 'Water Depth (feet)', 'Water Depth (feet)', '', 'Rig ID Number', 'Rig ID Number', '', 'Rig Name', 'Rig Name', '', 'Surf X Coord Loc', 'Surf X Coord Loc', '', 'Surf Y Coord Loc', 'Surf Y Coord Loc', '', 'Surface Latitude', 'Surface Latitude', '', 'Surface Longitude', 'Surface Longitude', '', ''], ['#', ''], ['API Well Number', ''], ['Permit Type', ''], ['APD Received Date', ''], ['APD Approved Date', ''], ['Company Number', ''], ['Company Name', ''], ['Bottom Area Code', ''], ['Bottom Block Number', ''], ['Bottom Lease Number', ''], ['Surface Area Code', ''], ['Surface Block Number', ''], ['Surface Lease Number', ''], ['Surface NS Dist', ''], ['Surface NS Code', ''], ['Surface EW Dist', ''], ['Surface EW Code', ''], ['Well Type', ''], ['Well Name', ''], ['Well Name Suffix', ''], ['Water Depth (feet)', ''], ['Rig ID Number', ''], ['Rig Name', ''], ['Surf X Coord Loc', ''], ['Surf Y Coord Loc', ''], ['Surface Latitude', ''], ['Surface Longitude', ''], ['', '', '', '', '', '', '', '', '', 'Loading…March 
2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDecTodayClear', 'Loading…March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDecTodayClear', '', 'Loading…', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDec', 'March 2024', '', '', '', '', 'March 2024', '', '', '', '', 'SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456', '', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', '09', '25', '26', '27', '28', '29', '1', '2', '10', '3', '4', '5', '6', '7', '8', '9', '11', '10', '11', '12', '13', '14', '15', '16', '12', '17', '18', '19', '20', '21', '22', '23', '13', '24', '25', '26', '27', '28', '29', '30', '14', '31', '1', '2', '3', '4', '5', '6', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'TodayClear', '', 'Loading…March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDecTodayClear', 'Loading…March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDecTodayClear', '', 'Loading…', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDec', 'March 2024', '', '', '', '', 'March 2024', '', '', '', '', 'SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456', '', 'Sun', 'Mon', 'Tue', 
'Wed', 'Thu', 'Fri', 'Sat', '09', '25', '26', '27', '28', '29', '1', '2', '10', '3', '4', '5', '6', '7', '8', '9', '11', '10', '11', '12', '13', '14', '15', '16', '12', '17', '18', '19', '20', '21', '22', '23', '13', '24', '25', '26', '27', '28', '29', '30', '14', '31', '1', '2', '3', '4', '5', '6', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'TodayClear', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', ''], [''], ['', '', ''], [''], ['Loading…March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDecTodayClear', '', 'Loading…', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDec', 'March 2024', '', '', '', '', 'March 2024', '', '', '', '', 'SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456', '', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', '09', '25', '26', '27', '28', '29', '1', '2', '10', '3', '4', '5', '6', '7', '8', '9', '11', '10', '11', '12', '13', '14', '15', '16', '12', '17', '18', '19', '20', '21', '22', '23', '13', '24', '25', '26', '27', '28', '29', '30', '14', '31', '1', '2', '3', '4', '5', '6', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'TodayClear', ''], ['', 'Loading…'], ['', '', ''], ['', '', 
'', '', '', '', '', '', '', '', '', '', ''], ['March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDec', 'March 2024', '', '', '', '', 'March 2024', '', '', '', '', 'SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456', '', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', '09', '25', '26', '27', '28', '29', '1', '2', '10', '3', '4', '5', '6', '7', '8', '9', '11', '10', '11', '12', '13', '14', '15', '16', '12', '17', '18', '19', '20', '21', '22', '23', '13', '24', '25', '26', '27', '28', '29', '30', '14', '31', '1', '2', '3', '4', '5', '6', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], ['March 2024', '', '', '', '', 'March 2024', '', '', '', ''], ['', '', '', '', 'March 2024', '', '', '', ''], ['SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456', '', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', '09', '25', '26', '27', '28', '29', '1', '2', '10', '3', '4', '5', '6', '7', '8', '9', '11', '10', '11', '12', '13', '14', '15', '16', '12', '17', '18', '19', '20', '21', '22', '23', '13', '24', '25', '26', '27', '28', '29', '30', '14', '31', '1', '2', '3', '4', '5', '6'], ['', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'], ['09', '25', '26', '27', '28', '29', '1', '2'], ['10', '3', '4', '5', '6', '7', '8', '9'], ['11', '10', '11', '12', '13', '14', '15', '16'], ['12', '17', '18', '19', '20', '21', '22', '23'], ['13', '24', '25', '26', '27', '28', '29', '30'], ['14', '31', '1', '2', '3', '4', '5', '6'], ['JanFebMarAprMayJunJulAugSepOctNovDec', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], ['JanFebMarAprMayJunJulAugSepOctNovDec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 
'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], ['Jan', 'Feb', 'Mar', 'Apr'], ['May', 'Jun', 'Jul', 'Aug'], ['Sep', 'Oct', 'Nov', 'Dec'], ['TodayClear'], ['Loading…March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDecTodayClear', '', 'Loading…', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDec', 'March 2024', '', '', '', '', 'March 2024', '', '', '', '', 'SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456', '', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', '09', '25', '26', '27', '28', '29', '1', '2', '10', '3', '4', '5', '6', '7', '8', '9', '11', '10', '11', '12', '13', '14', '15', '16', '12', '17', '18', '19', '20', '21', '22', '23', '13', '24', '25', '26', '27', '28', '29', '30', '14', '31', '1', '2', '3', '4', '5', '6', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'TodayClear', ''], ['', 'Loading…'], ['', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', ''], ['March 2024SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456JanFebMarAprMayJunJulAugSepOctNovDec', 'March 2024', '', '', '', '', 'March 2024', '', '', '', '', 'SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456', '', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', '09', '25', '26', '27', '28', '29', '1', '2', '10', '3', '4', '5', '6', '7', '8', '9', '11', '10', '11', '12', '13', '14', '15', '16', '12', '17', '18', '19', '20', '21', '22', '23', '13', '24', '25', '26', '27', '28', '29', '30', '14', '31', '1', '2', '3', '4', '5', '6', 'JanFebMarAprMayJunJulAugSepOctNovDec', 
'JanFebMarAprMayJunJulAugSepOctNovDec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], ['March 2024', '', '', '', '', 'March 2024', '', '', '', ''], ['', '', '', '', 'March 2024', '', '', '', ''], ['SunMonTueWedThuFriSat092526272829121034567891110111213141516121718192021222313242526272829301431123456', '', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', '09', '25', '26', '27', '28', '29', '1', '2', '10', '3', '4', '5', '6', '7', '8', '9', '11', '10', '11', '12', '13', '14', '15', '16', '12', '17', '18', '19', '20', '21', '22', '23', '13', '24', '25', '26', '27', '28', '29', '30', '14', '31', '1', '2', '3', '4', '5', '6'], ['', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'], ['09', '25', '26', '27', '28', '29', '1', '2'], ['10', '3', '4', '5', '6', '7', '8', '9'], ['11', '10', '11', '12', '13', '14', '15', '16'], ['12', '17', '18', '19', '20', '21', '22', '23'], ['13', '24', '25', '26', '27', '28', '29', '30'], ['14', '31', '1', '2', '3', '4', '5', '6'], ['JanFebMarAprMayJunJulAugSepOctNovDec', 'JanFebMarAprMayJunJulAugSepOctNovDec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], ['JanFebMarAprMayJunJulAugSepOctNovDec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], ['Jan', 'Feb', 'Mar', 'Apr'], ['May', 'Jun', 'Jul', 'Aug'], ['Sep', 'Oct', 'Nov', 'Dec'], ['TodayClear'], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '', ''], [''], ['', '608164025600', 'New Well', '2/22/1996', '3/20/1996', '00689', 'Shell Offshore Inc.', 'VK', '956', 'G06896', 'VK', '956', 'G06896', '1414', 'N', 
'7209', 'E', 'Exploratory', 'A020', 'ST00BP00', '3214', '', '', '1291671', '10548026', '29.06086446', '-88.09193857', ''], ['', '608164025400', 'New Well', '2/22/1996', '3/20/1996', '00689', 'Shell Offshore Inc.', 'VK', '956', 'G06896', 'VK', '956', 'G06896', '1402', 'N', '7198', 'E', 'Exploratory', 'A015', 'ST00BP00', '3214', '', '', '1291682', '10548038', '29.06089775', '-88.09190448', ''], ['', '608164025800', 'New Well', '2/22/1996', '3/20/1996', '00689', 'Shell Offshore Inc.', 'VK', '956', 'G06893', 'VK', '956', 'G06896', '1429', 'N', '7161', 'E', 'Exploratory', 'A011', 'ST00BP00', '3214', '', '', '1291718.926849', '10548010.663132', '29.06082349', '-88.09178809', ''], ['', '608164024500', 'New Well', '2/22/1996', '3/20/1996', '00689', 'Shell Offshore Inc.', 'VK', '956', 'G06896', 'VK', '956', 'G06896', '1426', 'N', '7129', 'E', 'Exploratory', 'A007', 'ST00BP00', '3214', '', '', '1291751', '10548014', '29.06083349', '-88.09168777', ''], ['', '608174059000', 'New Well', '1/2/1998', '1/21/1998', '00689', 'Shell Offshore Inc.', 'MC', '809', 'G05868', 'MC', '809', 'G05868', '5908', 'S', '3470', 'E', 'Development', 'A002', 'ST00BP00', '3800', '44736', 'H&P 204', '962770', '10222708', '28.15407782', '-89.10340562', ''], ['', '608164032700', 'New Well', '1/22/1998', '3/16/1998', '02237', 'Noble Affliates, Inc.', 'VK', '826', 'G06888', 'VK', '826', 'G06888', '3927', 'S', '5346', 'E', 'Development', 'A011', 'ST00BP00', '1932', '', '', '1325214', '10585047', '29.1635138', '-87.98789984', ''], ['', '177094120301', '', '4/3/1998', '', '01777', 'Devon Louisiana Corporation', 'EI', '119', '00049', 'EI', '119', '00049', '6533', 'N', '1045', 'E', '', '035', 'ST00BP00', '40', '', '', '1953888.52', '118373.78', '28.99209409', '-91.47755036', ''], ['', '177244067506', '', '4/3/1998', '', '02313', 'Freeport-McMoRan Energy LLC', 'MP', '299', 'G09372', 'MP', '299', 'G09372', '6772', 'N', '3907', 'E', '', 'SW227', 'ST06BP00', '210', '', '', '2821083', '227058', '29.26567776', 
'-88.758128', ''], ['', '177054107800', '', '4/3/1998', '4/9/1998', '02148', 'Westport Oil and Gas Company, L.P.', 'VR', '220', 'G17910', 'VR', '220', 'G17910', '8756', 'S', '92', 'W', '', '001', '', '112', '', '', '1603200.0280', '16419.8090', '28.70592503', '-92.57086299', ''], ['', '177024118904', '', '4/4/1998', '', '01586', 'Petsec Energy Inc.', 'WC', '480', 'G13845', 'WC', '480', 'G13845', '8358', 'N', '263', 'W', '', '002', 'ST00BP00', '140', '', '', '1406472.76', '-42551.1520', '28.53650426', '-93.18135624', ''], ['', '177024119901', '', '4/4/1998', '', '01777', 'Devon Louisiana Corporation', 'WC', '528', 'G16202', 'WC', '528', 'G16202', '2001', 'N', '6182', 'E', '', 'A003', 'ST00BP00', '167', '', '', '1400027.76', '-95226.3440', '28.39140209', '-93.1987503', ''], ['', '177244081900', '', '4/6/1998', '4/16/1998', '00748', 'SOCO Offshore, Inc.', 'VK', '1002', 'G13035', 'VK', '1002', 'G13035', '1350', 'N', '7800', 'E', '', 'A005', '', '281', '', '', '3025800', '261980', '29.34744844', '-88.11329037', ''], ['', '177104149201', '', '4/6/1998', '', '00078', 'Chevron U.S.A. Inc.', 'EI', '339', 'G02318', 'EI', '339', 'G02318', '5791', 'S', '7477', 'W', '', 'C019', 'ST00BP00', '268', '', '', '1902321.88', '-173772.40', '28.18851913', '-91.63641156', ''], ['', '177250102001', '', '4/6/1998', '4/17/1998', '00078', 'Chevron U.S.A. Inc.', 'BS', '55', 'G01372', 'MP', '42', '00375', '432', 'N', '4086', 'E', '', 'E006', '', '40', '', '', '2734514', '277648', '29.40984014', '-89.02636732', ''], ['', '608044005600', '', '4/6/1998', '8/16/1983', '00003', 'Union Oil Company of California', 'EB', '158', 'G02645', 'EB', '159', 'G02646', '4337', 'N', '6203', 'W', '', 'A012', '', '924', '', '', '1115003', '10101583', '27.82736172', '-94.62604732', ''], ['', '177154094103', '', '4/7/1998', '4/2/1998', '00078', 'Chevron U.S.A. 
Inc.', 'ST', '37', 'G02625', 'ST', '37', 'G02625', '2210', 'N', '4367', 'W', '', 'I003', '', '53', '', '', '2309463.100748', '102603.922783', '28.94521605', '-90.36590372', ''], ['', '177064080902', '', '4/7/1998', '', '01956', 'Seagull Energy E&P Inc.', 'VR', '299', 'G13890', 'VR', '299', 'G13890', '5272', 'S', '1116', 'E', '', '002', 'ST00BP00', '194', '', '', '1631508.1240', '-105128.5750', '28.37253955', '-92.47879327', ''], ['', '177194066000', '', '4/7/1998', '', '01138', 'El Paso Production GOM Inc.', 'WD', '39', 'G16469', 'WD', '39', 'G16469', '4579', 'N', '3030', 'E', '', 'A001', 'ST00BP00', '82', '', '', '2483592', '160939.40', '29.10044015', '-89.81919208', ''], ['', '608044005601', '', '4/7/1998', '', '00003', 'Union Oil Company of California', 'EB', '158', 'G02645', 'EB', '159', 'G02646', '4337', 'N', '6203', 'W', '', 'A012', 'ST00BP00', '924', '', '', '1115003', '10101583', '27.82736177', '-94.62604731', ''], ['', '177004099500', '', '4/7/1998', '4/16/1998', '01855', 'Vastar Resources, Inc.', 'WC', '65', 'G02825', 'WC', '65', 'G02826', '3465', 'N', '4493', 'W', '', 'B019', '', '36', '', '', '1425460.8080', '360809.1440', '29.64639822', '-93.14206684', '']]
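Much of the list above appears to be noise from tables nested inside the main grid (for example the calendar widgets), because `find_all('tr')` searches recursively and also returns the rows of every nested table. The following is a minimal sketch of one way to skip those rows. It uses a small made-up HTML snippet, not the real BSEE page, and the helper name `direct_rows` is hypothetical; it has not been run against the live site.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page: an outer table whose
# first row holds a nested widget table, followed by one data row.
SAMPLE_HTML = """
<table id="outer">
  <tbody>
    <tr><td>Filter<table><tr><td>Jan</td><td>Feb</td></tr></table></td><td></td></tr>
    <tr><td>608164025600</td><td>New Well</td></tr>
  </tbody>
</table>
"""

def direct_rows(table):
    """Return cell text for rows owned by `table`, skipping rows of nested tables."""
    rows = []
    for row in table.find_all("tr"):
        # A row belongs to `table` only if its nearest enclosing <table> is `table`.
        if row.find_parent("table") is not table:
            continue
        # Only the row's own cells, not cells of tables nested inside them.
        cells = row.find_all(["th", "td"], recursive=False)
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows

if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    print(direct_rows(soup.find(id="outer")))
```

A cell that contains a nested table still reports that table's text joined together, so a further filter on row width or cell content may be needed on a real page.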

We got the output, but as a nested list rather than a DataFrame.

I will change the code to return a pandas DataFrame directly and share it with you soon.
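One possible way to get from the nested list to a DataFrame is sketched below. It assumes, based only on the output shown above, that every real record has 28 cells (27 columns plus one trailing empty cell) and an all-digit API well number in its second cell. The column names are copied from the header cells in that output; the function name `rows_to_dataframe` is hypothetical, and this is not necessarily the approach used in the final code.

```python
import pandas as pd

# Column names copied from the header cells in the output above.
COLUMNS = [
    "#", "API Well Number", "Permit Type", "APD Received Date",
    "APD Approved Date", "Company Number", "Company Name",
    "Bottom Area Code", "Bottom Block Number", "Bottom Lease Number",
    "Surface Area Code", "Surface Block Number", "Surface Lease Number",
    "Surface NS Dist", "Surface NS Code", "Surface EW Dist",
    "Surface EW Code", "Well Type", "Well Name", "Well Name Suffix",
    "Water Depth (feet)", "Rig ID Number", "Rig Name",
    "Surf X Coord Loc", "Surf Y Coord Loc",
    "Surface Latitude", "Surface Longitude",
]

def rows_to_dataframe(table_content):
    """Keep only rows that look like well records and return them as a DataFrame."""
    records = []
    for row in table_content:
        # Assumption: a record has one trailing empty cell after the 27 columns
        # and an all-digit API well number in its second cell.
        if len(row) == len(COLUMNS) + 1 and row[1].isdigit():
            records.append(row[:len(COLUMNS)])
    # Keep values as strings so codes such as '00689' keep their leading zeros.
    return pd.DataFrame(records, columns=COLUMNS, dtype=str)
```

Calling `rows_to_dataframe(table_content)` on the list printed above should drop the widget rows and leave one row per well; numeric columns can be converted afterwards if needed.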

Source I referred :

https://selenium-python.readthedocs.io/

vamseeachanta commented 7 months ago

@JayachandraJangiti, you seem to be going in a good direction.

Also, please refer to the following, in this order, and see if they are helpful:
https://realpython.com/python-web-scraping-practical-introduction/
https://stackoverflow.com/questions/8377055/submit-data-via-web-form-and-extract-the-results
https://nanonets.com/blog/web-scraping-with-python-tutorial/

PS: My Google search was "handle input forms in python to get data from website"

JayachandraJangiti commented 7 months ago

@vamseeachanta

Good morning Sir,

I have completed the task of extracting the table content from the website based on an input.

Because of some network issues, there was a slight delay in sending this to you, Sir.

Here is the code: https://github.com/JayachandraJangiti/JAY_SCRAPPING/tree/Extracting_table

Output for the value 608174149400:

Screenshot (6)

The PlantUML diagram for this task is here:

https://github.com/JayachandraJangiti/JAY_SCRAPPING/blob/Extracting_table/Extract_table_content.plantuml

vamseeachanta commented 7 months ago

@JayachandraJangiti. What else did you do to get this going?

PS C:\Users\vamseea\github\ace\JAY_SCRAPPING> & c:/Users/vamseea/AppData/Local/miniconda3/envs/digitalmodel/python.exe c:/Users/vamseea/github/ace/JAY_SCRAPPING/extract_table.py
Traceback (most recent call last):
  File "c:\Users\vamseea\github\ace\JAY_SCRAPPING\extract_table.py", line 1, in <module>
    from selenium import webdriver
PS C:\Users\vamseea\github\ace\JAY_SCRAPPING> & c:/Users/vamseea/AppData/Local/miniconda3/envs/digitalmodel/python.exe c:/Users/vamseea/github/ace/JAY_SCRAPPING/extract_table.py
Traceback (most recent call last):
  File "c:\Users\vamseea\github\ace\JAY_SCRAPPING\extract_table.py", line 1, in <module>
    from selenium import webdriver
PS C:\Users\vamseea\github\ace\JAY_SCRAPPING> & c:/Users/vamseea/AppData/Local/miniconda3/envs/assetutilities/python.exe c:/Users/vamseea/github/ace/JAY_SCRAPPING/extract_table.py
Traceback (most recent call last):
  File "c:\Users\vamseea\github\ace\JAY_SCRAPPING\extract_table.py", line 3, in <module>
    from bs4 import BeautifulSoup
PS C:\Users\vamseea\github\ace\JAY_SCRAPPING> & c:/Users/vamseea/AppData/Local/miniconda3/envs/assetutilities/python.exe c:/Users/vamseea/github/ace/JAY_SCRAPPING/extract_table.py
C:\Users\vamseea\AppData\Local\miniconda3\envs\assetutilities\Lib\site-packages\pandas\core\arrays\masked.py:60: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.5' currently installed).
  from pandas.core import (
Traceback (most recent call last):
  File "C:\Users\vamseea\AppData\Local\miniconda3\envs\assetutilities\Lib\site-packages\selenium\webdriver\common\service.py", line 72, in start
    self.process = subprocess.Popen(cmd, env=self.env,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vamseea\AppData\Local\miniconda3\envs\assetutilities\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\vamseea\AppData\Local\miniconda3\envs\assetutilities\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\vamseea\github\ace\JAY_SCRAPPING\extract_table.py", line 31, in <module>
    extract_table_content("https://www.data.bsee.gov/Well/APD/Default.aspx",608164025600) #GIVE THE INPUTS HERE
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\vamseea\github\ace\JAY_SCRAPPING\extract_table.py", line 8, in extract_table_content
    driver = webdriver.Chrome()
             ^^^^^^^^^^^^^^^^^^
  File "C:\Users\vamseea\AppData\Local\miniconda3\envs\assetutilities\Lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
    self.service.start()
  File "C:\Users\vamseea\AppData\Local\miniconda3\envs\assetutilities\Lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
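The last traceback ends with Selenium reporting that the `chromedriver` executable is not on `PATH`. A small preflight check like the sketch below can surface that before the browser is launched. The function names `find_driver` and `preflight` are hypothetical, and the check covers only `PATH` lookup; newer Selenium releases may be able to locate or download a driver on their own, so treat this as an optional diagnostic rather than a required step.

```python
import shutil

def find_driver(name="chromedriver"):
    """Return the full path of the executable `name` if it is on PATH, else None."""
    return shutil.which(name)

def preflight(name="chromedriver"):
    """Return a one-line, human-readable status for the driver lookup."""
    path = find_driver(name)
    if path is None:
        return f"{name} not found on PATH; install it or add its folder to PATH"
    return f"{name} found at {path}"

if __name__ == "__main__":
    print(preflight())
```

Running this in the same environment as `extract_table.py` shows whether the driver lookup would succeed there.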

vamseeachanta commented 5 months ago

Got this resolved. Always deliver code with a virtual environment file so that the setup can be recreated.
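The earlier runs in this thread stopped at the import lines for `selenium` and `bs4`, which suggests those packages were missing from the environments used. Alongside an environment file (for example a `requirements.txt` or a conda `environment.yml`, both assumptions about what "virtual environment file" means here), a short check like the sketch below can report missing modules up front. The module list and the function name `missing_modules` are illustrative; `pandas` is included only because a DataFrame is the stated goal.

```python
from importlib.util import find_spec

# Top-level modules the script imports (assumed list, adjust as needed).
REQUIRED_MODULES = ["selenium", "bs4", "pandas"]

def missing_modules(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

The check only confirms that the modules can be located; it does not verify their versions.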