samukweku / data-wrangling-blog

data wrangling blog
https://samukweku.github.io/data-wrangling-blog/
10 stars 0 forks source link

Access Excel Tables with Python | Samuel Oranyeli #12

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

Access Excel Tables with Python | Samuel Oranyeli

Extract data from a defined table in a spreadsheet

https://samukweku.github.io/data-wrangling-blog/spreadsheet/python/pandas/openpyxl/2020/05/19/Access-Tables-In-Excel.html

brianblakeley commented 2 years ago

hello! thank you for posting this code....was the only article i could find that shows how to bring in a table from an excel sheet with multiple tables!

i tried using your code and got an error:

AttributeError Traceback (most recent call last)

in 6 ws = wb['Sheet1'] 7 ----> 8 {key : value for key, value in ws.tables.items()} 9 10 for table in ws._tables: AttributeError: 'Worksheet' object has no attribute 'tables' what am i doing wrong?(file exists and there is 2 tables on it--) from openpyxl import Workbook from openpyxl import load_workbook from openpyxl.worksheet.table import Table, TableStyleInfo wb = load_workbook(filename=r'C:\Users\Keith\Downloads\20220123_pythondlparameters.xlsx') ws = wb['Sheet1'] {key : value for key, value in ws.tables.items()}
samukweku commented 2 years ago

Hi, @brianblakeley. Apologies for replying this late. I just saw your message. Have you resolved the issue? I have wrapped this into a function, that should make it easier to use. Let me know if you still have this issue. Again my sincere apologies for the late reply.

IrishJimmy67 commented 2 years ago

This is brilliant! Thank you very much!

DanielPerkins472 commented 1 year ago

Hello @samukweku. Im trying to use your xlsx.table function from janitor to accomplish the above, however Im unable to import the xlsx.table function. I have successfully downloaded / installed pyjanitor and import the package into the jupyter notebook, however I can failures when I try to import xlsx.table from janitor. Do you have any advice?

samukweku commented 1 year ago

@DanielPerkins472 thanks for getting in touch.

can you share your script?

import janitor as in
jn.xlsx_table
DanielPerkins472 commented 1 year ago

Below is the script that I had used

!pip install pyjanitor from janitor import xlsx_table

returns error

ImportError Traceback (most recent call last)

in ----> 1 from janitor import xlsx_table ImportError: cannot import name 'xlsx_table' from 'janitor' (C:\Users\perkind2\Anaconda3\lib\site-packages\janitor\__init__.py) I did more searching and found another thread where it looks like someone else asked a similar question (link below). Your response there was that the version of pyjanitor was outdated and to uninstall and then reinstall the latest version from github. I looked into the io.py file downloaded and it does in fact look like there is no xlsx_table function. I tried using the pip common to download / install pyjanitor from your below post but was met with the below error Script !pip install git+https://github.com/pyjanitor-devs/pyjanitor.git Returns error Collecting git+https://github.com/pyjanitor-devs/pyjanitor.git Cloning https://github.com/pyjanitor-devs/pyjanitor.git to c:\users\perkind2\appdata\local\temp\pip-req-build-ffliyi_i Running command git clone -q https://github.com/pyjanitor-devs/pyjanitor.git 'C:\Users\perkind2\AppData\Local\Temp\pip-req-build-ffliyi_i' ERROR: Error [WinError 2] The system cannot find the file specified while executing command git clone -q https://github.com/pyjanitor-devs/pyjanitor.git 'C:\Users\perkind2\AppData\Local\Temp\pip-req-build-ffliyi_i' ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH? https://samukweku.github.io/data-wrangling/python_pandas_access_excel_tables_with_python.html
samukweku commented 1 year ago

@DanielPerkins472 Ahhhh ... You are on windows it seems. I think you should install git on your machine then retry the installation. Hopefully soon we will do a release on pyjanitor and you do not have to go through this. Lemme know how it goes

YehudaColton commented 1 year ago

With this method, my excel formulas are being imported instead of the resultant values. Is there a way to fix this?

samukweku commented 1 year ago

Hi @YehudaColton can you share a sample of the file, and the code used?

DanielPerkins472 commented 1 year ago

Hi @samukweku, I was able to get your dev version of janitor installed. I am now getting a new error when I try to import the xlsx_table function though. Any thoughts?


from janitor import xlsx_table

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-e5fafa9ca390> in <module>
----> 1 from janitor import xlsx_table

~\Github\pyjanitor\janitor\__init__.py in <module>
      5     pass
      6 
----> 7 from .functions import *  # noqa: F403, F401
      8 from .io import *  # noqa: F403, F401
      9 from .math import *  # noqa: F403, F401

~\Github\pyjanitor\janitor\functions\__init__.py in <module>
     17 
     18 
---> 19 from .add_columns import add_columns
     20 from .also import also
     21 from .bin_numeric import bin_numeric

~\Github\pyjanitor\janitor\functions\add_columns.py in <module>
      1 import pandas_flavor as pf
      2 
----> 3 from janitor.utils import check, deprecated_alias
      4 import pandas as pd
      5 from typing import Union, List, Any, Tuple

~\Github\pyjanitor\janitor\utils.py in <module>
    209         "The keyword argument '{argument}' of '{func_name}' is deprecated."
    210     ),
--> 211     error: bool = True,
    212 ) -> Callable:
    213     """

TypeError: 'type' object is not subscriptable
samukweku commented 1 year ago

Hi @DanielPerkins472 let me have a look. This is on a windows machine right? If you have access to a Unix machine, could you try and see if it gives the same error?

DanielPerkins472 commented 1 year ago

Hi @samukweku that's correct this is on a windows machine. I don`t have any Unix experience but will see what I can find out.

samukweku commented 1 year ago

@DanielPerkins472 I just tested it on my Unix machine and works fine. unfortunately I do not have access to a windows machine to test it out.

YehudaColton commented 1 year ago

Hi @YehudaColton can you share a sample of the file, and the code used?

wb = load_workbook(file_name, data_only=True)

I modified the code slightly by utilizing this 'data_only' tag and it fixed the issue. Thank you for the response however, as well as the wonderful code.

samukweku commented 1 year ago

@DanielPerkins472 if you can, create a virtual environment, install only pyjanitor and see if you still get this error message. Let's isolate it and see if it is an OS issue.

I just tested on a windows VM, in a virtual environment and it works fine.

DanielPerkins472 commented 1 year ago

@Samukweku I ended up just using the openpyxl method above in this thread and was successful. I had an outdated version of anaconda / python that was causing issues so I had to completely uninstall / reinstall and then the openpyxl method worked for me.

hepbc commented 1 year ago

@samukweku This is a great resource! One query: these solutions require the tables to be "named". Is there some way to pick up contiguous cells as a table from a sheet where they have not been named as a specific table. Hoping I have explained the issue well!

samukweku commented 1 year ago

Hi @hepbc , excel tables require names. If they do not have names, then you can just use pd.read_excel, as they are just cells. If it is a bit more complex, have a look at janitor's xlsx_cells. Do you have sample data of your problem, with expected output? I can have a look