jekwatt / idiomatic_pandas

Tips and tricks for the most common data handling task with pandas.
0 stars 0 forks source link

Reading poorly structured Excel files with Panda #5

Open jekwatt opened 3 years ago

jekwatt commented 3 years ago

https://pbpython.com/pandas-excel-range.html

With pandas it is easy to read Excel files and convert the data into a DataFrame. Unfortunately Excel files in the real world are often poorly constructed. In those cases where the data is scattered across the worksheet, you may need to customize the way you read the data.

The simplest solution for this data set is to use the header and usecols arguments to read_excel().

from pathlib import Path

src_file = Path.cwd() / 'data.xlsx'
df = pd.read_excel(src_file, header=1, usecols='B:F')
jekwatt commented 3 years ago

The pandas usecols can also take a list of column names. This code will create an equivalent DataFrame:

df = pd.read_excel(
    src_file,
    header=1,
    usecols=['col_1', 'col_2, 'col_3', 'col_4', 'col_5'])