TomNicholas opened this issue 2 months ago
That's an interesting idea... I think this would only be useful if the spreadsheet followed some specific schema though.
An experiment would be to use `pandas.read_excel` to return multiple sheets as a dict of `pd.DataFrame` objects, call `xarray.Dataset.from_dataframe` on each dataframe, and then use `DataTree.from_dict`.
If that actually works out, then maybe we could add it as an example to the IO page of xarray's documentation.
(Note also that this idea isn't really datatree-specific, because you could use `pandas.read_excel(..., sheet_name='some_name')` to read one sheet and create one `xr.Dataset`.)
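As a minimal sketch of the single-sheet route, assuming pandas has an Excel engine such as openpyxl available, the snippet below writes a tiny workbook to memory and reads one sheet back into a single `xr.Dataset`. The columns `time` and `temperature` are hypothetical. Calling `set_index` first is one way to choose which column becomes the dimension; without it, `from_dataframe` uses the unnamed integer index and the dimension is called `index`.

```python
import io

import pandas as pd
import xarray as xr

# Build a small workbook in memory so the example is self-contained.
# The sheet name "some_name" and the columns "time"/"temperature" are
# hypothetical; writing .xlsx needs an engine such as openpyxl.
buffer = io.BytesIO()
pd.DataFrame(
    {"time": [0, 1, 2], "temperature": [10.5, 11.0, 11.25]}
).to_excel(buffer, sheet_name="some_name", index=False)
buffer.seek(0)

# Reading a single named sheet returns one DataFrame, not a dict.
df = pd.read_excel(buffer, sheet_name="some_name")

# Promote "time" to the index so it becomes the dimension coordinate
# instead of the default "index" dimension.
ds = xr.Dataset.from_dataframe(df.set_index("time"))
print(ds)
```

This is also where the "specific schema" caveat shows up: the conversion only produces meaningful dimensions if the sheet has a column that identifies each row.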
`pd.read_excel(..., sheet_name=None)` already returns multiple sheets as a dict.
Also, I think some Excel spreadsheets are just nested CSVs in one file.
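A quick way to check the `sheet_name=None` behavior, again as a self-contained sketch with hypothetical sheet names (`first`, `second`) and an in-memory workbook, which needs an Excel writer engine such as openpyxl or xlsxwriter:

```python
import io

import pandas as pd

# Write a hypothetical two-sheet workbook to an in-memory buffer.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="first", index=False)
    pd.DataFrame({"y": [3, 4, 5]}).to_excel(writer, sheet_name="second", index=False)
buffer.seek(0)

# sheet_name=None reads every sheet and returns {sheet name: DataFrame}.
sheets = pd.read_excel(buffer, sheet_name=None)
for name, df in sheets.items():
    print(name, df.shape)
```

Passing a list such as `sheet_name=["first", "second"]` also returns a dict, keyed by those names.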
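```python
import pandas as pd
import xarray as xr
from datatree import DataTree

# sheet_name=None reads every sheet into a {sheet name: DataFrame} dict
dfs = pd.read_excel("sheets.xlsx", sheet_name=None)

# Convert each sheet's DataFrame into an xarray Dataset
ds_dict = {}
for sheet_name, df in dfs.items():
    ds_dict[sheet_name] = xr.Dataset.from_dataframe(df)

# Each sheet name becomes a node path in the tree
dt = DataTree.from_dict(ds_dict)
dt  # displays the tree in a notebook or REPL
```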
That's cool!
Using `pd.read_excel` directly is not lazy, though. Creating a backend for it would make it lazy.
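A minimal sketch of what such a backend could look like, using xarray's `BackendEntrypoint` interface. The class name `ExcelBackendEntrypoint` and its `sheet_name` option are hypothetical, and this version still reads the sheet eagerly with `pd.read_excel`: it only shows how an Excel reader could plug into `xr.open_dataset(..., engine=...)`. Actual laziness would additionally require wrapping each variable in an `xarray.backends.BackendArray` so data is only read on access, which this sketch does not attempt.

```python
import io
import os

import pandas as pd
import xarray as xr
from xarray.backends import BackendEntrypoint


class ExcelBackendEntrypoint(BackendEntrypoint):
    """Hypothetical backend that opens a single Excel sheet as a Dataset.

    Eager sketch: open_dataset reads the whole sheet with pandas, so
    nothing here is lazy yet.
    """

    description = "Open one Excel sheet as an xarray Dataset (eager sketch)"
    open_dataset_parameters = ("filename_or_obj", "drop_variables", "sheet_name")

    def open_dataset(self, filename_or_obj, *, drop_variables=None, sheet_name=0):
        # Read the requested sheet (first sheet by default) into a DataFrame.
        df = pd.read_excel(filename_or_obj, sheet_name=sheet_name)
        ds = xr.Dataset.from_dataframe(df)
        if drop_variables is not None:
            ds = ds.drop_vars(drop_variables)
        return ds

    def guess_can_open(self, filename_or_obj):
        # Only claim paths with an Excel-like file extension.
        if isinstance(filename_or_obj, (str, os.PathLike)):
            return os.path.splitext(filename_or_obj)[1] in {".xlsx", ".xlsm", ".xls"}
        return False


# Usage with a small in-memory workbook (needs an Excel engine such as openpyxl).
buffer = io.BytesIO()
pd.DataFrame(
    {"time": [0, 1, 2], "temperature": [10.5, 11.0, 11.25]}
).to_excel(buffer, sheet_name="some_name", index=False)
buffer.seek(0)

# Passing the class itself as engine= skips entry-point registration.
ds = xr.open_dataset(buffer, engine=ExcelBackendEntrypoint, sheet_name="some_name")
print(ds)
```

Passing the class directly as `engine=` is convenient for experimenting; a packaged backend would normally be registered under the `xarray.backends` entry-point group so it can be selected by name.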
from @ahuang11 in https://github.com/xarray-contrib/datatree/issues/342