mito-ds / mito

The mitosheet package, trymito.io, and other public Mito code.
https://trymito.io
Other
2.25k stars 151 forks source link

Import excel files from within mitosheet.sheet() like you can with CSV files #82

Open twelsh37 opened 2 years ago

twelsh37 commented 2 years ago

Is your feature request related to a problem? Please describe. I work with a variety of Excel spreadsheets. Some of these have multiple sheets. I am currently unable to read them in from Python and must use the Mitosheet Import function to select/deselect the sheets that I want.

If I try and read them in from Mitosheet.sheet('spreadsheet.xlsx') I get the following error pointing to read_csv Error as follows

Invalid argument passed to sheet: DroneWarsData.xlsx. This path could not be read with a pd.read_csv call. Please pass in the parsed dataframe directly.

Describe the solution you'd like I would like to be able to load in all the sheets or specific sheets into mito from python using pandas read_excel or something similar. Ifound the following article helpful. https://pythoninoffice.com/read-multiple-excel-sheets-with-python-pandas/

Describe alternatives you've considered The only other alternative seems to be using Mitosheet's Import function. While this works it interrupts what could be an automatic flow.

Additional context importing all the sheets via Mitosheet Import function results in the following code being created

# Imported DroneWarsData.xlsx
import pandas as pd
sheet_df_dictonary = pd.read_excel('DroneWarsData.xlsx', sheet_name=['Afghanistan', 'Somalia', 'Yemen', 'Pakistan', 'Variables', 'All', 'Unknown_Locations', 'All_WithoutUnknown', 'All_WithUnknown', 'Copy of All_WithoutUnknown', 'US Confirmed', 'US Confirmed (updated)', 'Confirmed vs Unconfirmed', 'Copy of Confirmed vs Unconfirme', 'Afghanistan(2)', 'Somalia(2)', 'Yemen(2)', 'All(2)'], skiprows=0)
Afghanistan = sheet_df_dictonary['Afghanistan']
Somalia = sheet_df_dictonary['Somalia']
Yemen = sheet_df_dictonary['Yemen']
Pakistan = sheet_df_dictonary['Pakistan']
Variables = sheet_df_dictonary['Variables']
All = sheet_df_dictonary['All']
Unknown_Locations = sheet_df_dictonary['Unknown_Locations']
All_WithoutUnknown = sheet_df_dictonary['All_WithoutUnknown']
All_WithUnknown = sheet_df_dictonary['All_WithUnknown']
Copy_of_All_WithoutUnknown = sheet_df_dictonary['Copy of All_WithoutUnknown']
US_Confirmed = sheet_df_dictonary['US Confirmed']
US_Confirmed_updated = sheet_df_dictonary['US Confirmed (updated)']
Confirmed_vs_Unconfirmed = sheet_df_dictonary['Confirmed vs Unconfirmed']
Copy_of_Confirmed_vs_Unconfirme = sheet_df_dictonary['Copy of Confirmed vs Unconfirme']
Afghanistan_2 = sheet_df_dictonary['Afghanistan(2)']
Somalia_2 = sheet_df_dictonary['Somalia(2)']
Yemen_2 = sheet_df_dictonary['Yemen(2)']
All_2 = sheet_df_dictonary['All(2)']

DroneWarsData.xlsx

naterush commented 2 years ago

Lovely! Some thoughts on implementation:

  1. We just need a new preprocessing step performer for xlsx files (that shares infra with the XLSX import).
  2. Handling different sheets for multiple strings might be a little tricky. Namely, the sheet_name argument could be either: a list (if there is one string passed), or a dict (if there are multiple strings passed).

This seems like a pretty easy one!