rkiddy / ca_hhs

The California Health and Human Services department publishes data in different formats. These apps will import and display the data. This is a work in progress.

Need to be able to read from all excel files #5

Open rkiddy opened 3 months ago

rkiddy commented 3 months ago

As of now, I am only reading xlsx files. I have created excel.py to start abstracting away the differences between formats, but the only complete implementation is the one for xlsx files.

Also, it is obnoxious that I am passing in the file extension. The code should sense the file type on its own. I probably need to create a class for this: I would give the filename to a class instance and then use generic methods to get the data out.
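
One possible way to sense the type is to look at a file's leading bytes instead of its name. The sketch below is an illustration under assumptions, not what excel.py currently does: `sniff_file_type` and `SourceFile` are hypothetical names, the labels it returns are made up for this example, and it uses only the Python standard library. It does not try to tell xls from doc (both are OLE2 containers), and it assumes the ZIP-based Office files keep their main part at the usual path.

```python
import zipfile

# Leading bytes ("magic numbers") of the two container formats involved.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy xls, doc, dot
ZIP_MAGIC = b"PK\x03\x04"                           # xlsx, xlsm, xlsb, docx


def sniff_file_type(path):
    """Return a coarse type label based on file content, not the extension."""
    with open(path, "rb") as f:
        head = f.read(8)

    if head.startswith(OLE2_MAGIC):
        # Telling xls from doc needs the OLE directory; not attempted here.
        return "ole2"
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"{\\rtf"):
        return "rtf"
    if head.startswith(ZIP_MAGIC) and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            names = set(zf.namelist())
        if "xl/workbook.xml" in names:
            return "xlsx"  # xlsm uses the same workbook part
        if "xl/workbook.bin" in names:
            return "xlsb"
        if "word/document.xml" in names:
            return "docx"
        return "zip"
    return "text"  # csv, txt, or anything unrecognized


class SourceFile:
    """Hypothetical wrapper: takes only a filename and senses the type itself."""

    def __init__(self, path):
        self.path = path
        self.kind = sniff_file_type(path)
```

With something like this, the caller would pass only a path, for example `SourceFile(path).kind`, and generic methods for getting the data out could dispatch on `kind` rather than on an extension argument.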

Handling every format is especially important for the chargemasters, because they come in so many file types.

$ find 2* -type f | awk 'BEGIN{FS="."}{print $NF}' | tr '[A-Z]' '[a-z]' | sort | uniq -c
     21 csv
    943 doc
   1852 docx
      3 dot
    371 pdf
      4 rtf
      8 txt
   2965 xls
     19 xlsb
     45 xlsm
   6286 xlsx