add more functionality to readhtml

santoshphilip / eppy

scripting language for E+, EnergyPlus

MIT License


add more functionality to readhtml #16

Status: Open. santoshphilip opened this issue 10 years ago.

santoshphilip commented 10 years ago

readhtml is a bare skeleton right now. Do the following:

- Add more header data for the tables.
- Add more functionality for manipulating the tables. Maybe turn each table into a dictionary with the headers as tuple keys, e.g. (row header, column header).
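A minimal sketch of the tuple-key idea, assuming a table arrives as a list of rows whose first row holds the column headers and whose first cell in each later row holds that row's header. `table_to_dict` is a hypothetical name, not an existing readhtml function, and the numbers are made up:

```python
def table_to_dict(table):
    """Map (row header, column header) -> cell value for a list-of-rows table.

    Assumes table[0] is the header row (its first cell is the empty corner)
    and that the first cell of every later row is that row's header.
    """
    col_headers = table[0][1:]  # skip the top-left corner cell
    cells = {}
    for row in table[1:]:
        row_header, values = row[0], row[1:]
        for col_header, value in zip(col_headers, values):
            cells[(row_header, col_header)] = value
    return cells


# Made-up numbers laid out like an EnergyPlus summary table.
table = [
    ["", "Total Energy [kWh]", "Energy Per Total Building Area [kWh/m2]"],
    ["Total Site Energy", 100.0, 1.0],
    ["Net Site Energy", 90.0, 0.9],
]
cells = table_to_dict(table)
print(cells[("Net Site Energy", "Total Energy [kWh]")])  # 90.0
```

Tuple keys keep single lookups readable, and a whole row or column can still be pulled out with a comprehension, e.g. `{c: v for (r, c), v in cells.items() if r == "Net Site Energy"}`.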

santoshphilip commented 10 years ago

Clean up the non-ASCII characters in the table cells.
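One possible approach, sketched with the standard library only: normalize each cell to NFKD so accented letters and compatibility characters split into ASCII-friendly parts, then drop whatever is still non-ASCII. `clean_cell` and `clean_table` are hypothetical helpers, and stripping surrounding whitespace is an assumed extra step:

```python
import unicodedata


def clean_cell(text):
    """Return an ASCII-only version of a table cell, with outer whitespace removed.

    NFKD normalization first splits characters into compatibility forms
    (e.g. 'é' -> 'e' + combining accent, superscript '²' -> '2', a
    non-breaking space -> a plain space); anything still outside ASCII
    is then dropped by the 'ignore' error handler.
    """
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii").strip()


def clean_table(table):
    """Apply clean_cell to every string cell, leaving numbers untouched."""
    return [[clean_cell(c) if isinstance(c, str) else c for c in row] for row in table]


print(clean_cell("Area [m\u00b2]"))          # Area [m2]
print(clean_cell("\u00a0Caf\u00e9\u00a0"))   # Cafe
```

This is lossy by design: a degree sign (°) has no ASCII decomposition and is simply dropped, so units such as °C may need an explicit replacement table applied before the cleanup.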

santoshphilip commented 10 years ago

Add an HTML viewer that can be used in the IPython notebook, e.g.:

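```python
from bs4 import BeautifulSoup
from IPython.display import HTML
soup = BeautifulSoup(open("eplustbl.htm"), "html.parser")  # E+ HTML table output; path assumed
HTML(str(soup.table))  # render the first <table> inline in the notebook
```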

eayoungs commented 10 years ago

I've started a function, 'report_tables', which creates a dictionary of all the Report objects in the output file. I'm working on the unit test for the function now on my 'eayoungs' branch.

eayoungs commented 10 years ago

The 'report_tables' function appears to be working. I created a test with py.test using the same sample HTML input used to test the other functions in readhtml.py. I also created a 'select_table' function to specify which EnergyPlus report to pull from the selected output file. I expect to pass the report's contents to the '_make_ntgrid' function to create an immutable array for the selected report, in which each value can be addressed by the names of its row and column.

FYI - Changes are on the 'eayoungs' branch and have not been merged with master yet.

eayoungs commented 10 years ago

I think it would be useful, at some point, to let these functions work with the EnergyPlus SQLite output option. The reports and resulting data sets should be the same, but rather than parsing an entire E+ output file into Python data structures before manipulating it, this approach would use queries to read individual records and parse only the data they contain, as needed.

This occurred to me while reading up on collections.namedtuple() (https://docs.python.org/2/library/collections.html#collections.namedtuple), the factory for the tuple subclasses used in the '_make_ntgrid' function. The documentation explicitly recommends reading database records into named tuples: "Named tuples are especially useful for assigning field names to result tuples returned by the csv or sqlite3 modules."
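A rough sketch of the SQLite idea: a `sqlite3` row factory that returns each query result as a namedtuple, plus a query that fetches one tabular cell instead of parsing the whole file. The table name `TabularDataWithStrings` and its column names are assumed to mirror the EnergyPlus SQLite output and should be checked against a real `eplusout.sql`; `get_cell` is hypothetical, and the in-memory database holds made-up values so the example runs on its own:

```python
import sqlite3
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def _record_type(fields):
    """Create (once per distinct column list) a namedtuple class for query rows."""
    return namedtuple("Record", fields)


def namedtuple_factory(cursor, row):
    """sqlite3 row factory that returns each row as a namedtuple."""
    fields = tuple(col[0] for col in cursor.description)
    return _record_type(fields)(*row)


def get_cell(conn, report, table, row, column):
    """Fetch a single tabular cell instead of parsing the whole output file."""
    cur = conn.execute(
        "SELECT RowName, ColumnName, Units, Value FROM TabularDataWithStrings "
        "WHERE ReportName = ? AND TableName = ? AND RowName = ? AND ColumnName = ?",
        (report, table, row, column),
    )
    return cur.fetchone()


# Stand-in database with made-up values; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory
conn.execute(
    "CREATE TABLE TabularDataWithStrings "
    "(ReportName TEXT, TableName TEXT, RowName TEXT, ColumnName TEXT, Units TEXT, Value TEXT)"
)
conn.execute(
    "INSERT INTO TabularDataWithStrings VALUES (?, ?, ?, ?, ?, ?)",
    ("AnnualBuildingUtilityPerformanceSummary", "Site and Source Energy",
     "Net Site Energy", "Total Energy", "kWh", "90.00"),
)
cell = get_cell(conn, "AnnualBuildingUtilityPerformanceSummary",
                "Site and Source Energy", "Net Site Energy", "Total Energy")
print(cell.Value, cell.Units)  # 90.00 kWh
```

The standard library's `sqlite3.Row` is a built-in alternative that gives access by column name through indexing (`row["Value"]`); the namedtuple version is shown because it matches the attribute-style access used by '_make_ntgrid'.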

eayoungs commented 10 years ago

I've committed new changes to the _report_tables function. I was trying to create a report dictionary containing dictionaries of all the tables in each report. I've made good progress and have at least created a dictionary of all the tables, but I still need to refine how the lines_table function extracts the table name from the header, and refine the logic for building a report dictionary with each report name as a key and a dictionary of that report's tables as the value.
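The nesting described above (report name → table name → table) can be sketched as a grouping step over a flat list. The input format, (report name, table name, table) triples, is a hypothetical intermediate that a header-parsing step might emit; `group_tables_by_report` is not an existing readhtml function, and the tables hold made-up values:

```python
from collections import defaultdict


def group_tables_by_report(tables):
    """Nest tables as {report name: {table name: table}}.

    `tables` is a hypothetical flat list of (report name, table name, table)
    triples, in the order they appear in the HTML output.
    """
    reports = defaultdict(dict)
    for report_name, table_name, table in tables:
        if table_name in reports[report_name]:
            raise ValueError(f"duplicate table {table_name!r} in report {report_name!r}")
        reports[report_name][table_name] = table
    return dict(reports)


tables = [
    ("Annual Building Utility Performance Summary", "Site and Source Energy",
     [["", "Total Energy [kWh]"], ["Net Site Energy", 90.0]]),
    ("Annual Building Utility Performance Summary", "Building Area",
     [["", "Area [m2]"], ["Total Building Area", 100.0]]),
    ("Input Verification and Results Summary", "General",
     [["", "Value"], ["Weather File", "made-up.epw"]]),
]
reports = group_tables_by_report(tables)
print(sorted(reports))
```

Raising on a duplicate table name is a deliberate choice in this sketch: silently overwriting would hide a table, and the same report may appear more than once in E+ output (for example for different zones), in which case the report key would need to include that extra qualifier.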

eayoungs commented 9 years ago

Santosh, I've been working on this lately, both for myself and at the request of a colleague. I've put together the following IPython notebook. Let me know what you think. https://github.com/eayoungs/EPlusTemplates/blob/public/Scripting/readhtml.ipynb

Happy Memorial Day

Eric

santoshphilip commented 9 years ago

Good, I'll take a look and respond.

Santosh.


In reply to Eric Youngson's message of May 25, 2015, 10:35 AM.