Parsing `xml` file into python

msdogan / HydropowerProject

Optimizing operations of hydropower facilities

MIT License


Parsing `xml` file into python #2

Closed msdogan closed 7 years ago

msdogan commented 7 years ago

@jdherman if you have any experience with importing an XML file into Python, do you know how to access these three elements for the data item 'LMP_PRC' from the data?

INTERVAL_START_GMT
INTERVAL_END_GMT
VALUE

[Screenshot: capture]

my python code:

# This code optimizes operations of pumped-storage hydropower facilities. 
# Mustafa Dogan
#10/06/2016

from __future__ import division
import numpy as np 
import matplotlib.pyplot as plt
from scipy.integrate import quad, dblquad, trapz, simps
from scipy.stats import lognorm
from mpl_toolkits.mplot3d import Axes3D
import xml.etree.ElementTree as ET
# import seaborn as sns
# sns.set_style('whitegrid')

tree = ET.parse('20160901_20161002_PRC_LMP_DAM_20161005_11_18_31_v1.xml')
root = tree.getroot()

# this does not work!!!
print(root[0][0].text)
for child in root:
    print(child.tag)

jdherman commented 7 years ago

Wow, not the easiest format for a time series of energy prices!

What does it say in the error message? What is "root"?

If you upload or link me to the XML file I can play around with it.

msdogan commented 7 years ago

Well, there is no error message yet. This code

tree = ET.parse('20160901_20161002_PRC_LMP_DAM_20161005_11_18_31_v1.xml')

reads the XML file, but I couldn't figure out how to retrieve the values I want. I thought the result would behave like a dictionary, but it does not :) There is also a CSV version. If this is too time-consuming, I can just use the CSV. I thought that reading the XML as a dictionary might be more useful than a CSV. Files are zipped and attached. Thanks a lot! Pumped_Storage.zip

jdherman commented 7 years ago

Oh, use CSV!! Use pandas to load it, and it will behave like a dictionary. (XML is tough, you have to iterate over all of the child elements).

import pandas as pd
import matplotlib.pyplot as plt  # needed for plt.ylabel / plt.show below
import seaborn as sns

name = 'Pumped_Storage/20160901_20161002_PRC_LMP_DAM_20161005_11_19_18_v1.csv'

# assume the "start time" is the index for the dataframe
df = pd.read_csv(name, index_col=0, parse_dates=True)
df.MW.plot(style='o') # access any column name here (I assume MW is the price?)
plt.ylabel('Whatever this is')
plt.show()
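
For anyone who still wants to stay with the XML, the "iterate over all of the child elements" approach could look roughly like the sketch below. It is only an illustration under assumptions: the sample document, its `OASISReport`/`REPORT_DATA` wrapper tags, and the namespace URL are made up, and the helper names `local_name` and `extract_rows` are hypothetical. Only the field names `DATA_ITEM` (assumed to hold 'LMP_PRC'), `INTERVAL_START_GMT`, `INTERVAL_END_GMT`, and `VALUE` come from the question. The real file may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's wrapper tags and namespace may differ.
SAMPLE_XML = """<OASISReport xmlns="http://example.com/oasis">
  <REPORT_DATA>
    <DATA_ITEM>LMP_PRC</DATA_ITEM>
    <INTERVAL_START_GMT>2016-09-01T07:00:00-00:00</INTERVAL_START_GMT>
    <INTERVAL_END_GMT>2016-09-01T08:00:00-00:00</INTERVAL_END_GMT>
    <VALUE>28.5</VALUE>
  </REPORT_DATA>
  <REPORT_DATA>
    <DATA_ITEM>LMP_CONG_PRC</DATA_ITEM>
    <INTERVAL_START_GMT>2016-09-01T07:00:00-00:00</INTERVAL_START_GMT>
    <INTERVAL_END_GMT>2016-09-01T08:00:00-00:00</INTERVAL_END_GMT>
    <VALUE>0.0</VALUE>
  </REPORT_DATA>
  <REPORT_DATA>
    <DATA_ITEM>LMP_PRC</DATA_ITEM>
    <INTERVAL_START_GMT>2016-09-01T08:00:00-00:00</INTERVAL_START_GMT>
    <INTERVAL_END_GMT>2016-09-01T09:00:00-00:00</INTERVAL_END_GMT>
    <VALUE>30.1</VALUE>
  </REPORT_DATA>
</OASISReport>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit('}', 1)[-1]


def extract_rows(root, data_item='LMP_PRC'):
    """Collect (start, end, value) for every element whose direct
    DATA_ITEM child matches data_item."""
    rows = []
    for elem in root.iter():
        # Map each direct child's tag name to its text content.
        fields = {local_name(c.tag): (c.text or '').strip() for c in elem}
        if fields.get('DATA_ITEM') == data_item and 'VALUE' in fields:
            rows.append((fields.get('INTERVAL_START_GMT'),
                         fields.get('INTERVAL_END_GMT'),
                         float(fields['VALUE'])))
    return rows


root = ET.fromstring(SAMPLE_XML)  # for a file: ET.parse(path).getroot()
rows = extract_rows(root)
for start, end, value in rows:
    print(start, end, value)
```

Matching on the tag name after the closing brace avoids having to know the file's namespace in advance, at the cost of ignoring namespaces entirely.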

msdogan commented 7 years ago

Great! I will use pandas and csv instead, then. Thank you.

jdherman commented 7 years ago

Sure thing. If you still want to try the nested-dict idea, there's a library called xmltodict: https://github.com/martinblech/xmltodict

Just pip install xmltodict and then

import xmltodict
with open('whatever_file.xml', 'rb') as f:
    my_dict = xmltodict.parse(f)

I've never used this; I only found it by googling.
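
To give a feel for the nested-dict idea without installing anything, here is a minimal hand-rolled sketch using only the standard library. It is not xmltodict's implementation and does not reproduce its output format. The function name `element_to_dict` and the sample tags `REPORT` and `ITEM` are hypothetical, and the sketch ignores XML attributes and namespaces beyond stripping the prefix.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element into nested dicts.

    Leaf elements become their stripped text. Repeated child tags are
    gathered into a list. Attributes are ignored in this sketch.
    """
    children = list(elem)
    if not children:
        return (elem.text or '').strip()
    result = {}
    for child in children:
        key = child.tag.rsplit('}', 1)[-1]  # drop any '{namespace}' prefix
        value = element_to_dict(child)
        if key in result:
            # Second or later occurrence of this tag: promote to a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


# Hypothetical sample document, for illustration only.
SAMPLE_XML = (
    "<REPORT>"
    "<ITEM><DATA_ITEM>LMP_PRC</DATA_ITEM><VALUE>28.5</VALUE></ITEM>"
    "<ITEM><DATA_ITEM>LMP_PRC</DATA_ITEM><VALUE>30.1</VALUE></ITEM>"
    "</REPORT>"
)

doc = element_to_dict(ET.fromstring(SAMPLE_XML))
values = [float(item['VALUE']) for item in doc['ITEM']]
print(values)
```

Note that a tag appearing once yields a plain value while a repeated tag yields a list, so code reading the result has to handle both shapes.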

msdogan commented 7 years ago

I think pandas is good for now :) I really appreciate that.