antonhagg opened 8 years ago
So I have found a workaround for this: first write the XML to a file, then parse it back in from disk. That way the raw XML string and the parsed tree don't both have to be in memory at the same time. Is this an acceptable solution?
```python
# Requires: import os, import lxml.objectify; JFSError comes from this project.
def get(self, url):
    'Make a GET request for url and return the response content as a generic lxml object'
    url = self.escapeUrl(url)
    if "?mode=list" in url:  # a full tree of the directory was requested, so the XML can be huge
        if os.path.exists('temp.xml'):
            os.remove('temp.xml')
        # Write the raw XML to disk first and parse from the file, so the
        # raw string and the parsed tree never coexist in memory.
        with open("temp.xml", "w") as text_file:
            text_file.write(self.raw(url))
        o = lxml.objectify.parse("temp.xml").getroot()
        if os.path.exists('temp.xml'):
            os.remove('temp.xml')
    else:
        o = lxml.objectify.fromstring(self.raw(url))
    if o.tag == 'error':
        JFSError.raiseError(o, url)
    return o
```
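As a side note, one hypothetical refinement of the snippet above (not something proposed in this thread): `tempfile.NamedTemporaryFile` would avoid the hard-coded `temp.xml` path, so two concurrent calls cannot clobber each other's file. A sketch, with `parse_via_tempfile` standing in for the body of the `?mode=list` branch:

```python
import os
import tempfile
import lxml.objectify

def parse_via_tempfile(raw_xml):
    # raw_xml stands in for the string returned by self.raw(url).
    # delete=False lets us reopen the file for parsing (needed on Windows).
    with tempfile.NamedTemporaryFile(mode='w', suffix='.xml', delete=False) as tf:
        tf.write(raw_xml)
        path = tf.name
    try:
        return lxml.objectify.parse(path).getroot()
    finally:
        os.remove(path)  # always clean up, even if parsing fails
```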
Hey @antonhagg, I think you are right; we need to do something to limit our resource requirements. I'll take a look at your code, thanks!
Maybe we could try to create a StringIO object and, if we see that the response is really big, write it to disk. Then we parse with objectify.parse(fileobject).
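A minimal sketch of that idea (the 10 MB threshold and the function name are made up for illustration). The standard library's `tempfile.SpooledTemporaryFile` already implements exactly this keep-in-memory-then-spill-to-disk behaviour:

```python
import tempfile
import lxml.objectify

def parse_xml_response(raw_xml, max_in_memory=10 * 1024 * 1024):
    # SpooledTemporaryFile buffers in memory and transparently rolls
    # over to a real temp file once max_size bytes have been written.
    with tempfile.SpooledTemporaryFile(max_size=max_in_memory) as buf:
        buf.write(raw_xml.encode('utf-8'))
        buf.seek(0)
        return lxml.objectify.parse(buf).getroot()
```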
Sounds like a good idea. I won't have time to do anything until August, so if anyone else is up for the job, feel free. =)
@antonhagg I had a go at it; will you please test whether the current code in master works for you now?
Since "folder download" is not in the 0.5.1 release, I will have to add that first. Tried a new innstallation of the 0.5.1, but ran into a lot of trouble... will have to sort that out first.
@antonhagg The code has not been released yet. Are you able to install from git head? That is, with git clone, and not with pip?
This is mainly related to #78, where an XML file can grow quite big (in my case it's around 500 MB and contains 779917 files and 90361 folders). But I guess this could happen otherwise too.
Anyway, there is the option of using a custom parser with huge_tree enabled (http://stackoverflow.com/questions/11850345/using-python-lxml-etree-for-huge-xml-files). Would this be an option, or is there another way of parsing large XML files, for example in chunks?
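For reference, a minimal sketch of both routes (the filename `listing.xml` and the tag name `file` are assumptions for illustration, not the actual JottaCloud schema):

```python
import lxml.etree
import lxml.objectify

# Route 1: objectify with a custom parser. huge_tree=True lifts lxml's
# safety limits on tree depth and node size, for trusted input only.
parser = lxml.objectify.makeparser(huge_tree=True)
root = lxml.objectify.parse('listing.xml', parser).getroot()

# Route 2: incremental parsing with iterparse, which streams the file
# in chunks so the full tree never has to be held in memory at once.
for event, elem in lxml.etree.iterparse('listing.xml', tag='file'):
    # ... handle elem here ...
    elem.clear()  # release the element once it has been processed
```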