paracrawl / Domain_Adaptation

InDomain detection is a tool designed to extract in-domain data from a large collection of data.
GNU General Public License v3.0

Run out of RAM? #10

Closed: kpu closed this issue 5 years ago

kpu commented 5 years ago

Does LadderXMLfile parse lazily or is this going to load the entire XML into RAM?

https://github.com/paracrawl/Domain_Adaptation/blob/cf06a0e0bb7257d4747fe3d08bac96dbac8a4c5f/P3_DD_Extract.py#L51

dionwiggins commented 5 years ago

Updated to stream.
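
For readers unfamiliar with the distinction raised above, here is a minimal sketch of what "streaming" an XML file can look like in Python. It is not the repository's actual code: the function name `iter_segments` and the `<seg>` tag are hypothetical, chosen only for illustration, and the sketch assumes the elements of interest are direct children of the root. It uses the standard library's `xml.etree.ElementTree.iterparse`, which yields elements as they are parsed instead of building the whole tree first.

```python
import io
import xml.etree.ElementTree as ET


def iter_segments(source, tag="seg"):
    """Yield the text of each <tag> element while parsing incrementally.

    `source` is a filename or a binary file object. The tag name is a
    hypothetical placeholder, not taken from the repository.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text or ""
            # Drop children that have already been processed so that
            # memory use stays roughly constant for large files.
            root.clear()


if __name__ == "__main__":
    sample = io.BytesIO(b"<doc><seg>a</seg><seg>b</seg></doc>")
    for text in iter_segments(sample):
        print(text)
```

By contrast, `ET.parse(path)` builds the full tree in memory before returning, which is the behaviour the question is worried about for large inputs.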

kpu commented 5 years ago

The issue can be closed when the master branch is fixed.

dalisola commented 5 years ago

The issue has been solved in ExtractMatchedDomainData.py.