weso / CWR-DataApi

MIT License

The parser is unable to handle big files #168

Open Bernardo-MG opened 9 years ago

Bernardo-MG commented 9 years ago

Currently the parser can only handle files of a few MB.

It has been verified to work with files of up to 50 MB, and takes around 3 minutes per 10 MB.

The cause seems to be the Pyparsing library, but swapping it for another one would mean rewriting both the factory and the configuration DSL.

Alexkane commented 7 years ago

Would it be conceptually possible to break this process down and run it in parallel to improve performance?

Bernardo-MG commented 7 years ago

In my limited experience with parallelization, it isn't that easy to take advantage of.

Also, the parser itself would first need to split the file, and then decide how to handle the pieces before continuing with the parsing.
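
A minimal sketch of that split-then-parse idea, assuming each line of the file can be parsed independently of the others. This is not the CWR-DataApi API: `parse_record`, `parse_chunk`, `read_chunks` and `parse_file_parallel` are hypothetical names, and `parse_record` is only a placeholder for the real Pyparsing-based record parser.

```python
from itertools import islice


def parse_record(line):
    # Hypothetical stand-in for the real per-record parser.
    # It only treats the first three characters as the record type.
    return {"type": line[:3], "body": line[3:].rstrip("\n")}


def parse_chunk(lines):
    # Parse one batch of lines; this is the unit of work sent to a worker.
    return [parse_record(line) for line in lines]


def read_chunks(path, chunk_size):
    # Yield lists of up to chunk_size lines, in file order.
    with open(path) as handle:
        while True:
            chunk = list(islice(handle, chunk_size))
            if not chunk:
                return
            yield chunk


def parse_file_parallel(path, pool, chunk_size=1000):
    # pool is any concurrent.futures.Executor.
    # Executor.map returns results in the order the chunks were submitted,
    # so the parsed records keep the order they had in the file.
    results = []
    for parsed in pool.map(parse_chunk, read_chunks(path, chunk_size)):
        results.extend(parsed)
    return results
```

Since parsing is CPU-bound, the pool would most likely need to be a `concurrent.futures.ProcessPoolExecutor`, created under an `if __name__ == "__main__":` guard, for this to bring any speed-up. The sketch also splits at arbitrary line boundaries, so records that depend on their neighbours would need the chunk boundaries chosen accordingly, which is exactly the "decide how to handle these pieces" part.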