zorgulle closed this issue 10 years ago.
It's fine with me if the output files are named like rels-eng.csv, words-eng.csv, etc.
Our processing time is fairly low at the moment, and we don't expect to regenerate these files often.
It would be good if they were written to a dedicated folder (csv_tmp, for example) so they don't clutter the existing directories.
Edit: furthermore, if we change our sources (partial or full OMW), these timings are meaningless.
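A minimal sketch of the naming scheme suggested above, assuming a Python pipeline. The helpers `output_path` and `write_rows` are hypothetical (not project code); they just show per-language files like rels-eng.csv being written into a separate csv_tmp folder that is created on demand.

```python
import csv
from pathlib import Path


# Hypothetical helper: "csv_tmp" and the "<kind>-<lang>.csv" pattern follow
# the suggestion in this thread; nothing here comes from the project itself.
def output_path(kind: str, lang: str, out_dir: str = "csv_tmp") -> Path:
    """Return a path like csv_tmp/rels-eng.csv, creating the folder if needed."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{kind}-{lang}.csv"


# Hypothetical helper: write rows (iterables of fields) to the per-language file.
def write_rows(kind: str, lang: str, rows, out_dir: str = "csv_tmp") -> Path:
    path = output_path(kind, lang, out_dir)
    with path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return path
```

For example, `write_rows("words", "eng", rows)` would produce csv_tmp/words-eng.csv, and the whole folder can be deleted or git-ignored in one go.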
@zorgulle why is it slower? You might want to use one or two of the profiling tools listed here to trace it.
(5 seconds is definitely not worth the effort, but I had planned to give you this pointer eventually.)
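One way to trace where the extra time goes, assuming the pipeline is Python, is the standard-library `cProfile` module (not necessarily one of the tools on the list referred to above). In this sketch `build_csv_files` is a hypothetical stand-in for the real extraction and parsing step; the report is sorted by cumulative time so the slowest call chains appear first.

```python
import cProfile
import io
import pstats


def build_csv_files() -> int:
    # Hypothetical stand-in for the real relation extraction + file parsing.
    return sum(i * i for i in range(200_000))


def profile(func, *args, top: int = 10):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()
```

For a whole script, `python -m cProfile -s cumulative your_script.py` (script name hypothetical) gives the same kind of report without changing any code.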
Using Mr Bond's script, we produce the tab file and the CSV files in 20 seconds. We realized that our parser is now useless, because we no longer read tab files. Do we need to keep the parser?
@zorgulle it makes sense to delete the tab parser now. We certainly don't want dead code floating around the project.
All the cleanup has been done; we can close this.
By combining relation extraction and file parsing, we can easily add other languages, but it is 5 seconds slower.