seykron / ogov-data

Argentina's Congress Datasets in JSON format
GNU General Public License v2.0

Increase updating frequency #2

Open martinszy opened 10 years ago

martinszy commented 10 years ago

Right now, ogov-importer crawls the whole site to find out what has changed. This could be reduced to checking only projects introduced in the current or the previous calendar year, since older projects lose "parliamentary status" after one year and would (theoretically) never be updated again. If this is implemented, perhaps we can increase the update frequency of the data, since each run would be less intensive.
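
To make the idea concrete, here is a minimal sketch of the year filter. The function names and the `yearIntroduced` field are hypothetical and not taken from ogov-importer; it only assumes that each bill carries the year it was introduced.

```javascript
// Hypothetical helpers: none of these names come from ogov-importer.

// Returns the calendar years whose bills may still change:
// the previous year and the current one.
function yearsToCheck(now = new Date()) {
  const currentYear = now.getFullYear();
  return [currentYear - 1, currentYear];
}

// Decides whether a bill should be re-checked on this run.
// Assumes `bill.yearIntroduced` is a number (an assumed field name).
function needsRecheck(bill, now = new Date()) {
  return yearsToCheck(now).includes(bill.yearIntroduced);
}
```

Anything for which `needsRecheck` returns false would be skipped, so each run would only touch two years of bills.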

seykron commented 10 years ago

Maybe we need a smarter heuristic to deal with progressive imports. As we import full pages, we don't know which bills fall within a specific range, and the Congress search engine does not guarantee any specific order when displaying results. This is actually a known issue: the importer assumes that a query always retrieves the same set of bills, which could be false, and the only way to force a fresh import is to clean up the cache.
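
One possible direction, only as a rough sketch: instead of relying on result order or on a query returning the same set, compare each bill against what was stored on the previous run. The `id` field and the cache shape below are assumptions for illustration, not the importer's real data model.

```javascript
// Hypothetical sketch: `id` and the Map-based cache are assumptions.
// `cache` maps a bill id to the serialized bill seen on the last run.
function classifyBills(cache, bills) {
  const result = { added: [], changed: [], unchanged: [] };
  for (const bill of bills) {
    // Note: JSON.stringify is sensitive to key order, so bills must be
    // built with their keys in a consistent order for this to work.
    const serialized = JSON.stringify(bill);
    if (!cache.has(bill.id)) {
      result.added.push(bill.id);
    } else if (cache.get(bill.id) !== serialized) {
      result.changed.push(bill.id);
    } else {
      result.unchanged.push(bill.id);
    }
    // Remember the latest version so the next run compares against it.
    cache.set(bill.id, serialized);
  }
  return result;
}
```

Because bills are matched by id, the order in which the search engine returns them does not matter. This still needs every page to be fetched, so it addresses correctness more than speed.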

Anyway, if you want to import only one year, you can change the query in BillImporter.js. Just put whatever you want in the "fecha_inicio" parameter specified in the DATA_SOURCE constant.
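
As an illustration of that change, the sketch below sets "fecha_inicio" on a query URL. The URL is a placeholder and not the real DATA_SOURCE value, `withStartDate` is a hypothetical helper, and the date format is an assumption; check BillImporter.js for what the search engine actually expects.

```javascript
// Hypothetical stand-in for DATA_SOURCE: the real URL and its other
// parameters live in BillImporter.js and are not reproduced here.
const EXAMPLE_DATA_SOURCE = "http://example.invalid/search?fecha_inicio=";

// Returns a copy of the query URL with "fecha_inicio" set to startDate.
// The value is passed through as-is; its format is an assumption.
function withStartDate(dataSource, startDate) {
  const url = new URL(dataSource);
  url.searchParams.set("fecha_inicio", startDate);
  return url.toString();
}

// Example: restrict the import to bills introduced from 2014 onwards.
const query = withStartDate(EXAMPLE_DATA_SOURCE, "01/01/2014");
```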

Regarding the import process, the last round of performance tuning reduced the time by 80%. If you already have bills in the cache, it takes ~10 minutes to process 100K bills on an Intel Core Duo. And since the memory leak was fixed, it requires a constant average of ~150MB of memory and a constant system load of ~2.0 for the whole process.

I will think about a strategy to perform progressive updates... let me know if you have any ideas.