mitdbg / aurum-datadiscovery

MIT License

Faster indexing #52

Open raulcf opened 8 years ago

raulcf commented 8 years ago
jmftrindade commented 8 years ago

Do any of these help? https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html

raulcf commented 8 years ago

Thanks! Bulk requests should help.

In general, however, we need a more aggressive strategy here, as indexing still lags an order of magnitude behind profiling. Ultimately, it is about reducing the amount of data we index. Because we are loading database columns, where repetition is common, filtering out values we have already seen (per column) should help a lot too.
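The per-column filtering idea could look roughly like the sketch below. This is only an illustration of the technique, not code from the repository: `ColumnDeduplicator`, `unseen`, and the column identifiers are hypothetical names, and it assumes values arrive as strings and that the distinct values of a column fit in memory.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, Set


class ColumnDeduplicator:
    """Tracks the values already sent to the index, separately for each column."""

    def __init__(self) -> None:
        # One set of seen values per column identifier (hypothetical key format).
        self._seen: Dict[Hashable, Set[str]] = defaultdict(set)

    def unseen(self, column_id: Hashable, values: Iterable[str]) -> Iterator[str]:
        """Yield only the values not yet seen for this column, in arrival order."""
        seen = self._seen[column_id]
        for value in values:
            if value not in seen:
                seen.add(value)
                yield value


dedup = ColumnDeduplicator()
first_chunk = list(dedup.unseen("db.table.city", ["Boston", "Boston", "NYC"]))
second_chunk = list(dedup.unseen("db.table.city", ["NYC", "LA"]))
```

Only the values returned by `unseen` would then be handed to the indexer. For very wide columns, storing a hash of each value instead of the value itself would be one way to bound memory, at the cost of rare false matches.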

raulcf commented 8 years ago

I just implemented bulk requests. It helped a lot, actually. Indexing is now only 3x slower than profiling (although I haven't optimized profiling yet). In any case this is great news: the gap is closing.
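For readers unfamiliar with bulk requests, here is a minimal sketch of how documents can be grouped into bodies for the Elasticsearch `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one document line, with a trailing newline. It is not the change made in this issue. The function name `bulk_bodies`, the index name `profile`, and the batch size are assumptions, and older Elasticsearch versions may also expect a `_type` in the action line.

```python
import json
from typing import Dict, Iterable, Iterator, List


def bulk_bodies(docs: Iterable[Dict], index: str, batch_size: int = 500) -> Iterator[str]:
    """Yield newline-delimited JSON bodies, each holding up to batch_size documents."""
    lines: List[str] = []
    count = 0
    for doc in docs:
        # Action line: ask Elasticsearch to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
        count += 1
        if count == batch_size:
            # The bulk format requires the body to end with a newline.
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        # Flush the final, possibly smaller, batch.
        yield "\n".join(lines) + "\n"


# Hypothetical documents; each body would be POSTed to /_bulk in one round trip.
sample_docs = ({"text": str(i)} for i in range(5))
bodies = list(bulk_bodies(sample_docs, index="profile", batch_size=2))
```

Each yielded string replaces `batch_size` individual index requests with a single HTTP call, which is where the speedup comes from. A suitable batch size depends on document size and cluster capacity, so it is worth measuring rather than guessing.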

jmftrindade commented 8 years ago

Great, glad to hear that guide helped!