semanticize / semanticizest

Standalone Semanticizer
Apache License 2.0

Redirect handling is slow (or broken?) #8

Closed: larsmans closed this issue 10 years ago

larsmans commented 10 years ago

I ran the dump parser overnight on the latest nlwiki dump, without n-gram counting. It got through the dump in 1 h 15 min and produced a 348 MB model, but then it got stuck in redirect processing.
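One way redirect processing can stall is by rewriting link targets one redirect at a time against a table with no index on the target column: every `UPDATE` then scans the whole table. The sketch below is hypothetical. The `linkstats(ngram, target, count)` schema, the `redirects` dict and both function names are assumptions for illustration, not semanticizest's actual model or code. It first resolves redirect chains in memory, skipping cycles, and then applies them with an index so that each `UPDATE` is an index lookup.

```python
import sqlite3


def resolve_redirects(redirects):
    """Map each redirect source to its final target, following chains.

    `redirects` is a hypothetical dict {source title: target title}.
    Sources that end up in a redirect cycle are left out.
    """
    resolved = {}
    for start in redirects:
        seen = set()
        title = start
        # Follow the chain until we reach a non-redirect or revisit a title.
        while title in redirects and title not in seen:
            seen.add(title)
            title = redirects[title]
        if title in redirects:
            continue  # stopped on a revisited title: this is a cycle
        resolved[start] = title
    return resolved


def apply_redirects(conn, redirects):
    """Point links at redirect pages to their final targets (hypothetical schema)."""
    # Without this index, each UPDATE below is a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS linkstats_target ON linkstats (target)")
    conn.executemany(
        "UPDATE linkstats SET target = ? WHERE target = ?",
        [(dst, src) for src, dst in resolve_redirects(redirects).items()])
    conn.commit()
```

Rows that end up with the same `(ngram, target)` after rewriting are not merged here. A later `GROUP BY ngram, target` with `SUM(count)` would consolidate them.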

larsmans commented 10 years ago

For the record, it had been running for 16 h when I killed it.

larsmans commented 10 years ago

Shaved 40 MB off the model by normalizing link targets (d20065b78cd8415cf585536ba9edfbf5becbe839). Now trying with an extra index (3f68de9e1326a1a9cff2da1eff6a000a058aefa7).
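Normalizing link targets shrinks the model because different spellings of the same page collapse into a single stored string. The following is a minimal sketch of MediaWiki-style title normalization, assuming that is roughly what is meant. It is not the code from d20065b, and the function name `normalize_target` is hypothetical. The sketch drops the `#section` anchor, treats underscores as spaces, collapses whitespace and uppercases the first character, which matches wikis such as nlwiki that capitalize the first letter of titles.

```python
import re


def normalize_target(target):
    """Hypothetical normalization of a wiki link target to a canonical title."""
    target = target.split("#", 1)[0]              # drop the section anchor
    target = target.replace("_", " ")             # underscores and spaces are equivalent
    target = re.sub(r"\s+", " ", target).strip()  # collapse runs of whitespace
    if target:
        # First letter is case-insensitive on wikis that capitalize titles.
        target = target[0].upper() + target[1:]
    return target
```

For example, `normalize_target("amsterdam_Centraal#Geschiedenis")` gives `"Amsterdam Centraal"`, the same key as a plain `[[Amsterdam Centraal]]` link.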

larsmans commented 10 years ago

This is making it extremely slow :(

Never mind, it seems to be a VM issue. On my laptop, it runs almost instantly on the first 10,000 articles.

larsmans commented 10 years ago

Fixed by 887a1497325016b7ba565b372eeda04d357fc456.