Closed larsmans closed 10 years ago
To be clear, it had been running for 16 h when I killed it.
Shaved 40 MB off by normalizing link targets (d20065b78cd8415cf585536ba9edfbf5becbe839). Now trying with an extra index (3f68de9e1326a1a9cff2da1eff6a000a058aefa7).
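For reference, MediaWiki-style link-target normalization usually means dropping the section fragment, treating underscores and spaces as interchangeable, and uppercasing the first character. A minimal sketch of that idea (the function name and exact rules here are my assumption, not necessarily what the commit does):

```python
import re

def normalize_link_target(target: str) -> str:
    """Normalize a wiki link target (hypothetical sketch):
    drop the fragment, collapse underscores/whitespace to single
    spaces, and uppercase the first character."""
    # Links to sections ("Foo#History") resolve to the same article.
    target = target.split('#', 1)[0]
    # Underscores and spaces are interchangeable in wiki links.
    target = re.sub(r'[\s_]+', ' ', target).strip()
    if not target:
        return target
    # Page titles are case-sensitive except for the first letter.
    return target[0].upper() + target[1:]

print(normalize_link_target('foo_bar#History'))  # -> 'Foo bar'
```

Collapsing these variants to one canonical form is what lets duplicate link targets be stored once, which is where a size reduction like the 40 MB above would come from.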
This is making it extremely slow :(
Never mind, that seems to be a VM issue. On my laptop, processing the first 10000 articles is almost instant.
Fixed by 887a1497325016b7ba565b372eeda04d357fc456.
I ran the dump parser on the latest nlwiki dump overnight, without n-gram counting. It churned through the dump in 1 h 15 min, producing a 348 MB model, but then it got stuck during redirect processing.