Open stultus opened 9 years ago
I think this has to be prioritized before other enhancements.
Absolutely! For a start, the wikiwoods.dm file should be loaded just once. At the moment, it gets loaded every time findBestPears is called, and even worse, every time a pear is looked at in scorePages (so 3 more times). On my machine, it takes around 2s to load, so that's already 8s gone... :(
@minimalparts the wikiwoods.dm file is created manually (using some tool), right? What is your opinion on converting it into an sqlite table and querying that?
Yes, absolutely!
Same issue with the doc.dists files. See for example http://aurelieherbelot.net/pears-demo/pearone/doc.dists.txt. But I have no idea... can we also convert those to sqlite and have them downloadable from a website?
Actually, I'm talking rubbish: wikiwoods.dm is only loaded once in scorePages. But even that load is totally unnecessary, because it is used to recalculate the distribution of the query, which has already been computed in findBestPears.
I guess what we want is: load wikiwoods.dm when launching the application. Calculate the query's distribution (mkQueryDist) once, in findBestPears, and load the doc.dists files in scorePages.
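To make that concrete, here is a rough sketch of the "load once, compute once" idea. It assumes wikiwoods.dm is a plain-text file with one word per line followed by its whitespace-separated vector, which may not match the real format. `load_dm` and `mk_query_dist` are hypothetical names (the latter a stand-in for mkQueryDist), and summing the word vectors is only a placeholder for whatever the real function does.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_dm(path):
    """Parse the .dm file; repeated calls with the same path reuse the cached dict.

    Assumed format (not confirmed): "word v1 v2 ... vn" on each line.
    """
    space = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            space[parts[0]] = [float(x) for x in parts[1:]]
    return space


def mk_query_dist(query, space):
    """Placeholder query distribution: component-wise sum of known word vectors."""
    vectors = [space[w] for w in query.lower().split() if w in space]
    if not vectors:
        return []
    return [sum(column) for column in zip(*vectors)]
```

With something like this, the application would call `load_dm` at launch, findBestPears would call `mk_query_dist` once, and the resulting distribution would be passed to scorePages as an argument instead of being recomputed there.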
PR #22 introduces an sqlite database for wikiwoods. Let's see how this goes.
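For anyone following along, this is roughly what an sqlite-backed lookup could look like. It is not the code from the PR, just a minimal sketch using Python's standard `sqlite3` module. The table name, the column names, the text encoding of the vectors, and the assumed .dm line format ("word v1 v2 ...") are all illustrative assumptions.

```python
import sqlite3


def build_db(dm_path, db_path):
    """Copy an assumed "word v1 v2 ..." text file into an sqlite table."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS wikiwoods "
            "(word TEXT PRIMARY KEY, vector TEXT NOT NULL)"
        )
        with open(dm_path, encoding="utf-8") as f:
            rows = (line.split(maxsplit=1) for line in f if line.strip())
            conn.executemany(
                "INSERT OR REPLACE INTO wikiwoods VALUES (?, ?)",
                ((r[0], r[1].strip()) for r in rows if len(r) == 2),
            )
    return conn


def lookup(conn, words):
    """Fetch vectors only for the given words instead of loading the whole space."""
    words = list(words)
    if not words:
        return {}
    placeholders = ",".join("?" * len(words))
    cursor = conn.execute(
        f"SELECT word, vector FROM wikiwoods WHERE word IN ({placeholders})",
        words,
    )
    return {word: [float(x) for x in vector.split()] for word, vector in cursor}
```

The point of the design is that a query only touches the rows for its own words, so the 2s full-file load disappears from the request path.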
Right now 'scoreDocs' and 'runScript' are taking around 13 seconds and 24 seconds respectively.