mcartright / julien

Toolkit for Information Retrieval research

Use underlying iterators for performance #26

Closed by daltonj 11 years ago

daltonj commented 11 years ago

The SimpleProcessor should use the underlying iterators.

The profiler shows that calls to julien.retrieval.Term.underlying trigger a large number of calls to scala.runtime.AbstractFunction0.init, which suggests that a function object is being allocated on each access.

We need to be careful that the statistics being used stay correct. A first attempt at switching to the underlying iterators produced incorrect counts, because an underlying iterator does not return 0 when it is not positioned on the requested doc.
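To make the counting pitfall concrete, here is a minimal sketch. `PostingIterator`, `rawCount`, `countAt`, and `syncTo` are hypothetical names invented for illustration and are not julien's actual API. It assumes a term that occurs 3 times in doc 2 and once in doc 5, and scores docs 1 through 5.

```scala
// Hypothetical stand-in for an underlying posting-list iterator.
// None of these names come from julien; they only illustrate the bug.
final class PostingIterator(docs: Array[Int], counts: Array[Int]) {
  require(docs.length == counts.length, "one count per posting")
  private var pos = 0

  def isDone: Boolean = pos >= docs.length

  // Advance to the first posting whose doc id is >= target.
  def syncTo(target: Int): Unit =
    while (!isDone && docs(pos) < target) pos += 1

  // Count at the current position, whatever doc that happens to be.
  def rawCount: Int = if (isDone) 0 else counts(pos)

  // Count for `doc`, or 0 when the iterator is not positioned on it.
  def countAt(doc: Int): Int =
    if (!isDone && docs(pos) == doc) counts(pos) else 0
}

object CountDemo {
  // Returns (unguarded total, guarded total) over docs 1 to 5.
  def totals(): (Int, Int) = {
    val naiveIt = new PostingIterator(Array(2, 5), Array(3, 1))
    val guardedIt = new PostingIterator(Array(2, 5), Array(3, 1))
    var naive = 0
    var guarded = 0
    for (doc <- 1 to 5) {
      naiveIt.syncTo(doc)
      guardedIt.syncTo(doc)
      naive += naiveIt.rawCount        // also counts docs the term is not in
      guarded += guardedIt.countAt(doc) // contributes 0 for those docs
    }
    (naive, guarded)
  }
}
```

In this sketch the unguarded total is 9 while the guarded total is 4, the true collection frequency. Any code that reads counts straight from an underlying iterator would need an equivalent position check.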

mcartright commented 11 years ago

One idea (pending verification via profiling) is to remove the assert check and the get call from underlying. I'd like to have both a "checked" version of the IteratedHook and a faster "unchecked" one, and we can provide some mechanism for either 1) asking for an unchecked version explicitly, or 2) implicitly converting a checked version to an unchecked one when an index attachment is made. I think (1) is easier, but (2) would be nicer because it would remove that extra step for any clients.
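A rough sketch of the two options, under stated assumptions: `CountIterator`, `CheckedHook`, `UncheckedHook`, and `Hooks` are hypothetical names and do not reflect the real IteratedHook code. The checked version asserts and calls `Option.get` on every access; the unchecked version resolves the iterator once and stores it directly. As a side note, Scala's two-argument `assert` takes its message by name, which is one way a `Function0` can end up being allocated on each call; whether that is the cause of the profile above is not confirmed here.

```scala
import scala.language.implicitConversions

// Hypothetical minimal iterator interface, for illustration only.
trait CountIterator { def count: Int }

// Checked hook: validates the attachment on every access.
final class CheckedHook {
  private var it: Option[CountIterator] = None

  def attach(i: CountIterator): Unit = { it = Some(i) }

  def underlying: CountIterator = {
    // The by-name message argument is wrapped in a function object.
    assert(it.isDefined, "no iterator attached")
    it.get
  }
}

// Unchecked hook: the iterator is resolved once and held directly.
final class UncheckedHook(val underlying: CountIterator)

object Hooks {
  // Option (1): the client asks for the unchecked version explicitly.
  def unchecked(h: CheckedHook): UncheckedHook =
    new UncheckedHook(h.underlying)

  // Option (2): an implicit conversion does the same step for the client.
  implicit def checkedToUnchecked(h: CheckedHook): UncheckedHook =
    unchecked(h)
}

object HookDemo {
  def run(): Int = {
    import Hooks._
    val hook = new CheckedHook
    hook.attach(new CountIterator { def count: Int = 7 })
    val fast: UncheckedHook = hook // implicit conversion, checked once here
    fast.underlying.count          // no assert or Option.get on this path
  }
}
```

With either option the check still runs once, at conversion time, so a missing attachment fails early instead of inside the scoring loop.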

mcartright commented 11 years ago

This has been done. Checked iterators have been replaced with unchecked ones.