Closed daltonj closed 11 years ago
Testing "new york" in AQUAINT. The offending doc is APW19980810.1040 for now; Julien has one more match.
Apparently fixed. The UW (unordered window) operator in Galago computes the maximum position from the end of each extent (which is begin+1 if no end is specified). Julien only tracks begins, hence the off-by-one. The current hack of using (begin+1) in the maximum-position calculation fixes this, but it's a band-aid. We need proper extent/span support in Julien.
Fixed this to work with positions by changing the "<=" comparison to "<".
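For anyone following along, here is a minimal sketch of why the off-by-one shows up and why switching "<=" to "<" is equivalent to the (begin+1) hack. This is not the Galago or Julien code: the function names, the brute-force search over position combinations, and the example positions are all made up for illustration.

```python
from itertools import product


def uw_extents(position_lists, width):
    """Extent-style check: position p is the extent (p, p + 1),
    and the window spans from the smallest begin to the largest end."""
    return [c for c in product(*position_lists)
            if max(p + 1 for p in c) - min(c) <= width]


def uw_begins_inclusive(position_lists, width):
    """Begins-only check with '<=': one position too lenient."""
    return [c for c in product(*position_lists)
            if max(c) - min(c) <= width]


def uw_begins_strict(position_lists, width):
    """Begins-only check with '<': same result as the extent-style check
    whenever every extent has length 1."""
    return [c for c in product(*position_lists)
            if max(c) - min(c) < width]


# Hypothetical positions for two terms in one document.
new_positions = [10, 40]
york_positions = [12, 41]

print(uw_extents([new_positions, york_positions], 2))           # [(40, 41)]
print(uw_begins_inclusive([new_positions, york_positions], 2))  # [(10, 12), (40, 41)]
print(uw_begins_strict([new_positions, york_positions], 2))     # [(40, 41)]
```

With begins only, `max - min <= width` accepts the pair at positions 10 and 12 for a width-2 window, while the extent-style check rejects it because the window really covers positions 10 through 12 (three tokens). The strict comparison only stays equivalent while every extent has length 1, which is why proper extent support is still the real fix.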
Dirichlet: score = -14.850306955773318 count = 0 cf = 8.6503393E-7 len = 2152 num = 0.0012975509004517164 den = 3652.0
background stats collFreq = 218 collLength = 252013235 docFreq = 115270 numdocs = 528030
annotated node from galago:
score DirichletScoringIterator 699590 false -14.941497193837915 :dirichlet:collectionLength=252013235:documentCount=528155:mu=1500:nodeFrequency=199:w=0.025
  lengths StreamLengthsIterator 699590 true 2152 document
  extent UnorderedWindowIterator 699590 false ExtentArray:doc=699590:count=0:[]
    extents TermExtentIterator 699590 true ExtentArray:doc=699590:count=5:[(84,85),(922,923),(1261,1262),(2019,2020),(2036,2037)] international
    extents TermExtentIterator 699590 true ExtentArray:doc=699590:count=14:[(28,29),(75,76),(146,147),(210,211),(244,245),(299,300),(330,331),(354,355),(577,578),(611,612),(686,687),(710,711),(828,829),(958,959)] organized
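As a sanity check on the numbers above, here is a small sketch that recomputes both scores with the usual Dirichlet-smoothed log-likelihood, log((count + mu * cf) / (len + mu)). It assumes mu=1500 on both sides (the Galago node says so, and den = 3652 = 2152 + 1500 in the first log line implies it) and that cf is collection frequency divided by collection length. The function and variable names are mine, not from either codebase.

```python
import math


def dirichlet_score(count, doc_len, coll_freq, coll_len, mu=1500):
    """Dirichlet-smoothed log score: log((count + mu * cf) / (doc_len + mu))."""
    cf = coll_freq / coll_len  # background probability of the node
    return math.log((count + mu * cf) / (doc_len + mu))


COLL_LEN = 252013235  # collection length reported in both logs
DOC_LEN = 2152        # length of document 699590

# collFreq = 218 comes from the "background stats" line.
score_cf218 = dirichlet_score(0, DOC_LEN, 218, COLL_LEN)
# nodeFrequency = 199 comes from the Galago annotated node.
score_cf199 = dirichlet_score(0, DOC_LEN, 199, COLL_LEN)

print(score_cf218)  # approximately -14.8503
print(score_cf199)  # approximately -14.9415
```

Both logged scores fall out of the same formula once the respective frequencies are plugged in, so for this document (count = 0 on both sides) the remaining score gap appears to come entirely from the window's collection frequency, 218 versus 199, rather than from the scorer itself.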