Closed msokolov closed 1 year ago
+1, but it looks like I merged these two PRs in the wrong order? This one seems to subsume some of the changes from the last one (e.g. wikimedium500)? Can you resolve the conflicts and merge this? Thanks @msokolov!
OK, this is now working with the current mainline and reproducing results similar to those I have been seeing all along with this change. It will enable testing with 8-bit encoded vectors. Perhaps as a future step we can switch the nightlies over to using this, as it seems to be a win.
We have been reading the entire GloVe dictionary (400K entries, 332M) in order to convert our vector tasks from text to vectors, but we have already precomputed these in the tasks folder. This switches over to using the precomputed files. The main impact will be faster runs (no need to load the dictionary) and JFR profiles that reflect query execution costs rather than setup costs. Query timing is not changed since all this work was done during setup.
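For illustration only, here is a minimal sketch of what loading precomputed vectors can look like compared with parsing a text dictionary. It is not the code in this PR. It assumes a hypothetical layout in which the file is a flat run of little-endian float32 values with a fixed dimension; the names `read_vectors` and `write_vectors`, the file name, and the dimension are all made up for the example.

```python
import array
import os
import sys
import tempfile


def write_vectors(path, vectors):
    """Write vectors as a flat run of little-endian float32 values (assumed layout)."""
    flat = array.array("f", [x for vec in vectors for x in vec])
    if sys.byteorder != "little":
        flat.byteswap()  # store little-endian regardless of host byte order
    with open(path, "wb") as f:
        flat.tofile(f)


def read_vectors(path, dim):
    """Read a flat float32 file and return a list of dim-length float arrays."""
    size = os.path.getsize(path)
    bytes_per_vector = 4 * dim
    if size % bytes_per_vector != 0:
        raise ValueError("file size is not a multiple of the vector size")
    flat = array.array("f")
    with open(path, "rb") as f:
        flat.fromfile(f, size // 4)
    if sys.byteorder != "little":
        flat.byteswap()  # convert from the on-disk little-endian order
    # Slice the flat array into consecutive dim-length vectors.
    return [flat[i:i + dim] for i in range(0, len(flat), dim)]


if __name__ == "__main__":
    # Round-trip two tiny vectors through a temporary file (hypothetical name).
    path = os.path.join(tempfile.mkdtemp(), "example-vectors.bin")
    write_vectors(path, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    for vec in read_vectors(path, dim=3):
        print(list(vec))
```

The point of the sketch is that reading a binary file like this is a single bulk read with no per-token lookup, which is why setup gets cheaper once the dictionary no longer has to be loaded.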