sotheanithsok / Habeas

A complete implementation of a large-scale search engine, including on-disk indexing, multiple query options, and user interfaces.
MIT License

K-Gram Index On Disk #65

Closed jblacklock closed 5 years ago

jblacklock commented 5 years ago

"Save your wildcard index to disk, and incorporate wildcards into ranked retrieval queries. Extend the 'create an index' procedure of your program to also generate a disk-based wildcard index, and likewise extend the 'query an index' procedure to load this wildcard index for use with wildcard queries. I will leave the design of this index up to you, but will gladly give input if you need it. You may create a design in which the entire wildcard index is read into and retained in memory for the duration of the search engine; you do not need to architect a system that reads wildcard information from the binary file each time a wildcard query is needed. To incorporate wildcards into ranked retrievals, simply include every vocabulary type that matches the wildcard token in the ranking procedure. (Yes, this gives higher scores to documents that contain multiple different words matching the wildcard query. If you can come up with a better procedure, I welcome your proposal.)"
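One possible shape for this, as a rough sketch only: the assignment leaves the index design open, so the binary layout below (length-prefixed UTF-8 strings), the 3-gram size, and every function name are assumptions for illustration, not the design used in this repository. The sketch writes a k-gram index to disk at index time, reads it back fully into memory at query time, and expands a wildcard token into every matching vocabulary type so that those types can be passed to the ranking procedure.

```python
import re
import struct
from collections import defaultdict

K = 3  # assumed k-gram size; the assignment does not fix one


def build_index(vocabulary, k=K):
    """Map each k-gram of '$term$' to the set of terms containing it."""
    index = defaultdict(set)
    for term in vocabulary:
        padded = f"${term}$"
        for i in range(len(padded) - k + 1):
            index[padded[i:i + k]].add(term)
    return dict(index)


def _write_str(f, s):
    # Hypothetical record format: 4-byte big-endian length, then UTF-8 bytes.
    data = s.encode("utf-8")
    f.write(struct.pack(">I", len(data)))
    f.write(data)


def _read_str(f):
    (n,) = struct.unpack(">I", f.read(4))
    return f.read(n).decode("utf-8")


def save_index(index, path):
    """Write the k-gram index to a binary file ('create an index' step)."""
    with open(path, "wb") as f:
        f.write(struct.pack(">I", len(index)))
        for gram in sorted(index):
            _write_str(f, gram)
            terms = sorted(index[gram])
            f.write(struct.pack(">I", len(terms)))
            for term in terms:
                _write_str(f, term)


def load_index(path):
    """Read the whole k-gram index into memory ('query an index' step)."""
    index = {}
    with open(path, "rb") as f:
        (n_grams,) = struct.unpack(">I", f.read(4))
        for _ in range(n_grams):
            gram = _read_str(f)
            (n_terms,) = struct.unpack(">I", f.read(4))
            index[gram] = {_read_str(f) for _ in range(n_terms)}
    return index


def expand_wildcard(query, index, vocabulary, k=K):
    """Return every vocabulary type matching a '*' wildcard token."""
    # Collect the k-grams from each literal piece of '$query$'.
    pieces = f"${query}$".split("*")
    grams = [p[i:i + k] for p in pieces for i in range(len(p) - k + 1)]
    # Intersect the term sets; with no usable k-gram, start from everything.
    candidates = set(vocabulary)
    for gram in grams:
        candidates &= index.get(gram, set())
    # Post-filter, since sharing k-grams does not guarantee a real match.
    pattern = re.compile(".*".join(re.escape(p) for p in query.split("*")))
    return sorted(t for t in candidates if pattern.fullmatch(t))
```

In a ranked query, each wildcard token would be replaced by the list that `expand_wildcard` returns, and every returned type would then be scored like an ordinary query term. As the assignment text notes, this favors documents that contain several different matching words.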

sotheanithsok commented 5 years ago

69