ikaruswill / legal-retrieval

MIT License

Vector Space Model representation for documents and query #8

Open bsmmoon opened 7 years ago

bsmmoon commented 7 years ago

"intentional tort" AND "remoteness of damage"

It should be..

bsmmoon commented 7 years ago

VSM + 3-gram?

bsmmoon commented 7 years ago

Phrasal queries are 2 or 3 words long at most, so if you want to handle phrasal queries, you can support them using either n-word indices or positional indices.
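For reference, here is a minimal sketch of the positional-index option. It assumes lowercased, whitespace-split tokens with no stemming or stop-word removal; the function names and the sample documents are hypothetical, not taken from this repo.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} for lowercased, whitespace-split docs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return the set of doc ids that contain the phrase's terms at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()
    # Candidate docs must contain every term of the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = set()
    for doc_id in candidates:
        # Try every position of the first term as a possible phrase start.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

# Hypothetical sample documents, for illustration only.
docs = {
    1: "liability for intentional tort and damages",
    2: "tort that was intentional",
    3: "remoteness of damage in an intentional tort case",
}
index = build_positional_index(docs)

# Boolean AND of two phrases is the intersection of their result sets.
both = phrase_search(index, "intentional tort") & phrase_search(index, "remoteness of damage")
```

Document 2 contains both words of "intentional tort" but not adjacent and in order, so it is not a match. An n-word index would trade this position check for a direct lookup, at the cost of a larger dictionary.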

bsmmoon commented 7 years ago

An example project that uses N-gram VSM (link)

bsmmoon commented 7 years ago

Assuming that indexing is done using 3-grams, how do we handle a query with fewer than 3 tokens? Wildcards? If we use multiple n-gram indexes (1-, 2-, and 3-gram), how do we handle a mixed query like '2 words' AND '3 words'?

bsmmoon commented 7 years ago

Use 1-, 2-, and 3-gram models. Given a phrase of length n, use the n-gram index first. If not enough documents are retrieved, try the lower-order n-grams as well.
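A rough sketch of that back-off, under some assumptions: tokens are lowercased and whitespace-split, "not enough" is a hypothetical `min_docs` threshold, and at each lower order a document counts as a match if it contains any of the phrase's n-grams. All names and sample documents are made up for illustration.

```python
from collections import defaultdict

def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_ngram_indexes(docs, max_n=3):
    """Build one index per order: {n: {ngram_tuple: set(doc_ids)}}."""
    indexes = {n: defaultdict(set) for n in range(1, max_n + 1)}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for n in indexes:
            for gram in ngrams(tokens, n):
                indexes[n][gram].add(doc_id)
    return indexes

def retrieve_with_backoff(indexes, phrase, min_docs=2, max_n=3):
    """Query the highest applicable order first, then back off to lower orders.

    Returns doc ids ordered so that higher-order matches come first.
    min_docs is an assumed threshold for "enough documents".
    """
    tokens = phrase.lower().split()
    n = min(len(tokens), max_n)
    results, seen = [], set()
    while n >= 1 and len(results) < min_docs:
        matched = set()
        for gram in ngrams(tokens, n):
            matched |= indexes[n].get(gram, set())
        # Append only documents not already found at a higher order.
        for doc_id in sorted(matched - seen):
            results.append(doc_id)
            seen.add(doc_id)
        n -= 1
    return results

# Hypothetical sample documents, for illustration only.
docs = {
    1: "remoteness of damage in tort",
    2: "damage of remoteness",
    3: "the damage was remote",
}
indexes = build_ngram_indexes(docs)
ranked = retrieve_with_backoff(indexes, "remoteness of damage", min_docs=2)
```

Because each phrase is routed to the index matching its own length, a mixed query like '2 words' AND '3 words' can be handled by running each phrase separately and combining the results. In a VSM setting the order level at which a document matched could also feed into its score instead of only its rank position.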