Closed abhinavkulkarni closed 1 year ago
Thanks for the report! I've looked into this. The page on "2015 Diamond Head Classic" (and likewise the 2016, 2017, and 2018 pages) isn't in the downloaded corpus, possibly because the crawler/parser decided it was too short and removed it. The corpus is the DPR wiki-100 corpus, in case you'd like to use it directly.
In my experience, whenever such a direct query fails to find the document, 90% of the time the document is simply not in the index (or, a bit less likely, the passage splitting is unfavorable).
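A quick way to check whether a page made it into the corpus is to scan the passage file for its title. The snippet below is only a minimal sketch: it assumes the corpus is a tab-separated file with a header row and `id`, `text`, `title` columns (the layout I'd expect for the DPR passage dump, but please verify against your copy). The helper name `find_titles` and the sample rows are made up for illustration.

```python
import csv
import io


def find_titles(tsv_file, needle):
    """Return distinct passage titles containing `needle` (case-insensitive).

    `tsv_file` is any open text file or file-like object whose first row is
    a header with a `title` column (assumed layout: id, text, title).
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    needle = needle.lower()
    matches = []
    for row in reader:
        title = row["title"]
        # Many passages share one title, so keep each title only once.
        if needle in title.lower() and title not in matches:
            matches.append(title)
    return matches


# Illustrative, made-up rows standing in for the real corpus file.
sample = io.StringIO(
    "id\ttext\ttitle\n"
    "1\tFirst passage of some page ...\tSome Basketball Tournament\n"
    "2\tSecond passage of the same page ...\tSome Basketball Tournament\n"
    "3\tA passage about a player ...\tSome Player\n"
)

print(find_titles(sample, "2015 Diamond Head Classic"))
```

For the real corpus you would pass an opened file instead of `sample`, e.g. `open(path, newline="", encoding="utf-8")`, where `path` points to wherever you saved the passages. An empty result suggests the page is missing from the corpus rather than being ranked poorly.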
Closing. But feel free to reopen if needed.
We're considering whether to host a 2023 Wikipedia index instead, and to fix some of the issues in the DPR corpus as part of it. Would this be helpful to you?
Hey @okhat,
A newer version of Wikipedia would undoubtedly help, but I am currently only trying out a few ideas; I could work with the 2019 corpus.
I couldn't find how to set up a remote ColBERTv2 server like the one used in the DSP notebook. All I could find in the ColBERTv2 README was the Python API. Could you please elaborate on that?
I am trying to set up a small ColBERTv2 server on a remote GPU-enabled machine and would like to query it from my laptop for experimentation.
Thanks!
This is actually a common request! A member of the team will merge a version soon. Could you paste the same request in a new issue? I'll forward it to him.
Thanks @okhat, I have added the issue here: https://github.com/stanford-futuredata/ColBERT/issues/173
Hi,
Thanks for this great project!
I was playing around with different prompts of my own within the DSP framework, and I am having trouble getting a correct answer to the following simple question:
Which team does the player named 2015 Diamond Head Classic’s MVP play for?
There is a Wikipedia page about the 2015 Diamond Head Classic (link). The phrase "2015 Diamond Head Classic" appears in the title as well as the abstract. The abstract also mentions that "Buddy Hield" was named MVP.
However, the ColBERTv2 retriever is unable to retrieve that Wikipedia page in the top 5 results. I checked the page's history: it was created in 2015, so it should have been present in the 2019 Wikipedia dump.
1st Hop
2nd Hop
The subsequent hops cannot find the answer because the appropriate passage is not retrieved in the 2nd hop.
Thanks!
CC @okhat