philippedeb / IN4325-project-corelR-6

IN4325 Group Project - corelR 6
https://brightspace.tudelft.nl/d2l/home/596319

Implement learning to rank #5

Open RDoting opened 6 months ago

RDoting commented 6 months ago

Depends on #2, #3 and #4

levichy commented 6 months ago

https://pyterrier.readthedocs.io/en/latest/ltr.html

philippedeb commented 6 months ago

http://terrier.org/docs/current/javadoc/org/terrier/matching/models/package-summary.html

philippedeb commented 6 months ago

https://pyterrier.readthedocs.io/en/latest/terrier-retrieval.html

RDoting commented 6 months ago

From the slides:

Practical tips for creating an LTR pipeline

RDoting commented 6 months ago

Pairwise ranking is empirically more robust and efficient
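To make the pairwise idea concrete: a pairwise learner does not regress each document's label on its own. It learns from pairs of documents for the same query where one is labelled more relevant than the other. The sketch below is a hypothetical illustration (not taken from the slides). It turns graded labels into preference pairs and computes a RankNet-style pairwise logistic loss. The function names are our own and are not part of PyTerrier.

```python
import math
from itertools import combinations
from typing import List, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_pairs(labels: List[int]) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs where document i should be ranked above document j.
    Pairs with equal labels carry no preference and are skipped."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] > labels[j]:
            pairs.append((i, j))
        elif labels[j] > labels[i]:
            pairs.append((j, i))
    return pairs


def pairwise_logistic_loss(scores: List[float], labels: List[int]) -> float:
    """Average of log(1 + exp(-(s_i - s_j))) over all preference pairs (i above j)."""
    pairs = preference_pairs(labels)
    if not pairs:
        return 0.0
    return sum(softplus(-(scores[i] - scores[j])) for i, j in pairs) / len(pairs)


labels = [2, 0, 1]
print(pairwise_logistic_loss([3.0, 1.0, 2.0], labels))  # scores agree with the labels: low loss
print(pairwise_logistic_loss([1.0, 3.0, 2.0], labels))  # every pair inverted: higher loss
```

Only the relative order of the two scores in a pair matters, so the model never has to predict absolute relevance values.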

RDoting commented 6 months ago

Given a query q and a collection D of documents that match the query, the problem is to rank, that is, sort, the documents in D according to some criterion so that the "best" results appear early in the result list displayed to the user.

Calculating a score for each document
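In code, this definition comes down to scoring every matching document and sorting by score, best first. Here is a minimal sketch in plain Python. `term_overlap` is a hypothetical toy scorer that stands in for BM25 or a learned model, and `rank` is our own helper, not a PyTerrier function.

```python
from typing import Callable, Dict, List, Tuple


def term_overlap(query: str, text: str) -> float:
    """Toy scorer (hypothetical): number of distinct query terms that occur in the document."""
    doc_terms = set(text.lower().split())
    return float(sum(term in doc_terms for term in set(query.lower().split())))


def rank(query: str, docs: Dict[str, str],
         score: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score every document for the query and return (doc_id, score) pairs, best first."""
    scored = [(doc_id, score(query, text)) for doc_id, text in docs.items()]
    # Sort by descending score; ties are broken by doc id so the order is deterministic
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


docs = {
    "d1": "learning to rank with pyterrier",
    "d2": "bm25 retrieval",
    "d3": "rank documents by bm25",
}
print(rank("rank bm25", docs, term_overlap))
# [('d3', 2.0), ('d1', 1.0), ('d2', 1.0)]
```

Learning to rank only changes the `score` function. The sort stays the same.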

RDoting commented 6 months ago

First, a small number of potentially relevant documents are identified using simpler retrieval models which permit fast query evaluation, such as the vector space model, boolean model, weighted AND,[6] or BM25. This phase is called top-k document retrieval and many heuristics were proposed in the literature to accelerate it, such as using a document's static quality score and tiered indexes.[7] In the second phase, a more accurate but computationally expensive machine-learned model is used to re-rank these documents.
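The two-phase scheme can be sketched in a few lines of plain Python. This is a toy under stated assumptions: `overlap_count` stands in for a fast first-stage model such as BM25, and `query_term_density` is a hypothetical stand-in for the expensive learned re-ranker. Neither is a PyTerrier API. The point is only the control flow: the expensive scorer is called on the top-k candidates alone.

```python
import heapq
from typing import Callable, Dict, List

Scorer = Callable[[str, str], float]


def overlap_count(query: str, text: str) -> float:
    """Cheap first-stage stand-in (hypothetical): distinct query terms present in the document."""
    doc_terms = set(text.lower().split())
    return float(sum(term in doc_terms for term in set(query.lower().split())))


def query_term_density(query: str, text: str) -> float:
    """Expensive re-ranker stand-in (hypothetical): share of document tokens that are query terms."""
    tokens = text.lower().split()
    query_terms = set(query.lower().split())
    return sum(tok in query_terms for tok in tokens) / len(tokens) if tokens else 0.0


def two_stage(query: str, docs: Dict[str, str], cheap: Scorer,
              expensive: Scorer, k: int) -> List[str]:
    """Phase 1: keep the top-k documents under the cheap scorer.
    Phase 2: re-rank only those k candidates with the expensive scorer."""
    candidates = heapq.nlargest(k, docs, key=lambda doc_id: cheap(query, docs[doc_id]))
    return sorted(candidates, key=lambda doc_id: expensive(query, docs[doc_id]), reverse=True)


docs = {
    "d1": "rank rank rank bm25 and many other words here",
    "d2": "rank bm25",
    "d3": "unrelated text",
    "d4": "bm25 baseline",
}
print(two_stage("rank bm25", docs, overlap_count, query_term_density, k=2))
# ['d2', 'd1']
```

Note that `d4` would score higher than `d1` under the re-ranker but never reaches phase 2, because the first stage already cut it off. This is why the choice of k for the first stage matters.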

RDoting commented 6 months ago

By default, PyTerrier is configured for indexing and retrieval in English. See our notebook (colab) for details on how to configure PyTerrier in other languages.

Maybe compare BM25 scores on certain languages, and check whether they change once this language configuration is applied.
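A quick way to see why the language configuration can change BM25 scores: the scores depend entirely on how text is tokenised and normalised. The sketch below is a self-contained BM25 with a pluggable tokeniser. Several things here are assumptions. It uses the common defaults k1 = 1.2 and b = 0.75 and a Lucene-style non-negative IDF, which may differ from Terrier's exact BM25. `folding_tokenise` (diacritic stripping) is only a toy stand-in for real language-aware processing such as stemming.

```python
import math
import unicodedata
from collections import Counter
from typing import Callable, Dict, List

Tokeniser = Callable[[str], List[str]]


def whitespace_tokenise(text: str) -> List[str]:
    """Lowercase and split on whitespace, with no language-specific normalisation."""
    return text.lower().split()


def folding_tokenise(text: str) -> List[str]:
    """Toy normalisation (assumption): strip diacritics, e.g. 'café' -> 'cafe', then split."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.split()


def bm25_scores(query: str, docs: Dict[str, str], tokenise: Tokeniser,
                k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every document against the query with BM25 (Lucene-style IDF)."""
    if not docs:
        return {}
    tokenised = {doc_id: tokenise(text) for doc_id, text in docs.items()}
    n_docs = len(tokenised)
    avgdl = sum(len(toks) for toks in tokenised.values()) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenised.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenised.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenise(query):
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


docs = {"d1": "Le café est ouvert", "d2": "La gare est fermée"}
print(bm25_scores("cafe", docs, whitespace_tokenise))  # 'cafe' never matches 'café'
print(bm25_scores("cafe", docs, folding_tokenise))     # d1 now gets a positive score
```

In PyTerrier the equivalent switch happens at indexing and retrieval time through the configuration shown in the notebook. Comparing runs with and without it on the same topics would show whether it matters for our languages.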

RDoting commented 6 months ago

https://colab.research.google.com/github/terrier-org/pyterrier/blob/master/examples/notebooks/non_en_retrieval.ipynb

RDoting commented 6 months ago

https://github.com/wis-delft/in4325-information-retrieval

RDoting commented 6 months ago

https://colab.research.google.com/github/terrier-org/pyterrier/blob/master/examples/notebooks/ltr.ipynb#scrollTo=5gCHuDiJMNJZ

RDoting commented 6 months ago

https://cli.github.com/manual/gh_repo_sync

RDoting commented 6 months ago

https://pyterrier.readthedocs.io/en/latest/datasets.html

RDoting commented 6 months ago

https://github.com/castorini/duobert

RDoting commented 6 months ago

Compare:

RDoting commented 6 months ago

The original MIRACL paper uses k = 1000.
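If we follow this and keep the top 1000 first-stage results per query, the cutoff itself is just a partial sort. Below is a minimal sketch that assumes the scores for one query are held in a plain dict. `top_k` is our own hypothetical helper. Inside a PyTerrier pipeline, the rank-cutoff operator (`%`) should play the same role.

```python
import heapq
from typing import Dict, List, Tuple


def top_k(scores: Dict[str, float], k: int = 1000) -> List[Tuple[str, float]]:
    """Keep the k highest-scoring (doc_id, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


scores = {f"d{i}": float(i) for i in range(1500)}
ranked = top_k(scores)
print(len(ranked), ranked[0], ranked[-1])
# 1000 ('d1499', 1499.0) ('d500', 500.0)
```

`heapq.nlargest` avoids fully sorting the run, which helps when the first stage returns far more than k candidates.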