KrishnenduGhosh closed this issue 8 years ago
Hi @KrishnenduGhosh
Rish (the second author) worked on the BM25 baseline for this project. I remember that he spent quite a bit of time tuning the BM25 baseline and its preprocessing.
Could you email Rish (hrishjoshi2@gmail.com and hjoshi@mit.edu) for more information about the Lucene set-up? Sorry about the inconvenience.
@KrishnenduGhosh I could also email him on your behalf. Just let me know your email address.
@taolei87 My email address is: kghosh.cs@gmail.com / kghosh.cs@iitkgp.ac.in
Hi Tao Lei,
Recently I was trying to build a Lucene-based BM25 baseline using the AskUbuntu dataset you provided. When building the index with IndexWriter, I used title+body from all 167765 questions, and at test time I searched with title+body for all 189 queries (11 queries have no similar questions). I set the IndexSearcher similarity to BM25Similarity in Apache Lucene 6.1.0. I left all other Lucene settings at their defaults, apart from the analyzer (EnglishAnalyzer).
But the problem is that I am getting a MAP of around 0.11, which is not at all comparable to the BM25 performance you reported. I suspect I am missing some steps somewhere. Could you please help me with this issue?
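Besides the Lucene settings, it may be worth checking how MAP itself is computed, since the 11 queries with no similar questions can pull the number down if they are averaged in as zeros. Below is a minimal sketch of a MAP computation, not the evaluation script used in the paper. The function names, the `skip_empty` flag, and the choice to normalize by the total number of relevant questions are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Average precision for one query.

    Assumption: the sum of precision@k at each relevant hit is divided by the
    total number of relevant ids, not only by those that were retrieved.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    rankings: Dict[str, List[str]],
    relevance: Dict[str, Set[str]],
    skip_empty: bool = True,
) -> float:
    """Mean of per-query average precision.

    skip_empty (hypothetical flag): when True, queries with no relevant
    questions are left out of the mean; when False, they count as 0.
    """
    scores = []
    for query_id, ranked_ids in rankings.items():
        relevant_ids = relevance.get(query_id, set())
        if not relevant_ids and skip_empty:
            continue
        scores.append(average_precision(ranked_ids, relevant_ids))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data: q3 has no relevant questions, like the 11 queries mentioned above.
    rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e"], "q3": ["f"]}
    relevance = {"q1": {"a", "c"}, "q2": {"e"}, "q3": set()}
    print(mean_average_precision(rankings, relevance, skip_empty=True))
    print(mean_average_precision(rankings, relevance, skip_empty=False))
```

On the toy data, excluding the empty query gives a higher MAP than counting it as zero, so the two conventions are not interchangeable when comparing against a reported number.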