terrier-org / pyterrier

A Python framework for performing information retrieval experiments, building on http://terrier.org/
https://pyterrier.readthedocs.io/
Mozilla Public License 2.0

Non English Index loading issue #491

Open bhargav25dave1996 opened 2 months ago

bhargav25dave1996 commented 2 months ago

I am working with the Gujarati corpus from the FIRE collection and encountering a problem with the search results. The issue occurs when I try to load a previously generated index.

Please see the code and results below for generating and then loading the index; this case works well.

[Screenshots: genrate_load, genrate_load_result]

Please see the code below, which only loads the previously generated index; in this case I get no results.

[Screenshots: only load, load_result]

cmacdonald commented 2 months ago

Hi @bhargav25dave1996

Thanks for the report. Can you do two things for me: (1) enable debug logging with pt.logging("DEBUG"), and (2) upload here the Terrier output for one query, both from the original index and after reloading the index.

I suspect somewhere the stemmer or tokeniser is being ignored, but we need to find out which.
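To illustrate why an ignored tokeniser could lead to empty results for Gujarati text, here is a minimal standalone sketch in plain Python. It does not use PyTerrier and does not reproduce Terrier's actual tokenisers; the function names `ascii_tokens` and `whitespace_tokens` and the sample query are hypothetical, and whether this mechanism is the cause of the problem reported here is only an assumption.

```python
import re

def ascii_tokens(text):
    """Hypothetical ASCII-only tokeniser: keeps runs of A-Z, a-z, 0-9."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def whitespace_tokens(text):
    """Hypothetical script-agnostic tokeniser: splits on whitespace only."""
    return text.split()

# Illustrative Gujarati query (an assumption, not taken from the FIRE topics).
query = "ગુજરાતી ભાષા"

print(ascii_tokens(query))       # no ASCII letters or digits survive
print(whitespace_tokens(query))  # both Gujarati terms are kept
```

If an index were built with a script-aware tokeniser but queries were later processed with an ASCII-only one, every Gujarati query term would be discarded before matching, which would look like an index that returns nothing.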

bhargav25dave1996 commented 2 months ago

Hi @cmacdonald, please find the output and debug logs below.

Original Index Debug: orginal_debug.txt

Original Index One Query Output: Original.csv

Reloading Index Debug: reloading_index_debug.txt

Reloading Index One Query Output: reloading_the_index.csv