Fatima-Haouari closed this issue 2 years ago.
Hi @Fatima-Haouari, thanks for your message. Good to see people from Qatar around. I am currently your neighbor at QCRI. Could you please share more details about your problem? Ideally, provide a short code snippet that reproduces the error. Thanks!
Thanks for your response. Nice to hear you are our neighbor. Here is the code I am using:
from trectools import TrecRun, TrecQrel, TrecEval, fusion
r1 = TrecRun("my_run1")
r2 = TrecRun("my_run2")
qrels = TrecQrel("my_qrels")
fused_run = fusion.combos([r1, r2], strategy="mnz")
fused_run = TrecRun(fused_run)
r1_p10 = TrecEval(r1, qrels).get_precision(depth=10)
r2_p10 = TrecEval(r2, qrels).get_precision(depth=10)
fused_run_p10 = TrecEval(fused_run, qrels).get_precision(depth=10)
print("P@10-- Run 1: %.3f, Run 2: %.3f, Fusion Run: %.3f" % (r1_p10, r2_p10, fused_run_p10))
fused_run.print_subset("my_fused_run.txt", topics=fused_run.topics())
Here is the error I am getting:
fused_run=TrecRun(fused_run)
File "/ds/usr/fatima/.conda/envs/myenv/lib/python3.6/site-packages/trectools/trec_run.py", line 21, in __init__
if filename:
File "/ds/usr/fatima/.conda/envs/myenv/lib/python3.6/site-packages/pandas/core/generic.py", line 1330, in __nonzero__
f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
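The traceback points at the `if filename:` check in `TrecRun.__init__`. `combos` returned a pandas DataFrame, and pandas will not reduce a whole DataFrame to a single True/False. Below is a minimal sketch that reproduces the failure and shows one way a constructor could accept either a file or a DataFrame by checking the type explicitly. `RunSketch`, its `run_data` attribute and the column names are hypothetical illustrations, not the actual trectools code.

```python
import io

import pandas as pd

# Hypothetical column layout for TREC run lines: query Q0 docid rank score tag.
COLUMNS = ["query", "q0", "docid", "rank", "score", "system"]


class RunSketch:
    """Hypothetical run container (NOT trectools.TrecRun).

    Accepts a path or file-like object, an existing DataFrame, or nothing.
    """

    def __init__(self, source=None):
        if isinstance(source, pd.DataFrame):
            # Check the type first: `if source:` on a DataFrame raises ValueError.
            self.run_data = source.copy()
        elif source is not None:
            # Parse whitespace-separated run lines and keep docids as strings.
            self.run_data = pd.read_csv(source, sep=r"\s+", header=None,
                                        names=COLUMNS, dtype={"docid": str})
        else:
            self.run_data = pd.DataFrame(columns=COLUMNS)


fused = pd.DataFrame([["q1", "Q0", "d7", 1, 2.0, "comb_mnz"]], columns=COLUMNS)

# Reproduce the reported error: truth-testing a DataFrame is ambiguous.
truth_error = ""
try:
    if fused:
        pass
except ValueError as err:
    truth_error = str(err)
print(truth_error)

from_frame = RunSketch(fused)
from_file = RunSketch(io.StringIO("q1 Q0 d7 1 2.0 comb_mnz\n"))
print(from_frame.run_data.loc[0, "docid"], from_file.run_data.loc[0, "docid"])  # d7 d7
```

Dispatching on `isinstance` keeps the file-path behavior unchanged and makes the DataFrame case explicit, instead of relying on truthiness.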
Hi Fatima, thanks for reporting this issue. I have just modified the interface of this module so that every fusion function follows the same pattern and returns a TrecRun object; you no longer need to wrap the result yourself. You can use it as follows:
from trectools import TrecRun, TrecQrel, TrecEval, fusion
r1 = TrecRun("my_run1")
r2 = TrecRun("my_run2")
qrels = TrecQrel("my_qrels")
fused_run = fusion.combos([r1, r2], strategy="mnz")
# or fused_run = fusion.rank_biased_precision_fusion([r1, r2])
# or fused_run = fusion.vector_space_fusion([r1, r2])
# or fused_run = fusion.reciprocal_rank_fusion([r1, r2])
r1_p10 = TrecEval(r1, qrels).get_precision(depth=10)
r2_p10 = TrecEval(r2, qrels).get_precision(depth=10)
fused_run_p10 = TrecEval(fused_run, qrels).get_precision(depth=10)
print("P@10-- Run 1: %.3f, Run 2: %.3f, Fusion Run: %.3f" % (r1_p10, r2_p10, fused_run_p10))
fused_run.print_subset("my_fused_run.txt", topics=fused_run.topics())
Please don't forget to update trectools first. I also added a few TODOs with ideas on how to clean up the code and make it more pandas-like. If you are up for it, please feel free to contribute.
Best,
Joao
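To make the strategies in the comments above more concrete, here is a minimal pure-Python sketch of CombMNZ and reciprocal rank fusion (RRF). It illustrates the techniques under stated assumptions and is not trectools' implementation. Runs are plain dicts mapping a query id to `{docid: score}`, CombMNZ uses per-query min-max normalization, and RRF uses the commonly cited constant k = 60. `comb_mnz_sketch`, `rrf_sketch` and `ranked` are hypothetical helper names. `fusion.combos` and `fusion.reciprocal_rank_fusion` may normalize scores, break ties or pick defaults differently.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical run representation: query id -> {docid: score}, docids as strings.
Run = Dict[str, Dict[str, float]]


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one query's scores to [0, 1]; identical scores all map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def comb_mnz_sketch(runs: List[Run]) -> Run:
    """CombMNZ: (sum of normalized scores) x (number of runs that retrieved the doc)."""
    fused: Run = {}
    for query in {q for run in runs for q in run}:
        total: Dict[str, float] = defaultdict(float)
        hits: Dict[str, int] = defaultdict(int)
        for run in runs:
            if not run.get(query):
                continue
            for docid, score in min_max(run[query]).items():
                total[docid] += score
                hits[docid] += 1
        fused[query] = {d: total[d] * hits[d] for d in total}
    return fused


def rrf_sketch(runs: List[Run], k: int = 60) -> Run:
    """Reciprocal rank fusion: sum of 1 / (k + rank) over runs, ranks starting at 1."""
    fused: Run = {}
    for query in {q for run in runs for q in run}:
        total: Dict[str, float] = defaultdict(float)
        for run in runs:
            # Rank this run's documents by descending score (ties broken by docid).
            ranking = sorted(run.get(query, {}).items(), key=lambda kv: (-kv[1], kv[0]))
            for rank, (docid, _score) in enumerate(ranking, start=1):
                total[docid] += 1.0 / (k + rank)
        fused[query] = dict(total)
    return fused


def ranked(scores: Dict[str, float]) -> List[str]:
    """Docids ordered by descending fused score, ties broken by docid."""
    return sorted(scores, key=lambda d: (-scores[d], d))


r1 = {"q1": {"d1": 3.0, "d2": 2.0, "d3": 1.0}}
r2 = {"q1": {"d2": 9.0, "d4": 5.0, "d1": 1.0}}
print(ranked(comb_mnz_sketch([r1, r2])["q1"]))  # ['d2', 'd1', 'd4', 'd3']
print(ranked(rrf_sketch([r1, r2])["q1"]))       # ['d2', 'd1', 'd4', 'd3']
```

Both fusions agree on this toy example. In general they can diverge: CombMNZ depends on the (normalized) score values, while RRF only looks at ranks.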
Thanks a lot for your help. I managed to get the fused runs now. However, I have an issue with the saved runs: the print_subset function seems to mishandle the document ID format when the document ID is a long numeric sequence. Please see the example below. I think the IDs need to be saved as strings, not floats.
938526354907201539 Q0 2447462545.0 1 68.17239761352539 comb_mnz
938526354907201539 Q0 2573395934.0 2 59.64200019836426 comb_mnz
938526354907201539 Q0 8.419854421992653e+17 3 47.85719871520996 comb_mnz
938526354907201539 Q0 1.2399882500827095e+18 4 47.61159896850586 comb_mnz
938526354907201539 Q0 66183082.0 5 46.91860008239746 comb_mnz
938526354907201539 Q0 1.1214384475945533e+18 6 45.5049991607666 comb_mnz
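One plausible explanation for IDs like 8.419854421992653e+17 is that the docid column was stored as float64 at some point; this has not been confirmed against the trectools source. A float64 represents integers exactly only up to 2**53, so longer IDs are rounded. Python also prints large floats in exponent notation and smaller whole-number floats with a trailing `.0`. The minimal pandas sketch below shows the loss and the string-dtype alternative. It uses a made-up 18-digit docid and a hypothetical `read_run` helper.

```python
import io

import pandas as pd

# Hypothetical run lines (query Q0 docid rank score tag); the 18-digit docid is made up.
RUN_TEXT = (
    "q1 Q0 900000000000000001 1 2.5 demo\n"
    "q1 Q0 2447462545 2 1.5 demo\n"
)
COLUMNS = ["query", "q0", "docid", "rank", "score", "system"]


def read_run(text: str, docid_as_str: bool) -> pd.DataFrame:
    """Hypothetical helper: parse whitespace-separated TREC run lines."""
    dtype = {"query": str, "docid": str} if docid_as_str else None
    return pd.read_csv(io.StringIO(text), sep=r"\s+", header=None,
                       names=COLUMNS, dtype=dtype)


# With inferred dtypes, the all-digit docids are parsed as int64 ...
inferred = read_run(RUN_TEXT, docid_as_str=False)
# ... but any step that turns the column into float64 rounds IDs above 2**53
# and changes how they are printed.
as_float = inferred["docid"].astype("float64")
print([str(x) for x in as_float.tolist()])  # ['9e+17', '2447462545.0']

# Reading docids as strings from the start keeps them exactly as written.
as_str = read_run(RUN_TEXT, docid_as_str=True)
print(as_str["docid"].tolist())  # ['900000000000000001', '2447462545']
```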
Interesting, Fatima. I wrote a patch for this issue. Could you please check if this is fixed with the latest code? Thanks for reporting it!
Thanks for your quick response. Unfortunately, I am now getting the following error:
r1_p10 = TrecEval(r1, qrels).get_precision(depth=10)
File "/ds/usr/fatima/.conda/envs/myenv/lib/python3.6/site-packages/trectools/trec_eval.py", line 670, in get_precision
merged = pd.merge(run[["query", "docid", "score"]], qrels[["query","docid","rel"]], how="left")
File "/ds/usr/fatima/.conda/envs/myenv/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 87, in merge
validate=validate,
File "/ds/usr/fatima/.conda/envs/myenv/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 656, in __init__
self._maybe_coerce_merge_keys()
File "/ds/usr/fatima/.conda/envs/myenv/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 1165, in _maybe_coerce_merge_keys
raise ValueError(msg)
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
Thanks again, Fatima. Forcing the docid to be a string has a few side effects, such as the one you described. I have pushed another patch. If you run into another error, could you please email me the files so I can test them quickly here? Thanks!
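The merge error in the traceback arises because the run's docid column is now `object` (strings) while the qrels docid column was parsed as `int64`, and `pd.merge` refuses to join keys of those two kinds. The minimal sketch below casts both key columns to strings before merging, then computes a simple mean P@k. It assumes plain DataFrames with `query`, `docid`, `score` and `rel` columns, as in the `pd.merge` call shown above. `precision_at_k` is a hypothetical helper. It counts unjudged documents as non-relevant, which may differ from what `TrecEval.get_precision` does.

```python
import pandas as pd


def precision_at_k(run: pd.DataFrame, qrels: pd.DataFrame, k: int = 10) -> float:
    """Hypothetical helper: mean P@k over the run's queries (unjudged = non-relevant)."""
    # Use string keys on both sides so pd.merge sees matching dtypes.
    keys = {"query": str, "docid": str}
    run = run.astype(keys)
    qrels = qrels.astype(keys)
    # Keep the k highest-scoring documents per query.
    top = (run.sort_values(["query", "score"], ascending=[True, False])
              .groupby("query")
              .head(k))
    merged = pd.merge(top[["query", "docid", "score"]],
                      qrels[["query", "docid", "rel"]],
                      on=["query", "docid"], how="left")
    merged["rel"] = merged["rel"].fillna(0)
    hits = (merged["rel"] > 0).groupby(merged["query"]).sum()
    return float((hits / k).mean())


# Hypothetical data: string docids in the run, int64 docids in the qrels.
run = pd.DataFrame({"query": ["q1"] * 3,
                    "docid": ["101", "102", "103"],
                    "score": [3.0, 2.0, 1.0]})
qrels = pd.DataFrame({"query": ["q1", "q1"],
                      "docid": [101, 999],
                      "rel": [1, 1]})

# Merging the raw frames reproduces the reported dtype clash.
merge_error = ""
try:
    pd.merge(run, qrels, on=["query", "docid"], how="left")
except ValueError as err:
    merge_error = str(err)
print(merge_error)

print(precision_at_k(run, qrels, k=10))  # 0.1
```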
Thanks a lot, I really appreciate it. It works perfectly fine now.
Great, glad to hear that! Thanks for using TrecTools and feel free to contribute if you would like to!
Thanks for your efforts. TrecTools is really useful for my research, and I would be happy to contribute to this great work.
Hi, I was trying to get fused runs. With the example you showed, reciprocal_rank_fusion works perfectly fine. With the combos function, however, the first issue I noticed is that it does not return a TrecRun object as reciprocal_rank_fusion does, so I had to convert the result myself with fused_run=TrecRun(fused_run), and that call raises an error.
Kindly advise.