Closed tyu008 closed 1 month ago
Sorry, the last commit enabled truncation by default. However, the current code doesn't support truncation for benchmark datasets (like TRECCOVID) where the documents have both a title and a text. For the experiments in the paper, we ran TRECCOVID without truncation. In the meantime, it should work if you disable truncation by running:
python3 run_chunked_eval.py --task-name TRECCOVIDChunked --truncate-max-length 0
I will also prepare a PR to fix it.
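To illustrate the limitation, here is a minimal sketch of a truncation step that refuses documents with titles. It assumes an MTEB/BEIR-style corpus (a dict mapping document IDs to {"title": ..., "text": ...}). The function `truncate_documents`, its whitespace tokenization, and its `max_length` parameter are hypothetical stand-ins, not the actual code in mteb_chunked_eval.py; only the error message is taken from this thread.

```python
from typing import Dict, Optional

Corpus = Dict[str, Dict[str, str]]


def truncate_documents(corpus: Corpus, max_length: Optional[int]) -> Corpus:
    """Truncate each document's text to at most max_length whitespace tokens.

    Hypothetical sketch: a real implementation would use the model's tokenizer.
    """
    if max_length is None:
        return corpus  # truncation disabled
    truncated = {}
    for doc_id, doc in corpus.items():
        if doc.get("title"):
            # Same message as the error reported in this thread.
            raise NotImplementedError(
                "Currently truncation is only implemented for documents without titles"
            )
        tokens = doc["text"].split()
        truncated[doc_id] = {**doc, "text": " ".join(tokens[:max_length])}
    return truncated


# A title-less document is truncated; one with a title would raise.
corpus = {"d1": {"title": "", "text": "a b c d e"}}
print(truncate_documents(corpus, 3))  # {'d1': {'title': '', 'text': 'a b c'}}
```

Under these assumptions, passing `max_length=None` skips the title check entirely, which matches the workaround of disabling truncation.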
Ok we merged this one:
https://github.com/jina-ai/late-chunking/pull/19
Now it should also work without the additional --truncate-max-length argument.
Thanks for your quick response! It works now @guenthermi
Another issue comes up when I run "python3 run_chunked_eval.py --task-name SciFactChunked --truncate-max-length 0":
File "/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 254, in _evaluate_monolingual
max_k = int(max(k_values) / max_chunks)
ZeroDivisionError: division by zero
@guenthermi could you help check it? Thanks
OK, without --truncate-max-length 0 it should have worked fine on the latest version. I have now made a small commit that makes truncate_max_length=0 equivalent to truncate_max_length=None. It seems that in some cases it really truncated to 0 tokens instead of disabling truncation.
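The kind of normalization described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the helper names `normalize_truncate_max_length` and `max_k_for` are hypothetical, and only the expression `int(max(k_values) / max_chunks)` comes from the traceback in this thread. The guard assumes that a `max_chunks` of 0 is what triggers the ZeroDivisionError, presumably because documents were truncated to 0 tokens.

```python
from typing import Optional, Sequence


def normalize_truncate_max_length(value: Optional[int]) -> Optional[int]:
    """Treat None, 0, and negative values as 'truncation disabled'."""
    if value is None or value <= 0:
        return None
    return value


def max_k_for(k_values: Sequence[int], max_chunks: int) -> int:
    """Mirror the expression from the traceback, with an explicit guard."""
    if max_chunks <= 0:
        raise ValueError(
            "max_chunks must be positive; were documents truncated to 0 tokens?"
        )
    return int(max(k_values) / max_chunks)


print(normalize_truncate_max_length(0))     # None
print(normalize_truncate_max_length(8192))  # 8192
print(max_k_for([1, 10, 100], 4))           # 25
```

Normalizing the option once, right after argument parsing, means the rest of the code only has to distinguish `None` (disabled) from a positive limit.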
I ran the command "python3 run_chunked_eval.py --task-name TRECCOVIDChunked" and got the errors below. Any suggestions?
ERROR:mteb.evaluation.MTEB:Error while evaluating TRECCOVIDChunked: Currently truncation is only implemented for documents without titles
Traceback (most recent call last):
File "/raid/tanyu/late-chunking/run_chunked_eval.py", line 167, in <module>
main()
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/raid/tanyu/late-chunking/run_chunked_eval.py", line 129, in main
evaluation.run(
File "/home/tanyu/.local/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 422, in run
raise e
File "/home/tanyu/.local/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 383, in run
results, tick, tock = self._run_eval(
File "/home/tanyu/.local/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 260, in _run_eval
results = task.evaluate(
File "/raid/tanyu/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 95, in evaluate
scores[hf_subset] = self._evaluate_monolingual(
File "/raid/tanyu/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 162, in _evaluate_monolingual
corpus = self._truncate_documents(corpus)
File "/raid/tanyu/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 110, in _truncate_documents
raise NotImplementedError(
NotImplementedError: Currently truncation is only implemented for documents without titles