xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Modified MTEB Install fails - After rectification it leads to OSError #78

Closed by ashokrajab 6 months ago

ashokrajab commented 10 months ago

I'm trying to evaluate the instructor model. When I follow the README.md to install the modified MTEB package, an OSError is thrown.

Steps to reproduce: Install InstructorEmbedding as mentioned under installation.

Following the steps under MTEB installation then leads to OSError: [Errno 2] No such file or directory: '/tmp/tmpni4r6sx4/output.json', as mentioned in issue https://github.com/HKUNLP/instructor-embedding/issues/20

I suspect the cause is the setup.py under MTEB (https://github.com/HKUNLP/instructor-embedding/blob/main/evaluation/MTEB/setup.py#L42C1-L42C1), which just exits without calling setup():

    from setuptools import find_packages, setup
    print(find_packages())
    exit(0)

So I removed the exit(0) statement and ran pip install -e . again, and it installed successfully.
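The diagnosis above (a top-level exit(0) that runs before setup() is ever reached) can be checked mechanically. The following is only a minimal sketch, not part of the repository: the helper name first_exit_before_setup is hypothetical, and the setup(name='demo') line in the sample is a placeholder, since the issue does not show the arguments of the real setup() call.

```python
import ast


def first_exit_before_setup(source: str):
    """Return the line number of a top-level exit()/quit()/sys.exit() call
    that runs before any top-level setup() call, or None if there is none."""
    tree = ast.parse(source)
    for node in tree.body:
        # Only bare top-level call statements are of interest here.
        if not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)):
            continue
        func = node.value.func
        name = None
        if isinstance(func, ast.Name):  # exit(...), setup(...)
            name = func.id
        elif isinstance(func, ast.Attribute):  # sys.exit(...), setuptools.setup(...)
            name = func.attr
        if name == "setup":
            return None  # setup() is reached first, so the install can proceed
        if name in ("exit", "quit"):
            return node.lineno
    return None


# The first three lines mirror the snippet quoted in this issue;
# the setup(...) call is a hypothetical placeholder.
snippet = (
    "from setuptools import find_packages, setup\n"
    "print(find_packages())\n"
    "exit(0)\n"
    "setup(name='demo')\n"
)
print(first_exit_before_setup(snippet))  # line of the early exit, here 3
```

To check an actual checkout, the contents of evaluation/MTEB/setup.py could be read and passed to the same function.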

    cd evaluation/MTEB
    pip install -e .
    python examples/evaluate_model.py --model_name hkunlp/instructor-large --output_dir outputs --task_name ArguAna --result_file results
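After an editable install it can be worth confirming which copy of mteb Python actually imports. This is a small sketch under the assumption that the modified package should resolve to a path inside the cloned evaluation/MTEB directory rather than to a separately installed upstream mteb; the helper name module_origin is hypothetical.

```python
import importlib.util


def module_origin(name: str):
    """Return the file a top-level module would be imported from,
    or None if the module cannot be found in this environment."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Expected (assumption): a path ending in evaluation/MTEB/mteb/__init__.py
print(module_origin("mteb"))
```

If this prints None, or a path outside the clone, the benchmark script is probably not running against the modified package.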

Then I tried to run the benchmark, but I encountered another error. The trace is below.

Trace:

2023-08-13 23:37:53.178162 >>> ArguAna
Traceback (most recent call last):
  File "/instructor-embedding/evaluation/MTEB/mteb/evaluation/MTEB.py", line 240, in run
    results = task.evaluate(model, split, **kwargs)
  File "/instructor-embedding/evaluation/MTEB/mteb/abstasks/AbsTaskRetrieval.py", line 660, in evaluate
    results = retriever.retrieve(corpus, queries)
  File "/beir/beir/retrieval/evaluation.py", line 20, in retrieve
    return self.retriever.search(corpus, queries, self.top_k, self.score_function, **kwargs)
  File "/beir/beir/retrieval/search/dense/exact_search_multi_gpu.py", line 148, in search
    cos_scores_top_k_values, cos_scores_top_k_idx = metric.compute()
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/evaluate/module.py", line 433, in compute
    self._finalize()
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/evaluate/module.py", line 390, in _finalize
    self.data = Dataset(**reader.read_files([{"filename": f} for f in file_paths]))
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/arrow_reader.py", line 265, in read_files
    pa_table = self._read_files(files, in_memory=in_memory)
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/arrow_reader.py", line 200, in _read_files
    pa_table: Table = self._get_table_from_filename(f_dict, in_memory=in_memory)
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/arrow_reader.py", line 336, in _get_table_from_filename
    table = ArrowReader.read_table(filename, in_memory=in_memory)
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/arrow_reader.py", line 357, in read_table
    return table_cls.from_file(filename)
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/table.py", line 1059, in from_file
    table = _memory_mapped_arrow_table_from_file(filename)
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/table.py", line 66, in _memory_mapped_arrow_table_from_file
    pa_table = opened_stream.read_all()
  File "pyarrow/ipc.pxi", line 750, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Expected to be able to read 1226936 bytes for message body, got 1226928

The modified MTEB package installation seems to be broken. I request @hongjin-su or @Harry-hash to check this out.

hongjin-su commented 10 months ago

You may want to check the folder permissions of /tmp.
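One way to act on this suggestion is a quick write test against the temporary directory. The sketch below is only illustrative: the helper name check_temp_dir is hypothetical, the file name output.json is borrowed from the error message in this issue, and passing this check does not rule out other causes of the errors reported above.

```python
import os
import tempfile


def check_temp_dir(path=None):
    """Report whether a directory exists, is writable, and survives a
    small write/read round trip. Defaults to the system temp directory."""
    path = path or tempfile.gettempdir()
    report = {
        "path": path,
        "exists": os.path.isdir(path),
        # Creating files in a directory needs both write and search permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
        "round_trip": False,
    }
    if report["exists"] and report["writable"]:
        try:
            with tempfile.TemporaryDirectory(dir=path) as scratch:
                target = os.path.join(scratch, "output.json")
                with open(target, "w") as fh:
                    fh.write("{}")
                with open(target) as fh:
                    report["round_trip"] = fh.read() == "{}"
        except OSError:
            report["round_trip"] = False
    return report


print(check_temp_dir())
```

Calling check_temp_dir("/tmp") explicitly targets the directory named in the error, which may differ from the default if TMPDIR is set.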

hongjin-su commented 6 months ago

Feel free to re-open the issue if you have any questions or comments!