xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Modified MTEB Install fails - After rectification it leads to OSError #78

Closed: ashokrajab closed this issue 11 months ago

ashokrajab commented 1 year ago

I'm trying to evaluate the instructor model. When I follow README.md to install the modified MTEB package, an OSError is thrown.

Steps to reproduce: Install InstructorEmbedding as described under Installation.

Then, following the steps under MTEB installation leads to OSError: [Errno 2] No such file or directory: '/tmp/tmpni4r6sx4/output.json', as mentioned in issue https://github.com/HKUNLP/instructor-embedding/issues/20

I suspect the cause is the setup.py under MTEB (https://github.com/HKUNLP/instructor-embedding/blob/main/evaluation/MTEB/setup.py#L42C1-L42C1), which just exits without calling setup():

    from setuptools import find_packages, setup
    print(find_packages())
    exit(0)

So I removed the exit(0) statement and tried pip install -e . and it successfully installed.
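To illustrate the problem described above, here is a minimal sketch (not part of the repository) that checks whether a setup script reaches a top-level exit() call before it ever calls setup(). The function name `exits_before_setup` is hypothetical, and the sample source is the three-line snippet quoted above, not the full setup.py.

```python
import ast


def exits_before_setup(source: str) -> bool:
    """Return True if a top-level exit()/quit() call appears before any top-level setup() call."""
    for node in ast.parse(source).body:
        # Only bare call statements such as `exit(0)` or `setup(...)` are of interest.
        if not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)):
            continue
        func = node.value.func
        # Handles both `exit(0)` (ast.Name) and `sys.exit(0)` (ast.Attribute).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name == "setup":
            return False
        if name in ("exit", "quit"):
            return True
    return False


# The snippet quoted in this issue (assumed to be representative of the file).
quoted_snippet = (
    "from setuptools import find_packages, setup\n"
    "print(find_packages())\n"
    "exit(0)\n"
)
print(exits_before_setup(quoted_snippet))
```

If the check reports an early exit, pip never sees a setup() call, which would be consistent with the failed install reported here; whether that is the only cause is an assumption.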

    cd evaluation/MTEB
    pip install -e .
    python examples/evaluate_model.py --model_name hkunlp/instructor-large --output_dir outputs --task_name ArguAna --result_file results

I then tried to run the benchmark, but encountered another error. The trace is below.

Trace:

2023-08-13 23:37:53.178162 >>> ArguAna
Traceback (most recent call last):
  File "/instructor-embedding/evaluation/MTEB/mteb/evaluation/MTEB.py", line 240, in run
    results = task.evaluate(model, split, kwargs)
  File "/instructor-embedding/evaluation/MTEB/mteb/abstasks/AbsTaskRetrieval.py", line 660, in evaluate
    results = retriever.retrieve(corpus, queries)
  File "/beir/beir/retrieval/evaluation.py", line 20, in retrieve
    return self.retriever.search(corpus, queries, self.top_k, self.score_function, kwargs)
  File "/beir/beir/retrieval/search/dense/exact_search_multi_gpu.py", line 148, in search
    cos_scores_top_k_values, cos_scores_top_k_idx = metric.compute()
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/evaluate/module.py", line 433, in compute
    self._finalize()
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/evaluate/module.py", line 390, in _finalize
    self.data = Dataset(**reader.read_files([{"filename": f} for f in file_paths]))
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/arrow_reader.py", line 265, in read_files
    pa_table = self._read_files(files, in_memory=in_memory)
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/arrow_reader.py", line 200, in _read_files
    pa_table: Table = self._get_table_from_filename(f_dict, in_memory=in_memory)
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/arrow_reader.py", line 336, in _get_table_from_filename
    table = ArrowReader.read_table(filename, in_memory=in_memory)
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/arrow_reader.py", line 357, in read_table
    return table_cls.from_file(filename)
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/table.py", line 1059, in from_file
    table = _memory_mapped_arrow_table_from_file(filename)
  File "/home/ashok/miniconda3/envs/instructor_working/lib/python3.7/site-packages/datasets/table.py", line 66, in _memory_mapped_arrow_table_from_file
    pa_table = opened_stream.read_all()
  File "pyarrow/ipc.pxi", line 750, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Expected to be able to read 1226936 bytes for message body, got 1226928

The modified MTEB package installation seems to be broken. I request @hongjin-su or @Harry-hash to check this out.

hongjin-su commented 1 year ago

You may want to check the folder permissions of /tmp.
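One possible way to run that check from the same Python environment is sketched below. It reports whether the temp directory is writable and how much free space it has; the helper name `check_temp_dir` is hypothetical, and whether permissions or disk space are actually behind the OSError in this issue is an assumption, not something the thread confirms.

```python
import os
import shutil
import tempfile


def check_temp_dir(path=None):
    """Collect basic health information about a temp directory (default: the system one)."""
    path = path or tempfile.gettempdir()
    report = {
        "path": path,
        # Write + execute (traverse) permission for the current user.
        "access_ok": os.access(path, os.W_OK | os.X_OK),
        # A full disk can also produce truncated files, so report free space too.
        "free_bytes": shutil.disk_usage(path).free,
    }
    try:
        # Actually create and write a small file; it is deleted on close.
        with tempfile.NamedTemporaryFile(dir=path) as handle:
            handle.write(b"probe")
            handle.flush()
        report["write_ok"] = True
    except OSError:
        report["write_ok"] = False
    return report


report = check_temp_dir()
print(report)
```

If `write_ok` is False or `free_bytes` is very small, pointing the TMPDIR environment variable at another writable directory before rerunning the evaluation may be worth trying, since `tempfile.gettempdir()` honors it.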

hongjin-su commented 11 months ago

Feel free to re-open the issue if you have any questions or comments!