microsoft / CodeT

File Not Found Error #26

Closed jatinarora15 closed 6 months ago

jatinarora15 commented 7 months ago

While running run_pipeline.py, I encounter the following error:

Traceback (most recent call last):
  File "run_pipeline.py", line 66, in <module>
    run_RG1_and_oracle_method(CONSTANTS.api_benchmark, repos, window_sizes, slice_sizes)
  File "run_pipeline.py", line 28, in run_RG1_and_oracle_method
    CodeSearchWrapper('one-gram', benchmark, repos, window_sizes, slice_sizes).search_baseline_and_ground()
  File "/local/CodeT/RepoCoder/search_code.py", line 119, in search_baseline_and_ground
    self._run_parallel(query_line_path_temp)
  File "/local/CodeT/RepoCoder/search_code.py", line 105, in _run_parallel
    repo_embedding_lines = Tools.load_pickle(repo_embedding_path)
  File "/local/CodeT/RepoCoder/utils.py", line 118, in load_pickle
    with open(fname, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'cache/vector/repos/huggingface_diffusers_ws20_slice2.one-gram.pkl'

Additional Info:

The file huggingface_diffusers_ws20_slice2.one-gram.pkl is created in the cache/vector/random_api directory; however, there is nothing in cache/vector/repo. There is also a huggingface_diffusers_ws20_slice2.pkl file in cache/window/repo/.

Can someone please help with this issue?

zfj1998 commented 6 months ago

cache/vector/repo is supposed to contain the embedding vectors of the repo windows. In run_pipeline.py, one has to manually call the function at https://github.com/microsoft/CodeT/blob/35f54d60b152cc31d134b788e702878ad613d9f7/RepoCoder/build_vector.py#L46 before starting the search process.

zfj1998 commented 5 months ago

The cache files under "vector/repos/" are vectorized code fragments built from the code files within a repo. During code search, these vectorized fragments are used for similarity comparison. So before running the RG1 or RepoCoder method, we need to vectorize the code windows produced from each repo by calling the vectorize_repo_windows() function. I have updated the code in this commit (https://github.com/microsoft/CodeT/commit/6a6ef6359a3587e134d8350c3eebb0e639e7789a). Sorry for the inconvenience.
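For anyone hitting the same traceback, a small pre-flight check can confirm whether the repo vectors exist before the search step runs. The sketch below is not part of RepoCoder: the helper names `repo_vector_path` and `missing_repo_vectors` are hypothetical, and the file-name pattern is only inferred from the path in the error message above, so it may not match every configuration.

```python
import itertools
import os


def repo_vector_path(repo, window_size, slice_size,
                     vectorizer="one-gram", cache_root="cache"):
    """Build the expected cache path for one repo's vectors.

    Assumption: the pattern is inferred from the traceback path
    'cache/vector/repos/huggingface_diffusers_ws20_slice2.one-gram.pkl'.
    """
    fname = f"{repo}_ws{window_size}_slice{slice_size}.{vectorizer}.pkl"
    return os.path.join(cache_root, "vector", "repos", fname)


def missing_repo_vectors(repos, window_sizes, slice_sizes,
                         vectorizer="one-gram", cache_root="cache"):
    """Return the expected vector files that are not on disk yet."""
    missing = []
    for repo, ws, ss in itertools.product(repos, window_sizes, slice_sizes):
        path = repo_vector_path(repo, ws, ss, vectorizer, cache_root)
        if not os.path.isfile(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Example values taken from the file name in the error message.
    for path in missing_repo_vectors(["huggingface_diffusers"], [20], [2]):
        print("missing repo vectors:", path)
```

If the check reports missing files, the repo windows have not been vectorized yet, and vectorize_repo_windows() needs to run before the search, as described above.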