benbot closed this issue 9 months ago.
Could you help me reproduce this error by sharing a bit more information?
Also, I'm curious whether it crashes only on one specific repository or on everything.
I'm getting the same issue.
Setup: M2 Max, SeaGOAT 0.28.0 installed with pipx, running on the tinygrad/tinygrad repo.
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
self.run()
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "/Users/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 79, in _worker_function
self._handle_task(context, task)
File "/Users/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 66, in _handle_task
result = handler(context, *task.args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/task_queue.py", line 83, in handle_query
results = context["seagoat_engine"].get_results(kwargs["limit_clue"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 209, in get_results
sorted(
File "/Users/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 213, in <lambda>
+ 0.3 * normalize_file_position(top_files[x.path])
~~~~~~~~~^^^^^^^^
KeyError: 'state.py'
Printing out top_files gives:
{'tensor.py': -0.6955769938501167}
Regarding this, just out of curiosity: is state.py gitignored? Or is it a new file that has not been committed yet?
I'm trying to figure out why it would be missing from top_files, since that is generated from the git history.
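To illustrate why a missing history entry crashes the sort, here is a minimal sketch assuming top_files is simply a dict keyed by the paths that appear in the commit history. The commit list, build_top_files, and find_untracked_results are hypothetical names, and counting commits per path is only a stand-in: SeaGOAT's real scoring (which produced the negative value for tensor.py above) is different. The point is that any path a search returns but the history never mentions has no key, so a plain top_files[path] lookup raises KeyError.

```python
from collections import Counter

# Hypothetical history: each inner list holds the paths touched by one commit.
COMMITS = [
    ["tensor.py", "README.md"],
    ["tensor.py"],
    ["README.md", "setup.py"],
]


def build_top_files(commits):
    """Map each path to the number of commits that touched it (stand-in scoring)."""
    return dict(Counter(path for commit in commits for path in commit))


def find_untracked_results(result_paths, top_files):
    """Return the result paths that have no entry in top_files."""
    return [path for path in result_paths if path not in top_files]


top_files = build_top_files(COMMITS)
# A search over files on disk can return a path that no commit ever touched.
results = ["tensor.py", "state.py"]
print(find_untracked_results(results, top_files))  # ['state.py']

try:
    top_files["state.py"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'state.py'
```

Running a check like find_untracked_results before sorting would show which results are about to fail the lookup.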
Repo I'm using: https://github.com/tinygrad/tinygrad
I'm running the server in the ..../tinygrad folder, and state.py is not gitignored.
The crash just happened again in https://github.com/Oneirocom/Magick/
This time the server hadn't finished processing all the chunks (60K), but I got the same error on the other project, where processing had finished completely.
Magick is a large JS project; the other one was a medium-sized Java project.
I'm also on Arch Linux this time, so this happens on at least Arch and macOS.
File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
self.run()
File "/usr/lib/python3.11/threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "/home/benbot/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 79, in _worker_function
self._handle_task(context, task)
File "/home/benbot/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 66, in _handle_task
result = handler(context, *task.args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/benbot/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/task_queue.py", line 83, in handle_query
results = context["seagoat_engine"].get_results(kwargs["limit_clue"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/benbot/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 208, in get_results
sorted(
File "/home/benbot/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 212, in <lambda>
+ 0.3 * normalize_file_position(top_files[x.path])
~~~~~~~~~^^^^^^^^
KeyError: 'packages/@types/rete-connection-reroute-plugin.d.ts'
That file isn't in .gitignore either.
I'm hitting this on macOS as well, on a file that is not gitignored.
I'm running it one level down inside the folder, not from the root, in case that matters.
I might have hit the same KeyError; here is the trace:
Analyzing source code: 0it [00:00, ?it/s]
2023-09-22 08:57:07,014 Analyzed the minimum number of chunks needed to operate.
2023-09-22 08:57:07,014 Analyzed all chunks!
2023-09-22 08:57:07,014 Handling task: query
/home/yshen/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████████████████████████████| 79.3M/79.3M [00:07<00:00, 10.6MiB/s]
Exception in thread Thread-1 (_worker_function):
Traceback (most recent call last):
File "/home/yshen/miniconda3/envs/seagoat-python311/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
self.run()
File "/home/yshen/miniconda3/envs/seagoat-python311/lib/python3.11/threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 79, in _worker_function
self._handle_task(context, task)
File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 66, in _handle_task
result = handler(context, *task.args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/task_queue.py", line 83, in handle_query
results = context["seagoat_engine"].get_results(kwargs["limit_clue"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 208, in get_results
sorted(
File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 212, in <lambda>
The file named in the KeyError is indeed a file in the code base.
I had started SeaGOAT shortly before, then typed:
> gt "sourcetypes"
and got the error and trace above.
Do I need to wait longer, even after the server has finished scanning the code base?
I'm running Ubuntu 24.4 in WSL2 on Windows 11. The file named in the KeyError is not tracked by git, but in the same repo the same error also happened with a file that is tracked by git and not ignored:
File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 212, in <lambda>
+ 0.3 * normalize_file_position(top_files[x.path])
~~~~~~~~~^^^^^^^^
I'll try a different repo.
Regarding whether you need to wait longer, even after the server has finished scanning the code base: no, that should not be necessary at all!
What does a repo need in order to work with gt?
- Must be a git repository
- All files must be checked in and committed, not ignored?
It just needs to be a git repository. Even if no files are actually committed, it should still work. By design, it even works with files that you have only just created.
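Since being a git repository is the only requirement stated above, a quick sanity check can be sketched in Python. It shells out to `git rev-parse --is-inside-work-tree`, which prints `true` inside a work tree. is_git_repository is a hypothetical helper, not part of SeaGOAT, and it assumes git is on PATH.

```python
import subprocess


def is_git_repository(path) -> bool:
    """Return True if `path` is inside a git work tree, letting git itself decide."""
    try:
        completed = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=path,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed, or `path` is not an existing directory.
        return False
    # Outside a repository git exits non-zero; inside the .git directory it prints "false".
    return completed.returncode == 0 and completed.stdout.strip() == "true"


print(is_git_repository("."))
```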
I suspect that the KeyError on line 212 in engine.py on request has to do with two competing versions of the file existing somehow, or maybe the file no longer contains the line it was last analyzed with. I think this could be solved by grouping the results by SHA1 hash and using git to retrieve the correct version of the file.
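The SHA1-grouping idea can be sketched using the fact that git names a blob by the SHA-1 of `blob <size>\0` followed by the file content. This is only an illustration of the suggestion, assuming a repository with git's default SHA-1 object format; git_blob_sha1 and group_by_blob are hypothetical helpers, not SeaGOAT code. Results read from two different versions of the same file land in different groups, and if a blob is stored in the repository, `git cat-file -p <hash>` prints its content.

```python
import hashlib
from collections import defaultdict


def git_blob_sha1(content: bytes) -> str:
    """Object ID git assigns to a blob with this content (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def group_by_blob(results):
    """Group (path, content) pairs by the blob hash of the content they were read from."""
    groups = defaultdict(list)
    for path, content in results:
        groups[git_blob_sha1(content)].append(path)
    return dict(groups)


print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (git's empty blob)

results = [
    ("state.py", b"x = 1\n"),
    ("state.py", b"x = 2\n"),  # same path, but a different version of the file
]
print(len(group_by_blob(results)))  # 2
```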
I suspect this is a different error. My only theory is that a result shows up through ripgrep but does not appear anywhere in the git history. Maybe there is a bug where files that have not been committed yet are not included in top_files, but that would only be possible if the file is not in any previous commit :thinking:
+ 0.3 * normalize_file_position(top_files[x.path])
~~~~~~~~~^^^^^^^^
KeyError: 'packages/@types/rete-connection-reroute-plugin.d.ts'
I found out that the error happens because x.path is lowercase while the key in top_files contains an uppercase letter. I think this comes from the repository class, in the code that processes the files of each commit, at this line:
if not (self.path / filename).exists():
    continue
Perhaps I renamed that file from uppercase at some point. I haven't checked, but people say .exists() is case-insensitive on macOS. So I took the method from https://stackoverflow.com/questions/6710511/case-sensitive-path-comparison-in-python and used it in place of .exists().
Now I get the same error again: the uppercase file is no longer in the top_files dict, but the current lowercase file isn't in there either, even though it is in the results, so it fails here again.
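For reference, a case-sensitive replacement for .exists() can be sketched by comparing each path component against the actual directory listing. This is one possible approach, not necessarily the Stack Overflow method linked above; exists_case_sensitive is a hypothetical name, and the sketch assumes every parent directory is listable.

```python
import os
from pathlib import Path


def exists_case_sensitive(path) -> bool:
    """Like Path.exists(), but each component must match the on-disk name exactly.

    On a case-insensitive filesystem (the macOS default), Path("Foo.py").exists()
    is also True for a file named "foo.py"; this check compares names literally.
    """
    # abspath also collapses ".." so every component checked below is a real name.
    current = Path(os.path.abspath(path))
    if not current.exists():
        return False
    # Walk up to the filesystem root, checking each name against its parent's listing.
    while current != current.parent:
        if current.name not in os.listdir(current.parent):
            return False
        current = current.parent
    return True
```

Listing every parent directory is slower than a single stat call, which matters when this runs once per file per commit.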
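I temporarily fixed the error by changing the return statement to the following (Path is pathlib.Path):
return list(
    sorted(
        results_to_sort,
        key=lambda x: (
            0.7 * normalize_score(x.get_best_score(self.query_string))
            # Fall back to 0 for paths missing from top_files instead of raising KeyError;
            # as_posix() normalizes path separators to "/" before the lookup.
            + 0.3 * normalize_file_position(top_files.get(Path(x.path).as_posix(), 0))
        ),
    )
)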
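The temporary fix boils down to a dict lookup with a default. The sketch below isolates that idea with a hypothetical Result class and omits SeaGOAT's normalize_score and normalize_file_position helpers, whose implementations are not shown in this thread. The 0.7/0.3 weights and the tensor.py value come from the thread; the query scores are made up.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass
class Result:
    path: str
    score: float  # hypothetical stand-in for x.get_best_score(query_string)


def file_position(top_files: dict, path: str, default: float = 0.0) -> float:
    """Return the git-history score for `path`, or `default` if the path is missing."""
    # On Windows, as_posix() turns backslash separators into "/" so keys match.
    return top_files.get(PurePath(path).as_posix(), default)


def rank(results: list, top_files: dict) -> list:
    """Sort ascending by 0.7 * score + 0.3 * file position (normalization omitted)."""
    return sorted(
        results,
        key=lambda r: 0.7 * r.score + 0.3 * file_position(top_files, r.path),
    )


top_files = {"tensor.py": -0.6955769938501167}
results = [Result("state.py", 0.1), Result("tensor.py", 0.2)]
print([r.path for r in rank(results, top_files)])  # ['tensor.py', 'state.py']
```

A state.py result no longer crashes the sort; it simply gets the fallback position of 0. The trade-off is that the fallback hides the underlying mismatch (such as a case difference in the path), so it works around the symptom rather than fixing how top_files is built.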
I noticed that one way this error can happen is if ripgrep finds a file before the repo has been analyzed. This can happen if you create a file while the server is analyzing files and then make a query before all files are analyzed, because the server does not look for more files to analyze while there are still files in the queue.
But I'm curious whether the same error can happen in other circumstances as well :thinking:
Reopening, because only the error about files not being found was fixed; the error about lines not being found probably still persists.
I just installed it on my work laptop (running macOS).
The server usually crashes once I make a request, with a KeyError on line 212 in engine.py for one of the files.
It worked once, on my third try starting the server, though I'm not sure I did anything different.
Unfortunately, I can't post the log here :(