Open StefanKarlsson321 opened 1 month ago
Hey @StefanKarlsson321, thanks for submitting the bug. This scanner doesn't use Presidio. It relies on https://github.com/bridgecrewio/detect-secrets
I see that even this library is not doing a good job on your prompt.
Hmm, but I get this. It seems to point to Presidio:
Exception has occurred: InvalidParamException
Invalid analyzer result, start: -1 and end: 511, while text length is only 5.
File "/home/***/***/test_llm_guard.py", line 7, in sanitize
sanitized_prompt = scan_prompt([Secrets], prompt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dhb/pythoncod/test_llm_guard.py", line 13, in <module>
sanitize("Hello")
presidio_anonymizer.entities.invalid_exception.InvalidParamException: Invalid analyzer result, start: -1 and end: 511, while text length is only 5.
I see now, but this issue is a bit different. We rely on the text replacer from the Presidio library, and apparently we didn't refresh the credentials each time we ran a scan. I fixed that issue in the latest commit.
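For anyone reading along, here is a minimal sketch of how this kind of failure can arise. It is not llm-guard's or Presidio's actual code. It assumes one reading of the explanation above: findings from an earlier, longer prompt are kept and later applied to a shorter prompt. All names (`Span`, `validate_spans`, `StaleScanner`, `FreshScanner`) are hypothetical, and the error message only imitates the one in the traceback.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for one detected secret's position in the text."""
    start: int
    end: int


def validate_spans(text: str, spans: list[Span]) -> None:
    """Reject spans that fall outside the text (illustrative bounds check)."""
    for span in spans:
        if span.start < 0 or span.end > len(text):
            raise ValueError(
                f"Invalid analyzer result, start: {span.start} and end: {span.end}, "
                f"while text length is only {len(text)}."
            )


@dataclass
class StaleScanner:
    """Keeps findings across calls, so old offsets leak into later scans."""
    findings: list[Span] = field(default_factory=list)

    def scan(self, text: str, new_findings: list[Span]) -> list[Span]:
        self.findings.extend(new_findings)  # assumed bug: never cleared
        validate_spans(text, self.findings)
        return self.findings


class FreshScanner:
    """Builds a new findings list on every call, so nothing carries over."""

    def scan(self, text: str, new_findings: list[Span]) -> list[Span]:
        findings = list(new_findings)
        validate_spans(text, findings)
        return findings
```

With this sketch, calling `StaleScanner().scan(...)` on a 600-character prompt with a finding at 0..511 and then on "Hello" raises on the second call, because the old span no longer fits the 5-character text. `FreshScanner` accepts both calls.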
I added an Anonymizer after the Secrets scanner and, with otherwise identical code, got this from presidio_analyzer:
OverflowError: int too big to convert
Will check and come back with an updated presidio_analyzer as well.
When I update presidio_analyzer to the very latest origin/main, the issue seems to be gone.
Thanks a lot!
Describe the bug
InvalidParamException is thrown because the Presidio analyzer is in a bad state.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
No exception thrown and no secrets detected in the "Hello" prompt.
Screenshots
raise InvalidParamException(err_msg)
presidio_anonymizer.entities.invalid_exception.InvalidParamException: Invalid analyzer result, start: -1 and end: 63, while text length is only 5.
Additional context
Inspiration for finding the bug: https://github.com/microsoft/presidio/issues/1376
Although I have applied the fix related to that bug, https://github.com/microsoft/presidio/pull/1377, the issue still occurs.