Describe the bug
I use Presidio in Snowflake as a UDF.
When I analyze ordinary text fields, it works well.
But when I analyze fields that contain URLs, I get the exception below.
It looks like this is caused by the length of the value: the traceback ends in the crypto recognizer's base58 decoding with an OverflowError. What can I do about this?
Thanks
The exception
Traceback (most recent call last):
  File "_udf_code.py", line 17, in udf_presidio_analyze_vectorized
  File "/usr/lib/python_udf/4e6cf66f4e148e622ef963913d834ae5f1ae04ce85210f8d3381273c3deecdc8/lib/python3.11/site-packages/pandas/core/frame.py", line 832, in __init__
    data = list(data)
           ^^^^^^^^^^
  File "/home/udf/340867189/presidio_analyzer-2.2.354-py3-none-any.whl/presidio_analyzer/batch_analyzer_engine.py", line 116, in analyze_dict
    results: List[List[RecognizerResult]] = self.analyze_iterator(
                                            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/udf/340867189/presidio_analyzer-2.2.354-py3-none-any.whl/presidio_analyzer/batch_analyzer_engine.py", line 53, in analyze_iterator
    results = self.analyzer_engine.analyze(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/udf/340867189/presidio_analyzer-2.2.354-py3-none-any.whl/presidio_analyzer/analyzer_engine.py", line 207, in analyze
    current_results = recognizer.analyze(
                      ^^^^^^^^^^^^^^^^^^^
  File "/home/udf/340867189/presidio_analyzer-2.2.354-py3-none-any.whl/presidio_analyzer/pattern_recognizer.py", line 97, in analyze
    pattern_result = self.__analyze_patterns(text, regex_flags)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/udf/340867189/presidio_analyzer-2.2.354-py3-none-any.whl/presidio_analyzer/pattern_recognizer.py", line 210, in __analyze_patterns
    validation_result = self.validate_result(current_match)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/udf/340867189/presidio_analyzer-2.2.354-py3-none-any.whl/presidio_analyzer/predefined_recognizers/crypto_recognizer.py", line 62, in validate_result
    bcbytes = self.__decode_base58(pattern_text, 25)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/udf/340867189/presidio_analyzer-2.2.354-py3-none-any.whl/presidio_analyzer/predefined_recognizers/crypto_recognizer.py", line 79, in __decode_base58
    return n.to_bytes(length, "big")
           ^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: int too big to convert in function UDF_PRESIDIO_ANALYZE_VECTORIZED with handler udf_presidio_analyze_vectorized
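To illustrate what the last two frames seem to show, here is a minimal standalone sketch. It is not Presidio's actual code: decode_base58 and safe_decode_base58 are hypothetical names, and the only things taken from the traceback are the 25-byte length and the n.to_bytes(length, "big") call. The assumption is that a long base58-looking substring (such as a token inside a URL) decodes to an integer that no longer fits in 25 bytes.

```python
from typing import Optional

# Standard Bitcoin base58 alphabet (no 0, O, I or l).
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def decode_base58(text: str, length: int) -> bytes:
    """Decode base58 text into exactly `length` big-endian bytes.

    Hypothetical stand-in for the decoding step in the traceback.
    Raises OverflowError when the decoded integer needs more than
    `length` bytes, and ValueError on characters outside the alphabet.
    """
    n = 0
    for char in text:
        n = n * 58 + BASE58_ALPHABET.index(char)
    return n.to_bytes(length, "big")


def safe_decode_base58(text: str, length: int) -> Optional[bytes]:
    """Guarded variant: treat an undecodable candidate as 'no match'."""
    try:
        return decode_base58(text, length)
    except (OverflowError, ValueError):
        return None


if __name__ == "__main__":
    long_candidate = "z" * 40  # too large for 25 bytes (200 bits)
    try:
        decode_base58(long_candidate, 25)
    except OverflowError as exc:
        print("unguarded:", exc)
    print("guarded:", safe_decode_base58(long_candidate, 25))
```

If that assumption holds, a guard like this inside the recognizer's validation would turn the crash into a rejected match. As a possible workaround on the UDF side, not verified in Snowflake, the entities passed to the analyzer could be limited so that the crypto recognizer is not run on URL fields.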