microsoft / presidio

Context aware, pluggable and customizable data protection and de-identification SDK for text and images
https://microsoft.github.io/presidio
MIT License

Syntax Errors using the analyzer as Python package #237

Closed LaraSchvartzman closed 5 years ago

LaraSchvartzman commented 5 years ago

While using the analyzer as a Python package I encountered a few errors. Below I describe how to reproduce each one and how I (temporarily) worked around it in order to reach the next error. I first successfully followed the tutorial instructions for installing presidio-analyzer as a Python package by building a wheel file. After running into the issues below, I kept modifying the scripts (as described in detail below) and re-running the test script (step 5 of the installation guide, https://github.com/microsoft/presidio/blob/master/docs/install.md) from the same directory as the analyzer folder, so that the script would pick the folder up as a module.

  1. In the file presidio-analyzer/analyzer/recognizer_result.py, __init__ is defined as def __init__(self, entity_type, start, end, score, analysis_explanation: AnalysisExplanation = None):, and I get a SyntaxError at “analysis_explanation: AnalysisExplanation = None”. Exact error:
       File "/.../presidio-analyzer/analyzer/recognizer_result.py", line 7
         analysis_explanation: AnalysisExplanation = None):
                             ^
     SyntaxError: invalid syntax
     I worked around it temporarily by deleting “: AnalysisExplanation = None”, which let me reach the next errors. I don't believe this is the correct fix, because it makes the “from . import AnalysisExplanation” line superfluous.
  2. In the file presidio-analyzer/analyzer/predefined_recognizers/iban_patterns.py I get a SyntaxError related to the IBAN patterns. Exact error:
       File "/.../presidio-analyzer/analyzer/predefined_recognizers/iban_patterns.py", line 39
     SyntaxError: Non-ASCII character '\xc2' in file /.../presidio-analyzer/analyzer/predefined_recognizers/iban_patterns.py on line 39, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
     As a workaround, I deleted line 7 of predefined_recognizers/__init__.py, which imports IbanRecognizer, and removed all references to IbanRecognizer from recognizer_registry.py (the import and the load_predefined_recognizers method). In other words, I dropped iban_recognizer.py altogether.
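Both errors above match what a Python 2 interpreter reports for this code: Python 2 does not accept parameter annotations, and it treats source files as ASCII unless a coding declaration (PEP 263) is present, whereas Python 3 decodes source as UTF-8 by default (PEP 3120). The sketch below is a minimal check, assuming it is run with Python 3. The two snippets are hypothetical stand-ins for the failing lines, not the actual presidio source; in particular, the IBAN-like pattern and the choice of a no-break space (bytes \xc2\xa0) to produce the reported '\xc2' byte are illustrative only.

```python
"""Check whether stand-ins for the two failing lines compile (run with Python 3)."""


def compiles(source) -> bool:
    """Return True if `source` (str or bytes) compiles without a SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# Item 1 stand-in: a parameter annotation combined with a default value.
# AnalysisExplanation here is a hypothetical empty stub class.
ANNOTATED_INIT = (
    "class AnalysisExplanation:\n"
    "    pass\n"
    "\n"
    "class RecognizerResult:\n"
    "    def __init__(self, entity_type, start, end, score,\n"
    "                 analysis_explanation: AnalysisExplanation = None):\n"
    "        self.analysis_explanation = analysis_explanation\n"
)

# Item 2 stand-in: raw source bytes containing a UTF-8 encoded non-ASCII
# character (b"\xc2\xa0", a no-break space) and no "coding:" declaration.
NON_ASCII_SOURCE = b"PATTERN = 'AD\xc2\xa0[0-9]{2}'\n"

if __name__ == "__main__":
    print(compiles(ANNOTATED_INIT))       # True: annotations are Python 3 syntax
    print(compiles(NON_ASCII_SOURCE))     # True: UTF-8 is the default source encoding
    print(compiles('print "hello"\n'))    # False: control case, Python 2-only syntax
```

The third call is only a control showing that the helper does report a SyntaxError when the source is not valid Python 3.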
omri374 commented 5 years ago

Hi @LaraSchvartzman, thanks for reaching out! Could you please provide some details on your environment, specifically your Python version and OS? Thanks

omri374 commented 5 years ago

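At least for item (1), please make sure you use Python 3.5 or later. The parameter annotation in analysis_explanation: AnalysisExplanation = None is Python 3 syntax (function annotations were added in Python 3.0), so a SyntaxError on that line means the code is running under Python 2.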
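One way to surface this requirement earlier is a version guard that fails with a readable message instead of a SyntaxError deep inside the package. This is a minimal sketch, not part of presidio; check_python_version and MIN_PYTHON are hypothetical names, and the 3.5 minimum is taken from the reply above. Because Python parses a whole file before running any of it, such a guard only helps if it lives in a file that itself contains no Python 3-only syntax (for example, the entry script that imports the analyzer).

```python
import sys

# Hypothetical constant: minimum interpreter version from the maintainer's reply.
MIN_PYTHON = (3, 5)


def check_python_version(version_info=None, minimum=MIN_PYTHON):
    """Raise RuntimeError if the interpreter is older than `minimum`.

    `version_info` defaults to sys.version_info and can be overridden for testing.
    Written without annotations or f-strings so this file still parses under
    Python 2 and can report the problem instead of failing with a SyntaxError.
    """
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current < tuple(minimum):
        raise RuntimeError(
            "presidio-analyzer requires Python %d.%d or later; found %d.%d"
            % (tuple(minimum) + current)
        )


if __name__ == "__main__":
    check_python_version()  # raises on Python 2, passes silently on 3.5+
    print("Python version OK")
```

For an installed wheel, the same constraint can also be declared at packaging time through the python_requires=">=3.5" argument to setuptools' setup(), which pip 9.0 and later honors when deciding whether to install a package.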

omri374 commented 5 years ago

Closing this for now. Feel free to reopen if the issue comes up again.