TedTed closed this 1 year ago.
Thanks for reporting this. We will update the documentation and consider how to best alert people when they're trying to run a synthesizer that doesn't have the correct dependencies installed.
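One possible way to alert people is to check all of a synthesizer's optional imports up front and raise a single `ImportError` that names every missing module and how to install it, instead of printing a message. This is a minimal sketch under stated assumptions: `SYNTH_REQUIREMENTS`, `missing_dependencies`, and `require_synth_dependencies` are hypothetical names, not part of smartnoise-synth. The module names come from the failures reported in this issue, and `mbi` is assumed to be the import name provided by private-pgm.

```python
import importlib.util

# Hypothetical mapping: synthesizer name -> list of (import name, install hint).
# Entries are taken from the errors reported in this issue; "mbi" is an
# assumption about the module name that private-pgm provides.
SYNTH_REQUIREMENTS = {
    "mst": [
        ("disjoint_set", "pip install disjoint-set"),
        ("networkx", "pip install networkx"),
        ("mbi", "pip install git+https://github.com/ryan112358/private-pgm.git"),
    ],
}


def missing_dependencies(synth: str) -> list[tuple[str, str]]:
    """Return the (module, install hint) pairs whose module cannot be found."""
    return [
        (module, hint)
        for module, hint in SYNTH_REQUIREMENTS.get(synth, [])
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module) is None
    ]


def require_synth_dependencies(synth: str) -> None:
    """Raise one ImportError listing every missing dependency at once."""
    missing = missing_dependencies(synth)
    if missing:
        lines = [f"  {module}: {hint}" for module, hint in missing]
        raise ImportError(
            f"Synthesizer '{synth}' is missing dependencies:\n" + "\n".join(lines)
        )
```

Calling `require_synth_dependencies("mst")` before constructing the synthesizer would surface all three missing packages in one traceback, rather than one `ModuleNotFoundError` per attempt.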
Regarding "why isn't it part of the automatically installed dependencies?", there is a tension between having all synthesizers work immediately out of the box and providing only what's needed, to reduce the attack surface. This isn't to say that `smartnoise-synth`'s current defaults are particularly coherent or intentional, but this is an important decision we are trying to think through. Because these synthesizers are often run in "eyes-off" environments with strict security controls and especially sensitive data, we're leaning towards making the defaults as minimal as possible.
For example, `pip install smartnoise-synth` might install only mwem, or even no synthesizers at all, and people would need to run `pip install smartnoise-synth[mst,aim,mwem]` to get specific synthesizers. There would be an option like `pip install smartnoise-synth[all]` that installs all synthesizers, supporting data scientists who need to compare all of the synthesizers to decide which is the best option to run in the eyes-off environment. And, of course, the security review and threat model could then focus on the specific synthesizer selected in the evaluation phase. In the production environment, the `pip install` would need to specify the selected synthesizer. Basically, this makes onboarding slightly more cumbersome for people kicking the tires (they have to know the `pip install smartnoise-synth[all]` syntax), in exchange for making it marginally harder for devops people deploying code in eyes-off environments to shoot themselves in the foot.
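The extras-based layout described above could be declared through setuptools' `extras_require`. The sketch below is hypothetical: it only builds the mapping, deriving `all` as the union of the per-synthesizer extras, and leaves the `setup()` call as a comment. Only the `mst` requirements come from this thread; the `mwem` entry is an empty placeholder because its dependencies are not listed here, and private-pgm is left out because the thread installs it from git rather than from PyPI.

```python
# Hypothetical per-synthesizer extras for smartnoise-synth.
# "mst" lists the PyPI packages reported missing in this issue;
# "mwem" is a placeholder because its dependencies are not given here.
SYNTH_EXTRAS = {
    "mst": ["disjoint-set", "networkx"],
    "mwem": [],
}


def with_all_extra(extras: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a copy of extras plus an 'all' extra covering every synthesizer."""
    combined = sorted({req for reqs in extras.values() for req in reqs})
    return {**extras, "all": combined}


EXTRAS_REQUIRE = with_all_extra(SYNTH_EXTRAS)

# In setup.py this mapping would be passed as:
#   setuptools.setup(..., extras_require=EXTRAS_REQUIRE)
```

Deriving `all` from the individual extras keeps it from drifting out of sync as synthesizers are added. When installing, quoting the requirement (for example `pip install "smartnoise-synth[all]"`) keeps shells such as zsh from treating the brackets as a glob pattern.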
Since this is something we are still thinking through, and your team has a lot of real-world experience, your feedback is welcome.
Updated and pushed the docs to docs.smartnoise.org.
I tried to run the code in the "Preprocessor hints" section of this documentation page, and it fails for multiple reasons:

1. The `categorical_columns.remove(['income', 'age')` line is a syntax error. Even after adding the missing `]` or removing the `[`, the code is still invalid; instead, one must use two different calls to `remove`, one with `'income'` as the sole argument and one with `'age'` as the sole argument.
2. The `disjoint_set` dependency, used in `mst.py`, is not installed automatically when running `pip install smartnoise-synth`, so the code snippet fails with `ModuleNotFoundError: No module named 'disjoint_set'`.
3. After `pip install disjoint-set`, it fails with a different import error: `ModuleNotFoundError: No module named 'networkx'`.
4. After `pip install networkx`, it fails for a different reason, with a printed message asking to run a command. Running that command (`pip install git+https://github.com/ryan112358/private-pgm.git`) fixes the error, and I can run the code snippet. But then, why is that "please run the following command" message simply printed to stdout instead of raised as an exception, so that the end user only sees this? Furthermore, why isn't it part of the automatically installed dependencies?
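For the first failure, a corrected version of the `remove` line is sketched below. `list.remove` takes exactly one value and raises `ValueError` if that value is absent, so each column gets its own call. The column list here is hypothetical, since the documentation's full list isn't reproduced in this issue; only `'income'` and `'age'` come from the snippet.

```python
# Hypothetical column list; only 'income' and 'age' come from the docs snippet.
categorical_columns = ["workclass", "education", "income", "age", "sex"]

# list.remove() accepts a single value, so remove each column in its own call.
categorical_columns.remove("income")
categorical_columns.remove("age")

# Alternative that drops several columns in one pass and tolerates absent names:
# categorical_columns = [c for c in categorical_columns if c not in {"income", "age"}]
```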