Open juju4 opened 1 year ago
Regarding data classification, can you explain in more detail what you mean? It might be possible if the classification is in the metadata, but I'm not sure how to do that efficiently in any other situation.
I'll look at the tools you mentioned, especially the Yelp one, as it is already a Python module. If you're already working on a module, please let me know so I don't reinvent the wheel.
Just a note regarding the LLM part, and sharing with third parties in general: I wouldn't trust anything automated to properly detect PII/secrets before sending them to a third-party black box, so this is never going to be officially supported by pandora. A human will always have to take responsibility for that kind of behavior.
I'm not looking to remove the human from the decision, just trying to help them make it. The idea is that if you have an internal pandora instance where, in the best case, people get used to submitting their office files, then having a reminder in the same place that the file/content has a classification banner, carries classification metadata, or was identified as containing sensitive data would be a nice helper.
Classification identification outside of metadata would just be a text pattern match against some example scales (BAILS/BAF from https://help.libreoffice.org/latest/en-US/text/shared/guide/classification.html and TLP from https://www.first.org/tlp/) that could be customized to match internal naming.
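To make the pattern-match idea concrete, here is a rough sketch of what I have in mind. It is not an existing pandora module or API: the names `DEFAULT_SCALES`, `ClassificationHit` and `find_classification_markers` are made up for illustration, the TLP labels are the ones published by FIRST (plus the older `TLP:WHITE`), and the "internal" labels are only placeholder examples that each instance would replace with its own naming.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, customizable scales. The TLP labels follow FIRST's TLP
# (TLP:WHITE is the older name for TLP:CLEAR); the "internal" labels are
# placeholders to be replaced by whatever naming an organization uses.
DEFAULT_SCALES: dict[str, list[str]] = {
    "TLP": ["TLP:RED", "TLP:AMBER+STRICT", "TLP:AMBER",
            "TLP:GREEN", "TLP:CLEAR", "TLP:WHITE"],
    "internal": ["Confidential", "Internal Use Only"],
}


@dataclass(frozen=True)
class ClassificationHit:
    scale: str   # which scale the label belongs to
    label: str   # matched label, upper-cased
    offset: int  # character offset of the match in the text


def compile_scales(scales: dict[str, list[str]]) -> dict[str, re.Pattern]:
    """Build one case-insensitive regex per scale."""
    compiled = {}
    for scale, labels in scales.items():
        # Longest labels first, so TLP:AMBER+STRICT is tried before TLP:AMBER.
        ordered = sorted(labels, key=len, reverse=True)
        alternation = "|".join(re.escape(label) for label in ordered)
        # Lookarounds avoid matching inside longer words (e.g. TLP:REDIRECT).
        compiled[scale] = re.compile(
            rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE
        )
    return compiled


def find_classification_markers(
    text: str, scales: Optional[dict[str, list[str]]] = None
) -> list[ClassificationHit]:
    """Return every classification label found in text, in document order."""
    patterns = compile_scales(DEFAULT_SCALES if scales is None else scales)
    hits = []
    for scale, pattern in patterns.items():
        for match in pattern.finditer(text):
            hits.append(
                ClassificationHit(scale, match.group(0).upper(), match.start())
            )
    return sorted(hits, key=lambda hit: hit.offset)


if __name__ == "__main__":
    sample = "TLP:AMBER+STRICT\nQuarterly report. Internal use only."
    for hit in find_classification_markers(sample):
        print(f"{hit.scale}: {hit.label} at offset {hit.offset}")
```

The result is only a list of hits to display next to the submission as a reminder; it does not block anything, and an empty list does not mean the content is safe to share.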
Not working on a module.
It would be good if pandora could
Note: this could be useful for both file and text input. For example, a user could use the internal pandora instance to validate a text before sending it to an external LLM as a prompt, or to an online tool (spell checker, translator, etc.).