ocdevel / gnothi

Gnothi is an open-source AI journal and toolkit for self-discovery. If you're interested in getting involved, we'd love to hear from you.
https://gnothiai.com
GNU Affero General Public License v3.0

Add statement of content visibility/access to disclaimer/privacy/TOS #190


ryanstraight commented 7 months ago

Really enjoying the platform! I plan on mentioning it in a conference presentation next week in a talk about AI and education. However, knowing that I'll be asked, I'm struggling to find a good place to point to in the user-facing documentation that clearly states which journal entry content (likely very personal and private) is accessible to the devs or third parties, and to what extent. Is this an update that could be made? I see mentions of "snooping" in some places, but there's no clarity on what that is, how it's used, who can do it, or whether it's available in the current release.

lefnire commented 7 months ago

Thanks for pinging! Snooping is something else; it's part of the Sharing feature (which was pulled for the v2 launch; I'll add it back later), which allows you to "log in as" a person who has explicitly shared things with you. In that case you can see only what that user has shared with you (journal entries, behaviors, book recommendations, etc.) while "viewing as" that user. It's a feature primarily meant for sharing certain things with your therapist, but it has value for you and your partner, friends, etc., too.

I'll try to get to your request as soon as I can. Alas, it's a one-man-band, nights-and-weekends project, and I'm currently swamped with some other projects. But let me give you the skinny here.

  1. OpenAI is used for the AI stuff currently. OpenAI doesn't train on user data, but they retain it for a short time for abuse investigations (which means them coming after me, rather than the user); then they delete it if no legal alarms went off. This is the only third party worth caring about. While I'm comfortable keeping my own journal there given that info about OpenAI, I know some might not be, so I plan to add Gnothi-hosted models (Mixtral, etc.) and user-hosted models (users with tech chops can set up a localhost LLM, for extra privacy).
  2. Developer access. Everything's encrypted (at rest & in transit) and behind a VPC (AWS). I have the capacity to get in, in emergency scenarios, using something called a bastion host; I've done this once or twice when something was broken in a user's account. (There's a rough sketch of this setup below.)
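
For the curious, here's a minimal sketch of that kind of architecture in TypeScript using AWS CDK: an encrypted Postgres instance in a private VPC subnet, reachable only through a bastion host. This is illustrative, based on my reading of the description above, not Gnothi's actual stack; construct names and the engine version are placeholders.

```ts
import { Stack, StackProps } from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as rds from "aws-cdk-lib/aws-rds";
import { Construct } from "constructs";

// Hypothetical sketch, not the real Gnothi infrastructure.
export class DataStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, "Vpc", { maxAzs: 2 });

    // Bastion host: the only way in for emergency DB operations.
    const bastion = new ec2.BastionHostLinux(this, "Bastion", { vpc });

    const db = new rds.DatabaseInstance(this, "Db", {
      engine: rds.DatabaseInstanceEngine.postgres({
        version: rds.PostgresEngineVersion.VER_15, // placeholder version
      }),
      vpc,
      vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
      storageEncrypted: true, // encryption at rest via KMS
    });

    // Only the bastion's security group may reach the DB port.
    db.connections.allowDefaultPortFrom(bastion);
  }
}
```

With a setup like this, "getting in" means opening an audited tunnel through the bastion; the database itself never accepts connections from the public internet.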

Here's a dump of a similar conversation I had with someone previously:

Yeah, so let me give you a TL;DR on the security / privacy situation.

1. OpenAI API does not train on data. ChatGPT (the website) does, so use it with caution; but with the API they take their hands off, so any company using it can feel comfortable it's not handing over data. So we're good here. Even so, I'm looking into different model options for the future: (1) OpenAI; (2) Gnothi-hosted (open-source models); (3) local model (DIY: you set it up on your own home computer).
2. Security. Gnothi had a first pass where I used Python/Node modules to home-grow security solutions in 2018. I spent 2020-2023 switching to all AWS-managed security and encryption so that there's no room for developer error. It was a move towards something called "serverless", which means I don't manage the security / privacy considerations myself.
3. Encryption. Everything's encrypted "at rest", meaning if someone got hold of the database, it would be gibberish. Except for one loophole:
4. Tyler access. When I need to do database operations (like what I'm going to do for this check), I can set up this tunnel, effectively putting me into the AWS environment. It's actually hard to do, with multiple keys / steps, which is why I haven't done it yet; I save it for emergencies. However, at that point the DB is decrypted from my computer. It's not uncommon for things like this, but with HIPAA stuff (which Gnothi counts as) it needs to be audited and documented with a reason. Luckily AWS handles a lot of the auditing for me.
    - 4a. Nonetheless, you're correct: over time I should be adding administrative tooling to the website so I can handle common scenarios without direct DB access. I've started a few of these, and I'll continue to collect more (e.g., "check user timezone with user_id <id>"). There's a sketch of the idea just below.

Point 1 wasn't always the case. I had no intention of using OpenAI until they changed that part.
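
As a concrete illustration of what that admin tooling could look like: a minimal sketch assuming a Node/Express-style API with Postgres. The endpoint, table, column, and auth check are all hypothetical, not Gnothi's actual code.

```ts
import express from "express";
import { Pool } from "pg";

const app = express();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Placeholder auth: stands in for real admin authentication/authorization.
const requireAdmin: express.RequestHandler = (req, res, next) => {
  if (req.headers["x-admin-token"] !== process.env.ADMIN_TOKEN) {
    return res.status(403).send("Forbidden");
  }
  next();
};

// Scoped admin lookup: reads one column, never journal content,
// so no raw DB tunnel is needed for this common scenario.
app.get("/admin/users/:id/timezone", requireAdmin, async (req, res) => {
  const { rows } = await pool.query(
    "SELECT timezone FROM users WHERE id = $1",
    [req.params.id]
  );
  res.json(rows[0] ?? null);
});

app.listen(3000);
```

The design point is that each admin action is a narrow, logged query rather than open-ended database access.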

Takeaway: it's not perfect, but as much as possible I've taken myself out of the security / privacy position and left it to the professionals. Let's just say: I take this seriously for my own journal, and I take others' journals equally seriously.

What I meant was:

- chat.openai.com (ChatGPT the product) trains on and retains all your data. I trust them well enough, but maybe I'm a fool.
- OpenAI API (the service I use under the hood, hosted by the same people) does not. They don't want to be liable for anything you use them for, so they remove themselves from the equation. The model you use is a "snapshot", meaning it's frozen in time and doesn't remember anything you say that's not part of the current conversation.
- Totally separate: I do train on a handful of things. Currently the only two are adjusting your book preferences based on thumbs up/down, and calculating the correlation scores between your behaviors (the Analyze tab). But overall, any AI training on user data is extremely minimal; dare I say negligible. (There's a toy sketch of the correlation idea after this list.)
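
To make that last bullet concrete, here's a toy sketch of the behavior-correlation idea: a plain Pearson correlation between two equal-length daily series. This is illustrative only, not Gnothi's actual Analyze implementation, and the sample data is made up.

```ts
// Pearson correlation between two equal-length numeric series.
function pearson(xs: number[], ys: number[]): number {
  const n = xs.length;
  const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / a.length;
  const mx = mean(xs);
  const my = mean(ys);
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    dx += (xs[i] - mx) ** 2;
    dy += (ys[i] - my) ** 2;
  }
  return num / Math.sqrt(dx * dy);
}

// e.g. hours slept vs. self-reported mood (1-5) over one week
const sleep = [6, 7, 8, 5, 9, 7, 6];
const mood = [3, 4, 5, 2, 5, 4, 3];
console.log(pearson(sleep, mood).toFixed(2)); // strongly positive here
```

Note that this kind of computation only needs behavior scores, never the journal text itself.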

Eventually I plan for this to be fully HIPAA compliant, so users can share their journals with their therapists.

If someone writes a sad diary entry on ChatGPT (chat.openai.com), that will be remembered forever by ChatGPT. They train on it, and they use special tech to track what's related to you, to "remember you" as much as they can. This has actually been a problem in some cases, where users have gotten access to other users' data. They're working out the kinks.
But the OpenAI API (what Gnothi uses) remembers nothing. It's frozen in time from Dec 2023, so when you're done using it / chatting with it, the whole thing never happened.
There's some level to which the OpenAI API keeps usage data on hand in case of abuse, so they can come after the company (me) if something goes legally wrong. But everything's deleted after a short period.
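
For reference, here's a minimal sketch of what calling a pinned model snapshot through the OpenAI API looks like (Node SDK v4). The specific model name is an assumption on my part, standing in for whichever dated snapshot is actually pinned; dated snapshots freeze the weights in time, and per OpenAI's API terms, requests are not used for training.

```ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function summarize(entry: string): Promise<string | null> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4-0125-preview", // a dated snapshot, not a moving target
    messages: [
      { role: "system", content: "Summarize this journal entry in one sentence." },
      { role: "user", content: entry },
    ],
  });
  return completion.choices[0].message.content;
}

summarize("Slept badly, but the morning walk helped.").then(console.log);
```
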
lefnire commented 7 months ago

In short, I intend to be as hard-core on privacy and security as possible. I wouldn't want a dev reading my stuff, so I intend the same here. I emphasize intend because I have very few resources to perfect said goal, this being a side project. But if it ever makes money, security/privacy will be the first resource focus.

ryanstraight commented 7 months ago

Fantastic, thank you! I appreciate the quick response and the transparency. I have some ideas for resources but I'll shoot an email about that. Cheers!

lefnire commented 7 months ago

That sounds wonderful! Thanks @ryanstraight