wavetermdev / waveterm

An open-source, cross-platform terminal for seamless workflows
https://www.waveterm.dev
Apache License 2.0

Request for User Control Over Conversation Data for AI #897

Closed C0sm0cats closed 1 month ago

C0sm0cats commented 1 month ago

Dear Waveterm Team,

I hope this message finds you well. I would like to raise a concern regarding user privacy and data management within the Waveterm application.

As it stands, when using the AI functionality, our messages are forwarded to the gpt-4o-mini model via your servers. While I appreciate the innovative features of Waveterm, I am concerned about the handling of our conversations and the lack of user control over this data.

In particular, I would like to request the following features to enhance user privacy:

Ability to Delete Conversations: Users should have the capability to delete their conversation history. This would give users greater control over their personal data and enhance their confidence in using the application.

Transparency in Data Handling: Clear information on how the data is stored, retained, and processed by Waveterm would be beneficial. This includes how long conversations are kept and whether they are anonymized.

User Consent for Data Usage: Implement a feature where users can explicitly consent to or opt-out of data retention practices.

I believe that by implementing these features, Waveterm can strengthen its commitment to user privacy and foster a more trustworthy relationship with its users.

Thank you for considering this request. I look forward to your response and hope to see enhancements that prioritize user control and privacy in future updates.

Best regards, cosmo

sawka commented 1 month ago

Thanks for posting this question, happy to respond. Will likely move parts of this response to the docs site as well so there is a more permanent place for this information to be stored.

First, using our proxied AI handler is completely optional. We proxy requests to gpt-4o-mini so there is an "out-of-the-box" working AI block. We strongly believe in data privacy and letting users control their own data. That's why we offer configuration options that let you completely bypass our AI proxy and send requests directly to OpenAI (using your own API token), or to any other OpenAI-compatible API. This includes local models like Ollama (which I suggest you use instead if you are very concerned about privacy).
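For illustration, pointing the AI block at a local Ollama instance might look roughly like this in the settings file. This is a sketch only; the exact `ai:*` key names and file location should be verified against the config docs at https://docs.waveterm.dev/config, and the model name is just an example:

```json
{
  "ai:baseurl": "http://localhost:11434/v1",
  "ai:model": "llama3.1",
  "ai:apitoken": "ollama"
}
```

With a configuration like this, requests go straight from your machine to the local Ollama server (which exposes an OpenAI-compatible endpoint under `/v1`), never touching Wave's proxy.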

When you use your own AI base URL or your own API key, no data is ever sent to our servers. Any chat history is only stored locally on your machine (and is viewable in the "AI" block directly). To delete that chat history, you just need to close the AI block and that data will be removed (again, though, it is only stored locally).

For conversations that we proxy, we are not storing those conversations (requests or responses) in a database or using them for additional training. However, conversation details might be stored temporarily in our logs for debugging and/or abuse-detection purposes. I use the word "might" here because this isn't data I want to have on my servers... but if there is a bug, or if we need to inspect some type of abuse of our free service, we may have to log and analyze these details in order to fix issues.

Now, in terms of anonymization, everything is anonymous. When API requests are made, they include your clientid (a unique UUID that is generated at install time). We don't tie that UUID to any personal data, so as far as we are concerned it is anonymous. We do need that client id to be passed in order to enforce a rate limit on the free service (but again, you are free to use your own API key and your own OpenAI base URL, in which case we get nothing).
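To make the anonymity claim concrete: an install-time client id of this kind is just a random version-4 UUID, carrying no personal information. The sketch below (in Go, not Wave Terminal's actual implementation; `newClientID` is a hypothetical helper) shows how such an id can be generated from random bytes alone:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newClientID generates a random version-4 UUID string. Because it is
// derived purely from random bytes, it identifies an install without
// encoding anything about the user. Illustrative sketch only.
func newClientID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newClientID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Such an id would typically be generated once, persisted locally, and then sent with each proxied request so the server can rate-limit per install without knowing who the install belongs to.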

tl;dr

We don't store your conversations in a database and we do not train on them. At most, they may be kept temporarily in logs, for a limited window, for debugging and/or abuse detection. All users are free to completely opt out of our AI service by specifying their own AI base URL and/or API key to have their requests forwarded directly to their provider of choice (even local LLMs), completely bypassing our service.

sawka commented 1 month ago

Let me know if this answers your question, or if you have additional follow-up questions or need more clarification. Also, here is the link to the configuration options: https://docs.waveterm.dev/config

C0sm0cats commented 1 month ago

Thank you for your detailed response, Sawka! I really appreciate the transparency and the effort you’ve put into clarifying Waveterm’s data handling practices. The ability to configure our own AI base URL or use local models like Ollama is a great feature, and I’m glad to see that user control over data privacy is a priority.

The explanation about temporary logging for debugging purposes makes sense, and it’s reassuring to know that conversations are not stored in a database or used for training. I’ll explore the configuration options more deeply based on the link you shared.

If I could suggest one enhancement, it would be to have a simple toggle or clearer UI indicator to easily switch between the default AI handler and a custom setup. This might help users who aren’t as familiar with the technical aspects to feel more confident in managing their data.

Again, thanks for addressing my concerns so thoroughly. I’m looking forward to seeing these privacy features evolve!

Best, Cosmo

esimkowitz commented 1 month ago

Ah that's a good idea, can you open a new issue for the toggle? I'm going to shift this issue over to a discussion