Closed: homedirectory closed this issue 1 year ago
What is the easy way to disable it? (besides commenting out those lines..)
@barshag
you can opt out by setting the COLLECT_LEARNINGS_OPT_OUT environment variable
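For example, here is a minimal sketch of setting that variable from Python before the tool runs (the exact truthy value the code accepts may differ, so check gpt_engineer/collect.py; in a shell, `export COLLECT_LEARNINGS_OPT_OUT=true` is the equivalent):

```python
import os

# Opt out of the "learnings" telemetry for this process and its children.
# Assumption: any non-empty truthy value is honored; verify against collect.py.
os.environ["COLLECT_LEARNINGS_OPT_OUT"] = "true"

print("Opt-out set:", os.environ["COLLECT_LEARNINGS_OPT_OUT"])
```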
Created a pull request: https://github.com/AntonOsika/gpt-engineer/pull/423
Thanks for raising concern!
Especially the broken link to terms of use. Fixing asap.
The terms are very short so should be easy to read.
As for prompts being recorded: this was discussed publicly in the Discord #general channel over the weekend. Since OpenAI is already doing this, we did not see an issue.
Could anyone steelman the argument here for why a 20-page OpenAI ToS stating that data is collected is fine, but the terms of use here are not explicit enough?
Stealth addition of MITM spyware would constitute an issue under the code of conduct, or in any professional setting when dealing with material that is likely commercial-in-confidence: trade secrets, IP, internal processes, etc.
https://github.com/AntonOsika/gpt-engineer/blob/main/.github/CODE_OF_CONDUCT.md Examples of unacceptable behavior include: ... Other conduct which could reasonably be considered inappropriate in a professional setting
I appreciate the open discussion about this.
For the bulk of the issue I'm definitely to blame here:
We wrote a terms of use, explaining the data collection, and linked to it from the README when performing the telemetry update.
I rushed getting this out, and did not add the terms of use to version control.
We do not have any "automatic tests" for broken links, as we do for code, and it was merged before I caught it.
Many parts of the negative reaction, apart from my huge blunder with the broken link (the reaction to that is warranted), stand out to me as overly polarised against what is pretty standard product analytics, for two main reasons:
1. A core part of the application is that the user is explicitly prompted to share whether the code worked or not, for "learning" purposes. Given that this is the user flow, it should be expected that the answer is stored.
2. A lot of care is put into not sending or storing any private data (IP, user agent, or similar). I am very happy to have this fact audited.
I am committed to doing what is best for the community here. This means striking the right balance between not invading privacy and building a useful tool. Without getting feedback, by default, on how well this tool works for users, it is very difficult to do a good job of improving it.
My experience before this issue was created was that very few people are protective of sharing their prompts with external services (consider all the GPT Chrome extensions etc. out there, where many also share IP, fingerprint, etc.).
Appreciate your contribution @Gamekiller48 and everyone. I know everyone here wants what is best for the users.
I will merge your PR @Gamekiller48 (opt out -> opt in PR).
If someone could, in addition, make a PR so that the "CLI review flow" asks "is it OK to send data?", that would be great. (I'm super busy at the moment and not able to write code; otherwise I would look into it myself.)
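A minimal sketch of what that consent question could look like (hypothetical function name, not the project's actual review-flow code), defaulting to "no" so data is only sent on explicit agreement:

```python
def ask_collection_consent(ask=input) -> bool:
    """Ask the user whether their review data may be sent.

    Defaults to "no": only an explicit yes opts the user in.
    The `ask` parameter is injectable so the prompt is testable.
    """
    answer = ask("Is it OK to send your review data to improve gpt-engineer? [y/N] ")
    return answer.strip().lower() in ("y", "yes")


if __name__ == "__main__":
    if ask_collection_consent():
        print("consent given")
    else:
        print("no data will be sent")
```

Defaulting to opt-out on an empty answer is the conservative choice for commercial-in-confidence use cases raised in this thread.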
Furthermore, as a follow-up here, I will ensure there is further review of what the right policy is for an application like this, and that an informed decision is made with input from experts and from people with opinions on different sides.
I will post in this issue again with the final conclusions, including whether we decide to change our stance on data collection. In this way everyone subscribed to this issue will be notified.
Personal note: Some comments, such as "stealth addition" (the data collection updates were announced) and "MITM spyware", weigh on me and take much of the joy out of trying to measure the project's capability and continuously improve it.
I ask the community for support in building something useful and open source, and for constructive contributions, such as PRs that address concerns and improve the tool for everyone (some people in this thread are role models here).
When using the OpenAI models through the API (as this project does), OpenAI explicitly does not collect user data to further train their models: https://openai.com/policies/api-data-usage-policies
OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose.
Though they do retain data for 30 days for abuse and misuse monitoring, which is completely different from retaining data to improve the service:
Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law).
I don't know if I would call it sneaky. This repo has had so many changes in the past 30 days that it is easy for a broken link to be overlooked.
GPT Engineer collects user data, namely user prompts among other metadata. This fact is not mentioned in the README, nor is it mentioned that you can opt out by setting the COLLECT_LEARNINGS_OPT_OUT environment variable. The ToS link in the README is broken. Therefore, the only way for users to become aware that their data is being sneakily collected is to read the code. I consider this to be a violation of users' privacy, and propose that one of the following be implemented as soon as possible:
https://github.com/AntonOsika/gpt-engineer/blob/0596b07a39c2c99c46509c17660f5c8aef4b2114/gpt_engineer/collect.py#L25
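The linked line is where the opt-out is checked. A simplified sketch of how such an environment-variable check typically works (an illustration of the pattern, not the exact code at the permalink):

```python
import os


def consent_to_collect() -> bool:
    """Return True unless the user has opted out via the environment.

    Sketch assumption: "true"/"1"/"yes" (case-insensitive) count as an
    opt-out; the real implementation in collect.py may differ.
    """
    value = os.environ.get("COLLECT_LEARNINGS_OPT_OUT", "")
    return value.strip().lower() not in ("true", "1", "yes")
```

Under an opt-out model the function returns True when the variable is unset, which is precisely the default-on behaviour the thread objects to; an opt-in model would invert the default.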