openai / gpt-2-output-dataset

Dataset of GPT-2 outputs for research in detection, biases, and more
MIT License

Simplified English often falsely classified as AI output #40

Open beltoforion opened 1 year ago

This is more feedback than a bug report, so feel free to close or ignore the issue. There appear to be many false positives for articles from the Simple English Wikipedia.

Examples:

Both articles predate GPT, so they cannot be AI-generated, yet the system is 99% sure that they are. I found more examples. It appears that Simplified English is classified as AI output with a relatively high probability.
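
To put a number on this, one could collect detector scores for texts known to be human-written (for example, articles that predate GPT) and compute the share that gets flagged. The snippet below is only a minimal sketch: `false_positive_rate`, the 0.5 threshold, and the score values are illustrative assumptions, not measurements and not part of the detector in this repository.

```python
def false_positive_rate(fake_probabilities, threshold=0.5):
    """Fraction of known human-written texts that a detector flags as AI output.

    fake_probabilities: detector scores in [0, 1], one per human-written text,
    where a higher score means "more likely AI-generated".
    threshold: assumed decision cutoff; pick whatever the detector actually uses.
    """
    if not fake_probabilities:
        raise ValueError("need at least one score")
    flagged = sum(1 for p in fake_probabilities if p >= threshold)
    return flagged / len(fake_probabilities)


# Made-up scores for four human-written articles (illustration only).
scores = [0.99, 0.99, 0.12, 0.87]
print(f"false positive rate: {false_positive_rate(scores):.2f}")
```

Running the same calculation on regular and Simplified English articles separately would show whether the simplified ones are flagged disproportionately often.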