Closed by rbroc 1 month ago
Additional comments (from #11, which I am closing as a near-duplicate). We should also make sure punctuation is used sensibly, and that there are no odd prefixes or other features that could cause artifacts when comparing model- and human-generated text. At a minimum, we should know whether such artifacts are present, so we can exclude TextDescriptives features that might fit to them.
We have done this to the extent possible; see e.g. #73.
There is also some dataset-specific noise, such as " < newline > " annotations in WritingPrompts, which we may want to standardize and remove before fitting predictive models at scale. This should not affect the median distances between human and LLM completions used for prompt selection, but we may later want to recompute those medians to report "cleaner" absolute values in the paper.
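For the WritingPrompts cleanup, something like the sketch below could work; the function name and regex are my assumptions, not existing project code, and the pattern tolerates the extra spaces that tokenization sometimes leaves inside the brackets:

```python
import re

def clean_writingprompts(text: str) -> str:
    """Hypothetical helper: normalize WritingPrompts-style '< newline >'
    annotations into real newlines and tidy leftover whitespace."""
    # Replace the annotation (with or without internal spaces) by a newline
    text = re.sub(r"\s*<\s*newline\s*>\s*", "\n", text)
    # Collapse runs of three or more newlines into a paragraph break
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

print(clean_writingprompts("Once upon a time . < newline > < newline > The end ."))
```

Applying this consistently to both human and LLM completions (rather than only one side) would keep any recomputed medians comparable.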