removed <newlines> from the stories data (#32 stays open for now, as more cleaning may be needed)
removed whitespace between the final word and the dot in dailymail_cnn, in both source and completions (for cleaner model prompts; some rows are still oddly formatted semantically, see #44)
inspect all data to ensure it is sensible to pass along to the models via prompts
Clean and inspect human data
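The two cleaning steps listed above could look roughly like the following. This is only a sketch, not the code from this change: the function names are hypothetical, and it assumes that `<newlines>` appears as a literal marker in the stories text and that the stray whitespace in dailymail_cnn sits right before the closing dot of a text.

```python
import re


def strip_newline_markers(text: str) -> str:
    """Drop literal "<newlines>" markers (assumed format) and collapse whitespace."""
    without_markers = text.replace("<newlines>", " ")
    # Collapse runs of whitespace, including real newline characters.
    return re.sub(r"\s+", " ", without_markers).strip()


def fix_final_dot(text: str) -> str:
    """Remove whitespace between the final word and the closing dot."""
    # Only the dot at the very end of the text is touched.
    return re.sub(r"\s+\.\s*$", ".", text)


if __name__ == "__main__":
    print(strip_newline_markers("Once upon a time.<newlines>The end."))
    print(fix_final_dot("He left the room ."))
```

For dailymail_cnn the same helper would be applied to both the source and the completion of each row, so that prompt and reference stay consistent.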