zooniverse / shakespeares_world

Full text transcription project for the Folger Shakespeare Library
https://www.shakespearesworld.org

MS Office XML is saved on some classifications #343

Open CKrawczyk opened 7 years ago

CKrawczyk commented 7 years ago

The attached file shows an example of one classification where a user copied and pasted a transcription from MS Office into the front-end: example.txt

The XML syntax from the document is saved along with the text, even though there are only 3 words in this classification.

It would be nice if the front-end stripped the unneeded tags from these types of classifications before saving.

For reference, this was taken from classification ID 5757593.

eatyourgreens commented 7 years ago

If possible, we should strip tags from pasted text and just use the text content.
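
A rough sketch of what that could look like, not taken from this repository. The names `ClipboardLike`, `stripMarkup` and `plainTextFromPaste` are hypothetical. It assumes the paste goes through the browser clipboard, where `event.clipboardData.getData("text/plain")` normally holds the text without Office markup, and it only falls back to a regex-based strip of the `text/html` flavour when no plain text is available. The regex strip is an approximation rather than a real HTML parser.

```typescript
// Minimal shape of the clipboard object this sketch relies on.
// The browser's DataTransfer (event.clipboardData) satisfies it.
interface ClipboardLike {
  getData(format: string): string;
}

// Hypothetical helper: reduce an HTML/XML fragment to its text content.
// Tags are replaced with a space so adjacent paragraphs do not run together;
// the trade-off is that inline tags inside a word would split that word.
function stripMarkup(markup: string): string {
  return markup
    .replace(/<!--[\s\S]*?-->/g, " ") // comments, including Office conditional blocks
    .replace(/<(style|script|xml)\b[\s\S]*?<\/\1\s*>/gi, " ") // elements whose body is not visible text
    .replace(/<[^>]*>/g, " ") // any remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&") // decoded last to avoid double-decoding
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical helper: pick the text to insert for a paste.
// Prefers the plain-text flavour and only strips markup as a fallback.
function plainTextFromPaste(clipboard: ClipboardLike): string {
  const plain = clipboard.getData("text/plain");
  if (plain.trim() !== "") {
    return plain.replace(/\s+/g, " ").trim();
  }
  return stripMarkup(clipboard.getData("text/html"));
}
```

In the front-end this would be called from a `paste` event handler on the transcription input: call `event.preventDefault()`, then insert `plainTextFromPaste(event.clipboardData)` instead of the default pasted content. If transcriptions can legitimately contain angle brackets, a blanket strip like `stripMarkup` would be too aggressive and would need an allow-list.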