higgood / med-jargon-explain-inator

Forking this so that we can associate tasks with the relevant repo. This project belongs to all team members, not to HIGG. HIGG is sponsoring it only to facilitate project management.

Find and download the training sets for the encoder-decoder sentence simplification model #2

Open wammar opened 6 months ago

wammar commented 6 months ago

Probably the Paraphrase Database (PPDB, http://paraphrase.org/#/download) and the aligned pairs between Simple English Wikipedia entries and their corresponding English Wikipedia entries.

Kauchak, D.: Improving text simplification language modeling using unsimplified text data. In: 51st Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long papers, pp. 1537–1546. ACL, Sofia, Bulgaria (2013)

Ganitkevitch, J., Van Durme, B., Callison-Burch, C.: PPDB: the paraphrase database. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 758–764 (2013)

wammar commented 6 months ago

After discussing alternative approaches in the design doc, we decided to archive this task.