usc-isi-i2 / rltk

Record Linkage ToolKit (Find and link entities)
MIT License

Create 5 test datasets from ULAN #11

Closed: szeke closed this issue 6 years ago

szeke commented 7 years ago

The smallest dataset should contain 10,000 records and the largest should be the full dataset. The remaining three should fall somewhere in between.
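One possible way to choose the three intermediate sizes is to space all five geometrically between 10,000 and the full record count, then draw the subsets from a single seeded shuffle so that each smaller dataset is contained in every larger one. This is a minimal sketch under stated assumptions: the ULAN records are assumed to be already loaded into a Python sequence, geometric spacing and nested subsets are design choices not specified in the issue, and `dataset_sizes` / `nested_samples` are hypothetical helpers, not part of rltk.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def dataset_sizes(full_size: int, smallest: int = 10_000, count: int = 5) -> List[int]:
    """Return `count` sizes from `smallest` up to `full_size`, spaced geometrically.

    Assumption: geometric spacing; the issue only asks for the middle sizes
    to be "somewhere in between".
    """
    if full_size < smallest:
        raise ValueError("full dataset is smaller than the smallest requested size")
    if count < 2:
        raise ValueError("need at least two sizes")
    # Constant ratio between consecutive sizes.
    ratio = (full_size / smallest) ** (1 / (count - 1))
    sizes = [round(smallest * ratio ** i) for i in range(count)]
    # Pin the endpoints exactly, in case of floating-point drift.
    sizes[0], sizes[-1] = smallest, full_size
    return sizes


def nested_samples(records: Sequence[T], sizes: List[int], seed: int = 0) -> List[List[T]]:
    """Shuffle once with a fixed seed and take prefixes of the shuffled list.

    Hypothetical design choice: because every dataset is a prefix of the same
    shuffle, each smaller dataset is a subset of every larger one, and the
    split is reproducible for a given seed.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[:n] for n in sizes]
```

For example, with a hypothetical full size of 160,000 records, `dataset_sizes(160_000)` returns `[10000, 20000, 40000, 80000, 160000]`. Nested subsets make results across sizes easier to compare, since every record in a smaller test set also appears in the larger ones.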