slavesocieties / openai


Build app to gamify generation of training data for disambiguation of people #19

Open d-genk opened 2 weeks ago

d-genk commented 2 weeks ago

Once we're generating data at scale, we'll need to aggregate the content extracted from a single sacramental record with the other content extracted from the same volume, and then aggregate the content from each volume with everything extracted across the whole corpus. One of the largest challenges in this process will be determining whether (or how likely it is that) two people recorded in different places or at different times are in fact the same person. Ideally we will be able to automate this, but to do so we will need a substantial body of training data. Perhaps the best way to generate it is to gamify the process: build a simple GUI that presents users with a series of potential matches, asks for their judgment on each one, and saves each answer as a structured record that can later be used to train an automated model.
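As a rough illustration, the backend of such an app might pair up candidate mentions and serialize each user answer as one JSON line. This is a minimal sketch under stated assumptions, not a design decision: the names `PersonMention`, `MatchJudgment`, `candidate_pairs` and `to_json_line`, the `"same"`/`"different"`/`"unsure"` label vocabulary, and the 0.7 threshold are all hypothetical. The `difflib` name similarity is a crude stand-in for whatever candidate-generation step the project actually ends up using, and the GUI itself is left out.

```python
import json
from dataclasses import asdict, dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Iterable, Iterator, Tuple


@dataclass
class PersonMention:
    """One person as extracted from one sacramental record (hypothetical schema)."""
    mention_id: str
    name: str
    volume_id: str
    record_id: str


@dataclass
class MatchJudgment:
    """A user's answer to "are these two mentions the same person?" (hypothetical schema)."""
    left_id: str
    right_id: str
    similarity: float
    label: str  # assumed vocabulary: "same", "different", or "unsure"
    annotator: str


def name_similarity(a: str, b: str) -> float:
    """Crude case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_pairs(
    mentions: Iterable[PersonMention], threshold: float = 0.7
) -> Iterator[Tuple[PersonMention, PersonMention, float]]:
    """Yield pairs of mentions from different records whose names look alike."""
    for left, right in combinations(mentions, 2):
        # Assumption: two mentions within the same record are different people.
        if (left.volume_id, left.record_id) == (right.volume_id, right.record_id):
            continue
        score = name_similarity(left.name, right.name)
        if score >= threshold:
            yield left, right, score


def to_json_line(judgment: MatchJudgment) -> str:
    """Serialize one judgment as a single line of JSON (JSON Lines style)."""
    return json.dumps(asdict(judgment), sort_keys=True)


if __name__ == "__main__":
    mentions = [
        PersonMention("m1", "Juan de la Cruz", "vol1", "r1"),
        PersonMention("m2", "Juan dela Cruz", "vol1", "r7"),
        PersonMention("m3", "María López", "vol2", "r2"),
    ]
    for left, right, score in candidate_pairs(mentions):
        # In the GUI this pair would be shown to a user and their answer recorded;
        # the label is hard-coded here purely for demonstration.
        judgment = MatchJudgment(
            left.mention_id, right.mention_id, round(score, 3), "same", "demo-user"
        )
        print(to_json_line(judgment))
```

Appending one JSON object per line keeps each answer independent, so labels from many users and sessions can be concatenated and loaded later as a training set without rewriting earlier records.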