mailgun / talon

Apache License 2.0

Example format of training data #142

Open EdenHazard10 opened 7 years ago

EdenHazard10 commented 7 years ago

From issue #72, I can understand that the raw dataset in text format is not provided because you need to remove sensitive personal information.

However, could you please provide an example of how to annotate the dataset?

In the original paper, I can see that the dataset is annotated as follows:

<other> ...
<reply>...
<sig>...
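
To make the question concrete, here is a minimal sketch of how I imagine a line-tagged file would be read. The one-tag-per-line layout, the `parse_annotated` helper, and the sample email are my own assumptions for illustration; they are not taken from the paper or from talon's code.

```python
import re
from typing import List, Tuple

# Hypothetical tag set: the three labels quoted from the paper above.
LABELS = ("other", "reply", "sig")
_TAG = re.compile(r"^<(%s)>\s?(.*)$" % "|".join(LABELS))


def parse_annotated(text: str, default: str = "other") -> List[Tuple[str, str]]:
    """Return (label, line) pairs; lines without a tag get `default`."""
    pairs = []
    for line in text.splitlines():
        match = _TAG.match(line)
        if match:
            # Tagged line: keep the label and the text that follows the tag.
            pairs.append((match.group(1), match.group(2)))
        else:
            # Untagged line: fall back to the default label.
            pairs.append((default, line))
    return pairs


# Made-up sample email, one tag per line (assumed layout).
sample = "<other> Hi Bob,\n<reply>> On Monday you wrote:\n<sig>-- \n<sig>Alice"

for label, line in parse_annotated(sample):
    print(label, repr(line))
```

If the intended format is different (for example, only signature lines carry a tag), an example file would clear that up.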

But in the forge dataset example, only the signatures are annotated. Is this deliberate, or does the dataset need to evolve to include more examples of reply lines?

Also, is there any plan to expand the forge dataset further and release it under a permissive license such as Apache/MIT?