Example format of training data

From issue #72, I understand that the raw dataset in text format is not provided because sensitive personal information would first have to be removed.

However, could you please provide an example of how to annotate the dataset?

In the original paper, I can see that it is annotated as follows:

<other> ...
<reply>...
<sig>...

But in the forge dataset example, only the signatures are annotated. Is this deliberate, or does the dataset need to evolve to include more examples of reply lines?
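
To make the question concrete, here is a minimal sketch of how I would expect to read line-level annotations like the three tags shown above. It assumes every line starts with exactly one of `<other>`, `<reply>`, or `<sig>`. The function name `parse_annotated`, the sample text, and the fallback for untagged lines are my own assumptions, not something taken from talon or the paper.

```python
import re
from typing import List, Tuple

# Hypothetical tag set, based only on the three labels listed above.
TAG_PATTERN = re.compile(r"^<(other|reply|sig)>\s?(.*)$")


def parse_annotated(text: str) -> List[Tuple[str, str]]:
    """Split tagged lines into (label, content) pairs."""
    pairs = []
    for line in text.splitlines():
        match = TAG_PATTERN.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
        elif line.strip():
            # Assumption: untagged, non-empty lines default to "other".
            pairs.append(("other", line))
    return pairs


# Made-up sample message, for illustration only.
sample = "<other>Hi team,\n<reply>> Can you send the report?\n<sig>Jane Doe"
for label, content in parse_annotated(sample):
    print(label, repr(content))
```

If the intended format differs (for example, if only signature lines carry a marker and everything else is left untagged), an example file would clear that up.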

Also, is there any plan to expand the forge dataset further and release it under a permissive license such as Apache or MIT?
