notAI-tech / deepsegment

A sentence segmenter that actually works!
http://bpraneeth.com/projects
GNU General Public License v3.0

One sentence inside another #36

Closed: JuanFF closed this issue 4 years ago

JuanFF commented 4 years ago

Hello,

I'm going to expand a custom training set with new examples. These two sentences are in the set:

  1. Good morning I need help to fix this issue
  2. I need help to fix this issue

In this case, I need DeepSegment to keep the boundaries of the longer one (1). Since both examples are in the training set, I wonder whether the final result would be

['Good morning', 'I need help to fix this issue']

I would like to avoid this while keeping both examples in the training set. Would this be possible after training the model?

Thanks

bedapudi6788 commented 4 years ago

For this, I would keep both examples in the training set and let the model learn that "good morning" (or contextually similar phrases) should not be split off when accompanied by phrases like "I need help to fix this issue".
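To make this concrete, here is a minimal sketch that assumes segmentation is framed as token tagging. Training sentences are concatenated, the first word of each sentence is tagged as a boundary (`B`), and every other word is tagged `O`. The `label_boundaries` function, the tag names, and the whitespace tokenization are hypothetical and are not DeepSegment's own data-generation code. With examples 1 and 2 both in the training data, the `I` that follows "Good morning" is tagged `O`, while the `I` that opens example 2 is tagged `B`, so the model sees both contexts.

```python
from typing import List, Tuple


def label_boundaries(units: List[str]) -> Tuple[List[str], List[str]]:
    """Concatenate training sentences and tag each token.

    Hypothetical scheme: "B" marks the first word of a sentence (a boundary),
    "O" marks every other word. Tokenization is plain whitespace splitting.
    """
    tokens: List[str] = []
    tags: List[str] = []
    for unit in units:
        for i, word in enumerate(unit.split()):
            tokens.append(word)
            tags.append("B" if i == 0 else "O")
    return tokens, tags


# Example 1 and example 2 from the question, kept as separate training sentences.
units = [
    "Good morning I need help to fix this issue",
    "I need help to fix this issue",
]
tokens, tags = label_boundaries(units)
for token, tag in zip(tokens, tags):
    print(f"{tag}\t{token}")
```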

So, keep both example 1 and example 2 in the training set and train the model. Based on your results (i.e., if it still splits sentences like example 1), you might need to add more data similar to example 1.
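One way to add more data similar to example 1 is to fuse several greetings with several requests into single training sentences while still keeping the bare requests. This is a hypothetical sketch: the phrase lists and the `augment` helper are illustrative assumptions, not part of DeepSegment, and real additions should come from your own domain.

```python
from itertools import product
from typing import List

# Illustrative phrases only (assumption): swap in greetings and requests from your own data.
GREETINGS = ["Good morning", "Hello", "Hi there"]
REQUESTS = ["I need help to fix this issue", "I cannot open the settings page"]


def augment(greetings: List[str], requests: List[str]) -> List[str]:
    """Build training sentences in two shapes.

    Fused: greeting + request as ONE sentence (shaped like example 1), so the
    greeting is not treated as a sentence of its own.
    Bare: the request on its own (shaped like example 2).
    """
    fused = [f"{g} {r}" for g, r in product(greetings, requests)]
    bare = list(requests)
    return fused + bare


training_sentences = augment(GREETINGS, REQUESTS)
for sentence in training_sentences:
    print(sentence)
```

Keeping the bare requests mirrors the advice to keep example 2 in the training set alongside the fused, example-1-style sentences.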

JuanFF commented 4 years ago

Thanks a lot!

bedapudi6788 commented 4 years ago

@JuanFF I am closing the issue for now. Feel free to re-open if required.