notAI-tech / deepsegment

A sentence segmenter that actually works!
http://bpraneeth.com/projects
GNU General Public License v3.0

Deepsegment does not segment on custom model #31

Closed BoneGoat closed 4 years ago

BoneGoat commented 4 years ago

**Describe the bug and error messages (if any)**

I trained DeepSegment on 1GB of custom data in Swedish. Training completed successfully, but when I run inference the model does not segment the text.

**The code snippet which gave this error**

for line in lines[:3]:
    print(line)
print('Tot: {}'.format(len(lines)))
--------------------------------
Enligt ett pressmeddelande från Anza är Hamilton Acorn Englands ledande producent av professionella måleriverktyg.
Omsättningen är cirka 150 miljoner kronor och företaget har 136 anställda.
Det var ett paket med flera kilo hasch som hittades av tullen på Landvetters flygplats utanför Göteborg.
Tot: 10126568
--------------------------------
x, y = generate_data(lines[10000:], max_sents_per_example=6, n_examples=10000)
vx, vy = generate_data(lines[:10000], max_sents_per_example=6, n_examples=1000)
--------------------------------
100% (10000 of 10000) |##################| Elapsed Time: 0:00:01 Time:  0:00:01
100% (1000 of 1000) |####################| Elapsed Time: 0:00:00 Time:  0:00:00
--------------------------------
train(x, y, vx, vy, epochs=2, batch_size=64, save_folder='./', glove_path='cc.sv.100.vec')
--------------------------------
Epoch 1/2
157/157 [==============================] - 168s 1s/step - loss: 3.6761
 - f1: 80.93
             precision    recall  f1-score   support

       sent       0.97      0.69      0.81      3425

avg / total       0.97      0.69      0.81      3425

Epoch 00001: f1 improved from -inf to 0.80926, saving model to ./checkpoint
Epoch 2/2
157/157 [==============================] - 166s 1s/step - loss: 3.5520
 - f1: 84.49
             precision    recall  f1-score   support

       sent       0.97      0.75      0.84      3425

avg / total       0.97      0.75      0.84      3425

Epoch 00002: f1 improved from 0.80926 to 0.84494, saving model to ./checkpoint
--------------------------------
from deepsegment import DeepSegment
segmenter = DeepSegment(lang_code=None, checkpoint_path='checkpoint', params_path='params', utils_path='utils', tf_serving=False, checkpoint_name=None)
segmenter.segment('under natten har det varit inbrott i ett kontor vid bredåkra kyrka en person gripen misstänkt för inbrottet polisen skriver på sin facebooksida att en av deras hundförare lyckades spåra upp gärningsmannen och det tillgripna godset personen som är i trettiofemårsåldern greps och sitter nu anhållen ingrid elfstråhle p fyra blekinge')
--------------------------------
['under natten har det varit inbrott i ett kontor vid bredåkra kyrka en person gripen misstänkt för inbrottet polisen skriver på sin facebooksida att en av deras hundförare lyckades spåra upp gärningsmannen och det tillgripna godset personen som är i trettiofemårsåldern greps och sitter nu anhållen ingrid elfstråhle p fyra blekinge']

cc.sv.100.vec is the Swedish fastText model from Facebook (300-dimensional vectors reduced to 100 dimensions).
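(For context on what "reduced to 100" means: fastText ships a utility for shrinking pretrained vectors to a lower dimension. A minimal NumPy sketch of the underlying idea, using PCA via SVD on toy data; this is illustrative only and not necessarily the exact procedure used here.)

```python
import numpy as np

def reduce_vectors(vecs, target_dim=100):
    """Reduce word vectors to target_dim via PCA.

    Illustrative sketch only -- fastText provides its own reduction
    utility; this just shows the underlying idea.
    """
    # Center the vectors, then take the top principal axes from the SVD.
    centered = vecs - vecs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Project onto the first target_dim principal directions.
    return centered @ vt[:target_dim].T

rng = np.random.default_rng(0)
toy = rng.normal(size=(500, 300))   # 500 toy "word vectors", 300-d
reduced = reduce_vectors(toy, 100)
print(reduced.shape)                # (500, 100)
```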

Specify versions of the following libraries

  1. deepsegment / latest
  2. tensorflow / 1.15.2
  3. keras / 2.3.1

**Expected behavior**

I expected DeepSegment to segment the text.

**Screenshots**

None.

bedapudi6788 commented 4 years ago

@BoneGoat I think you should increase your n_examples a lot. n_examples is basically the amount of data the model will be trained on, and 10,000 examples is far too few for a new language. Similarly, you should also increase n_examples for vx and vy.

bedapudi6788 commented 4 years ago

Also, I suggest keeping epochs=15. 2 is far too few.

BoneGoat commented 4 years ago

Thank you for the quick response! I have increased n_examples by a couple of orders of magnitude and bumped epochs to 15. One epoch now takes around 8 hours. I have tensorflow-gpu installed, but training isn't using the GPU. Is there a way to utilise the GPU for faster training?

bedapudi6788 commented 4 years ago

That is odd. If tensorflow-gpu is installed, it should use the GPU for training. Make sure your TensorFlow import can actually see the GPU (https://stackoverflow.com/questions/38559755/how-to-get-current-available-gpus-in-tensorflow, https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices).

You might also want to increase the batch size by a lot.
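Following the links above, a minimal visibility check might look like this (a sketch; `device_lib` is TensorFlow's internal device-listing helper and works on the 1.15 version used here, while `tf.config.list_physical_devices` is the stable API on newer versions):

```python
def has_gpu():
    """Return True if TensorFlow can see at least one GPU device.

    Sketch only: falls back to False when TensorFlow is not installed,
    so it can run anywhere.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return False
    # Lists CPU and GPU devices visible to the TensorFlow runtime.
    devices = device_lib.list_local_devices()
    return any(d.device_type == 'GPU' for d in devices)

print('GPU visible to TensorFlow:', has_gpu())
```

If this prints False on a GPU machine, the usual culprits are a plain `tensorflow` install shadowing `tensorflow-gpu`, or a CUDA/driver mismatch.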

BoneGoat commented 4 years ago

My setup was broken so it wasn't using the GPU. Thanks for your help!