notAI-tech / deepsegment

A sentence segmenter that actually works!
http://bpraneeth.com/projects
GNU General Public License v3.0

Deepcorrect not working on model trained using Deepsegment. #18

Closed sangeet2020 closed 4 years ago

sangeet2020 commented 4 years ago

Hello,

Apologies if the title isn't to the point. I used DeepSegment (https://colab.research.google.com/drive/1CjYbdbDHX1UmIyvn7nDW2ClQPnnNeA_m#scrollTo=K9oMoDwwXgQl) to train a language model on my custom data. However, the code below fails when I run it with the trained model (HDF format) and params (JSON format):

### My logic: czech-data -> DeepSegment -> trained model -> DeepCorrect -> punctuated and segmented sentences

from deepcorrect import DeepCorrect
DeepCorrect('/home/sagar/.DeepSegment_cs/params', '/home/sagar/.DeepSegment_cs/checkpoint')

I get this error:

UnpicklingError: invalid load key, '{'

As far as I understand, DeepCorrect expects the params file to be a pickle file, not a plain-text JSON file. Is there anything wrong with my approach?

Thank you.

bedapudi6788 commented 4 years ago

I think you misunderstood something somewhere. DeepSegment and DeepCorrect are two different libraries with two different model architectures. You have to train both separately and use both separately.
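For anyone hitting the same error: the message itself only says that something tried to unpickle a file whose first byte is `{`, which is what a JSON file usually starts with. Below is a minimal sketch that reproduces it using only the Python standard library. It does not use deepsegment or deepcorrect at all, and the file names, the `describe_params_file` helper, and the `vocab_size` key are made up for illustration; they are not the real contents of either library's params file.

```python
import json
import pickle
import tempfile
from pathlib import Path


def describe_params_file(path):
    """Return 'pickle', 'json', or 'unknown' for the file at `path`.

    Hypothetical helper, not part of deepsegment or deepcorrect.
    Only call this on files you created yourself: unpickling
    untrusted data can execute arbitrary code.
    """
    raw = Path(path).read_bytes()
    try:
        pickle.loads(raw)
        return "pickle"
    except Exception:
        # Not a valid pickle; fall through and try JSON.
        pass
    try:
        json.loads(raw.decode("utf-8"))
        return "json"
    except ValueError:
        # Covers both UnicodeDecodeError and json.JSONDecodeError.
        return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    # A JSON params file and a pickled one holding the same
    # (made-up) content.
    json_params = Path(tmp) / "params_json"
    json_params.write_text(json.dumps({"vocab_size": 100}))

    pickle_params = Path(tmp) / "params_pickle"
    pickle_params.write_bytes(pickle.dumps({"vocab_size": 100}))

    json_kind = describe_params_file(json_params)
    pickle_kind = describe_params_file(pickle_params)

    # Unpickling the JSON file fails on its very first byte, '{'.
    error_message = ""
    try:
        with open(json_params, "rb") as fh:
            pickle.load(fh)
    except pickle.UnpicklingError as exc:
        error_message = str(exc)

print(json_kind, pickle_kind)
print("UnpicklingError:", error_message)
```

Converting the JSON file to pickle would only silence this particular error. It would not make a DeepSegment model usable from DeepCorrect, since the two are trained separately.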

sangeet2020 commented 4 years ago

Could you please share the instructions for training a model for DeepCorrect?

bedapudi6788 commented 4 years ago

https://github.com/bedapudi6788/txt2txt/

bedapudi6788 commented 4 years ago

@sangeet2020 I am closing the issue, feel free to re-open if you have any other questions.

sangeet2020 commented 4 years ago

Sure, Mr. Praneeth. Thanks a lot. One more question: I am training DeepSegment using TensorFlow 2.0. Will this be an issue or have any effect on the model?

bedapudi6788 commented 4 years ago

> will this be an issue or any effect on the model ?

No. The latest versions of both deepsegment and txt2txt support newer TensorFlow and Keras versions.