segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

Unable to use own trained onnx models #71

Closed by synweap15 2 years ago

synweap15 commented 2 years ago

Hello and first of all: thank you for a great library!

I've trained my own model on an unusual input data format, following the training notebook you've provided. However, when I try to load the custom model with NNSplit.load("en/model.onnx") in the Python bindings, I get this error:

nnsplit.ResourceError: model not found: "en/model.onnx"

I may be wrong, but it seems the current logic of model_loader.rs does not accept custom local paths, only the models listed in models.csv:

https://github.com/bminixhofer/nnsplit/blob/a5a15815382029bf5c3438fd4753f644847d4dbf/nnsplit/src/model_loader.rs#L59

This effectively limits the available models to the pretrained ones.

bminixhofer commented 2 years ago

Hi, can you try NNSplit("en/model.onnx")? .load is like .from_pretrained in huggingface/transformers, i.e. it only loads the "official" models.
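
The distinction above can be sketched as a small helper. This is only an illustration and not part of nnsplit: `make_splitter` is a hypothetical name, and the only library calls it assumes are the two mentioned in this thread, `NNSplit(path)` for a local ONNX file and `NNSplit.load(name)` for a pretrained model. The class is passed in as an argument, so the helper itself does not import nnsplit.

```python
import os


def make_splitter(nnsplit_cls, name_or_path):
    """Build a splitter from a local model file or a pretrained model name.

    Hypothetical helper: `nnsplit_cls` is expected to be nnsplit.NNSplit
    (or any class with the same constructor and `.load` classmethod).
    """
    if os.path.isfile(name_or_path):
        # A file on disk: use the constructor, as suggested in this thread.
        return nnsplit_cls(name_or_path)
    # Anything else is treated as the name of an "official" pretrained model.
    return nnsplit_cls.load(name_or_path)


# Intended usage (assumes nnsplit is installed and en/model.onnx exists):
#
#   from nnsplit import NNSplit
#   custom = make_splitter(NNSplit, "en/model.onnx")  # calls NNSplit("en/model.onnx")
#   pretrained = make_splitter(NNSplit, "en")         # calls NNSplit.load("en")
```

Checking for the file first means a custom model is never mistaken for a pretrained name. A mistyped path still falls through to `.load` and would raise the same "model not found" error shown above.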

synweap15 commented 2 years ago

:facepalm: That was it! I can confirm it works with NNSplit("en/model.onnx"). I missed that there's no .load call in the examples.

Thank you!