True-casing support?

oliverguhr / fullstop-deep-punctuation-prediction

A model that predicts the punctuation of English, Italian, French and German texts.

https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large

MIT License

72 stars, 13 forks

True-casing support? #19

Open silvioprog opened 7 months ago

silvioprog commented 7 months ago

Hey.

Thanks a lot for this project!

So, I have a question. Does fullstop-deep-punctuation-prediction support true-casing (capitalization)? If not, how would you recommend approaching it?

TIA for any help!

oliverguhr commented 7 months ago

Hey @silvioprog, the project has no true-casing support. It should be relatively simple to implement, and I am optimistic that the models can handle it quite well. I have two ideas for how this feature could be implemented:

A) You could add a second classification head that classifies whether a token starts with an upper-case letter or not.
B) Alternatively, you could change the classification head to multi-label classification and add an "upper case" class.

Best, Oliver
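The two ideas above differ mainly in how the per-token training targets are laid out. The following is a minimal, hypothetical sketch in plain Python, assuming word-level labels and a simplified punctuation label set (`0`, `.`, `,`, `?`). The function names, the label set, and the `upper` class name are illustrative assumptions, not part of the fullstop code base. Option A yields a separate 0/1 case label per word next to the punctuation label; option B folds both into one multi-hot vector.

```python
# Hypothetical sketch: building true-casing targets next to punctuation labels.
# Label set and names are assumptions for illustration only.
PUNCT_LABELS = ["0", ".", ",", "?"]      # "0" means "no punctuation after this word"
MULTI_LABELS = PUNCT_LABELS + ["upper"]  # option B: extra "upper case" class


def make_targets(text):
    """Lower-case the text and strip trailing punctuation, returning the model
    input words plus per-word punctuation labels and case labels (option A)."""
    words, punct, case = [], [], []
    for raw in text.split():
        label = "0"
        if raw[-1] in PUNCT_LABELS[1:]:
            label, raw = raw[-1], raw[:-1]
        if not raw:
            continue  # token was only punctuation; dropped in this sketch
        case.append(1 if raw[0].isupper() else 0)  # 1 = word starts upper-case
        punct.append(label)
        words.append(raw.lower())
    return words, punct, case


def multi_hot(punct_label, is_upper):
    """Option B: one multi-hot target vector per word over MULTI_LABELS."""
    vec = [0] * len(MULTI_LABELS)
    vec[MULTI_LABELS.index(punct_label)] = 1
    if is_upper:
        vec[MULTI_LABELS.index("upper")] = 1
    return vec


def restore(words, punct, case):
    """Rebuild text from (predicted) punctuation and case labels."""
    out = []
    for word, p, c in zip(words, punct, case):
        if c:
            word = word[:1].upper() + word[1:]  # only the first letter is restored
        out.append(word if p == "0" else word + p)
    return " ".join(out)


if __name__ == "__main__":
    words, punct, case = make_targets("Hello, world. How are you?")
    print(words)  # ['hello', 'world', 'how', 'are', 'you']
    print(punct)  # [',', '.', '0', '0', '?']
    print(case)   # [1, 0, 1, 0, 0]
    print(multi_hot(punct[0], case[0]))  # [0, 0, 1, 0, 1]
    print(restore(words, punct, case))   # Hello, world. How are you?
```

On the model side, option A would typically mean a shared encoder with two linear heads trained with a summed cross-entropy loss, while option B would use a single head with one sigmoid output per class trained with binary cross-entropy (for example PyTorch's `BCEWithLogitsLoss`). Since the real model works on subword tokens, the word-level labels here would still have to be mapped onto subwords (e.g. onto the first subword of each word). Note that a "starts with upper case" label cannot recover mixed or all-caps words such as "NASA" or "iPhone".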