Open ghost opened 6 years ago
Hi @tcollins590, I would personally add a post-processing step after applying this model to fix this type of error. If you have a file called punc.txt with the punctuation added, you could do something like:
sed 's/\([a-zA-Z]\)$/\1./' punc.txt > new_punc.txt
This adds a period to the end of any line that ends with a letter (lower or upper case). It's not perfect, but it should work in most cases.
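If you are already calling the model from Python, the same post-processing can be done without sed. This is only a rough sketch of the idea: the function name add_final_period is made up for illustration and is not part of the project.

```python
import re


def add_final_period(text):
    """Append a period to every line that ends with an ASCII letter.

    Mirrors the sed expression: with re.MULTILINE, `$` matches at the
    end of each line, so lines that already end in punctuation (or in
    anything other than a letter) are left untouched.
    """
    return re.sub(r"([a-zA-Z])$", r"\1.", text, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = "Hello, how are you? I am fine"
    print(add_final_period(sample))
```

Like the sed version, this cannot tell whether the last sentence should end with a question mark or an exclamation mark, and it skips lines that end with a digit or trailing whitespace.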
Hi! I have an idea for the fix, but I won't have free time to implement it for a few months.
The idea is to change the part in punctuator.py where the model selects the punctuation with highest probability:
p_i = np.argmax(y_t.flatten())
by adding a mask that sets the probabilities of non-end-of-sentence punctuations (plus the no-punctuation class) to zero if we have reached the end of input text:
p_i = np.argmax(y_t.flatten() * eos_mask)
This would force the model to choose between period, question mark or exclamation.
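To make the masking idea concrete, here is a minimal sketch. It assumes y_t holds one probability per punctuation class (for example a softmax output). The label list, the helper names build_eos_mask and select_punctuation, and the at_end_of_input flag are all hypothetical; the real class order has to come from the model's own punctuation vocabulary.

```python
import numpy as np

# Hypothetical punctuation vocabulary, for illustration only.
# The actual labels and their order must be taken from the trained model.
PUNCTUATIONS = ["_SPACE", ",COMMA", ".PERIOD", "?QUESTIONMARK", "!EXCLAMATIONMARK"]
EOS_PUNCTUATIONS = {".PERIOD", "?QUESTIONMARK", "!EXCLAMATIONMARK"}


def build_eos_mask(punctuations, eos_punctuations):
    """Return a 0/1 vector that keeps only end-of-sentence classes."""
    return np.array([1.0 if p in eos_punctuations else 0.0 for p in punctuations])


def select_punctuation(y_t, eos_mask, at_end_of_input):
    """Pick the most probable class, restricted to EOS classes at the end of input."""
    probs = np.asarray(y_t, dtype=float).flatten()
    if at_end_of_input:
        # Zero out no-punctuation and mid-sentence punctuation.
        probs = probs * eos_mask
    return int(np.argmax(probs))


if __name__ == "__main__":
    eos_mask = build_eos_mask(PUNCTUATIONS, EOS_PUNCTUATIONS)
    y_t = np.array([[0.60, 0.10, 0.20, 0.07, 0.03]])
    print(PUNCTUATIONS[select_punctuation(y_t, eos_mask, at_end_of_input=False)])
    print(PUNCTUATIONS[select_punctuation(y_t, eos_mask, at_end_of_input=True)])
```

Multiplying by a 0/1 mask only works as intended when the scores are non-negative probabilities. If y_t held raw logits, the masked-out entries would need to be set to negative infinity instead.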
Hello @ottokart, it would be great if you could implement this. Thanks!
First of all, thank you for putting this project together. It's incredible and incredibly useful.
I've noticed that the last phrase or sentence in a block of text does not get any punctuation added. Do you have any advice on how to fix this?
Thank you