Open johnml1135 opened 11 months ago
We should likely just use the Hugging Face implementation from 2022 (see links above), but it may need to be modified. A few reasons:
I added preliminary support for using HF constrained beam search to silnlp. From the experiments I have run, it doesn't work very well.
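For readers unfamiliar with the feature, here is a minimal sketch of how HF constrained beam search is typically invoked, assuming the `force_words_ids` argument to `generate` that transformers added in 2022. It is not the silnlp integration: `build_force_words_ids`, `translate_with_terms`, and the `model_name` parameter are hypothetical names used only for illustration, and whether the key terms need to be tokenized in target-language mode depends on the tokenizer and is not handled here.

```python
from typing import List, Sequence


def build_force_words_ids(tokenizer, key_terms: Sequence[str]) -> List[List[int]]:
    """Tokenize each required target-side term into the token ids that must appear."""
    # add_special_tokens=False keeps BOS/EOS markers out of the constraints.
    return tokenizer(list(key_terms), add_special_tokens=False).input_ids


def translate_with_terms(model_name: str, source: str, key_terms: Sequence[str],
                         num_beams: int = 5) -> str:
    """Translate `source`, asking the decoder to include every term in `key_terms`."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # model_name is an assumption: any seq2seq checkpoint the caller chooses.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(source, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        force_words_ids=build_force_words_ids(tokenizer, key_terms),
        num_beams=num_beams,  # constrained decoding needs beam search (num_beams > 1)
        max_new_tokens=64,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The constraint is expressed purely as token id sequences, so a term that tokenizes differently once an affix is attached would not match the forced sequence. That may be one of the things worth checking when looking at why the results were poor.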
@ddaspit - do you know why the tests didn't go that well? Do you have the results documented somewhere? Is it "keyterms with asterisks don't work well" or "their algorithm is poor" or "certain languages don't do well with this"? Is it worth more research now, or do we want someone else to lead the charge? Do we need to add alignment data to enhance it? LILT appears to have been able to get this working well enough to integrate into their main offering - so I am inclined to believe it is possible to have it be advantageous.
There definitely seems to be something wrong with the implementation in HF. Here is an issue that describes the problems I was seeing.
While not implemented, this may do better than the current Hugging Face implementation: https://arxiv.org/pdf/2112.08726.pdf - with this code: https://github.com/GXimingLu/a_star_neurologic.
From the papers out there, determine the best path forward, then research and implement guided decoding. Assess the improvement in BLEU score and in user-assessed quality. Address concerns about languages that attach prefixes and suffixes to proper names and key terms.
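Alongside BLEU, one simple way to track the prefix/suffix concern could be a key-term coverage score that tolerates a few extra characters around each term. The sketch below is only an assumption about how such a check might look: `term_matches`, `key_term_coverage`, and the `max_affix` threshold are hypothetical, it handles single-word terms only, and punctuation attached to a token counts toward the affix budget.

```python
from typing import Iterable


def term_matches(term: str, token: str, max_affix: int = 3) -> bool:
    """True if `token` contains `term` with at most `max_affix` extra characters on each side."""
    term, token = term.lower(), token.lower()
    start = token.find(term)
    while start != -1:
        prefix_len = start
        suffix_len = len(token) - start - len(term)
        if prefix_len <= max_affix and suffix_len <= max_affix:
            return True
        # Try the next occurrence of the term inside the same token.
        start = token.find(term, start + 1)
    return False


def key_term_coverage(output: str, terms: Iterable[str], max_affix: int = 3) -> float:
    """Fraction of `terms` found (allowing short affixes) in the whitespace-split output."""
    tokens = output.split()
    terms = list(terms)
    if not terms:
        return 1.0  # nothing was required, so nothing is missing
    hits = sum(any(term_matches(t, tok, max_affix) for tok in tokens) for t in terms)
    return hits / len(terms)
```

With invented strings, `key_term_coverage("alienda kwaPaulo jana", ["Paulo", "Petro"])` gives `0.5`: the first term is found inside an affixed token and the second is absent. Comparing this score with and without guided decoding would show whether the constraints actually change the output, independently of BLEU.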