victordibia / neuralqa

NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT
https://victordibia.github.io/neuralqa/
MIT License

Results in NeuralQA inconsistent with same model running on HF #60

Open jvence opened 3 years ago

jvence commented 3 years ago

I've tested a model deployed on NeuralQA against the same model deployed on HF and noticed that the same inputs yield different outputs, even though it's the exact same model. This can of course be attributed to a few things, but I can't seem to identify the culprit.

Here's the context:

Question: Are your handsets locked or unlocked?

Corpus: ['No, all our handsets are unlocked.','Since your SIM isn’t working in your handset while other SIM cards are, it might be an issue with your handset provider; or the mobile phone could be locked, meaning it only accepts SIM cards from a particular service provider. Please contact the handset dealer for more assistance.']

The following returns 'unlocked', which is the correct response: See Demo on HuggingFace
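
For reference, the HF demo call amounts to something like the sketch below (the model name is only a placeholder, since the issue doesn't name the deployed model; substitute the actual one):

```python
from transformers import pipeline

# Placeholder model; the issue does not say which fine-tuned QA model was deployed.
qa = pipeline("question-answering", model="deepset/bert-base-cased-squad2")

corpus = [
    "No, all our handsets are unlocked.",
    "Since your SIM isn’t working in your handset while other SIM cards are, it might be "
    "an issue with your handset provider; or the mobile phone could be locked, meaning it "
    "only accepts SIM cards from a particular service provider. Please contact the handset "
    "dealer for more assistance.",
]

# The HF demo receives the whole corpus as a single context string.
print(qa(question="Are your handsets locked or unlocked?", context=" ".join(corpus)))
# expected answer span: 'unlocked'
```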

I've configured the exact same model in NeuralQA (with RelSnip disabled), and the result is 'locked' even though I'm feeding exactly the same inputs. Here's my log:

0: No, all our handsets are unlocked.
[{'answer': 'unlocked', 'took': 0.35032129287719727, 'start_probability': '0.92030567', 'end_probability': '0.00026586326', 'probability': '0.460418697912246', 'question': 'Are your handsets locked or unlocked?', 'context': 'no, all our handsets are unlocked '}]

1: Since your SIM isn’t working in your handset while other SIM cards are, it might be an issue with your handset provider; or the mobile phone could be locked, meaning it only accepts SIM cards from a particular service provider. Please contact the handset dealer for more assistance.
[{'answer': 'locked', 'took': 0.5319299697875977, 'start_probability': '0.9462091', 'end_probability': '0.007203659', 'probability': '0.48030819557607174', 'question': 'Are your handsets locked or unlocked?', 'context': 'since your sim isn ’ t working in your handset while other sim cards are, it might be an issue with your handset provider ; or the mobile phone could be locked , meaning it only accepts sim cards from a particular service provider. please contact the handset dealer for more assistance'}]

As you can see, the second answer gets a higher probability, but that doesn't really make sense since it's exactly the same model. The main difference is that NeuralQA feeds each corpus passage to the model independently, while in the HF example we feed the entire corpus as one context.
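
For what it's worth, here is a minimal sketch of what per-passage scoring looks like when the start/end softmax is taken within each passage (this is my assumption about how the log above is produced, not the actual NeuralQA code; the model name is again a placeholder):

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

MODEL = "deepset/bert-base-cased-squad2"  # placeholder; substitute the deployed model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL)

question = "Are your handsets locked or unlocked?"
passages = [
    "No, all our handsets are unlocked.",
    "Since your SIM isn’t working in your handset while other SIM cards are, it might be "
    "an issue with your handset provider; or the mobile phone could be locked, meaning it "
    "only accepts SIM cards from a particular service provider. Please contact the handset "
    "dealer for more assistance.",
]

for passage in passages:
    inputs = tokenizer(question, passage, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # The softmax is computed over the tokens of this passage only, so each passage
    # yields a confident-looking span even if it doesn't actually answer the question,
    # and the resulting probabilities are not directly comparable across passages.
    start_probs = torch.softmax(outputs.start_logits, dim=-1)[0]
    end_probs = torch.softmax(outputs.end_logits, dim=-1)[0]
    start = int(start_probs.argmax())
    end = int(end_probs.argmax())
    answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
    print(answer, float(start_probs[start]), float(end_probs[end]))
```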

Any ideas on why this is happening?

jvence commented 3 years ago

Could this be related to #39?

victordibia commented 3 years ago

@jvence ,

Yup, it is definitely related to #39. The solution will be to rewrite that piece using the HF approach. It's part of some work to convert the entire lib to use PyTorch; see #53. Hoping to have some updates in the coming week or so.
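
Roughly, the idea would be something like the sketch below (not the actual NeuralQA code, and the model name is a placeholder): delegate tokenization, span extraction, and scoring to the transformers question-answering pipeline on the PyTorch backend instead of custom logits post-processing.

```python
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="deepset/bert-base-cased-squad2",  # placeholder model name
    framework="pt",                          # force the PyTorch backend
)

def answer_passages(question, passages):
    # Let the pipeline handle span extraction and scoring rather than
    # re-implementing the logits post-processing by hand.
    results = []
    for passage in passages:
        res = qa(question=question, context=passage)
        res["passage"] = passage
        results.append(res)
    # Caveat: each score is still normalized within its own passage, so
    # cross-passage ranking remains approximate.
    return sorted(results, key=lambda r: r["score"], reverse=True)
```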

jvence commented 3 years ago

Yes, further testing with multiple models confirms that the results given by NeuralQA are way off from the ones returned by the HF model. Hope this can be resolved soon, as it's critical to us. Thank you.

jvence commented 3 years ago

Hi @victordibia, just checking in to see if there's any update on this? It seems like a pretty critical issue. Thanks.

jvence commented 3 years ago

@victordibia Is this project still maintained? We have not heard from you for a while. Hope everything is ok.

jvence commented 3 years ago

@victordibia It's a shame that this is no longer maintained. What are your plans vis-à-vis this project?