Low probability. How to debug/improve?

I've been reading through all the issues I could find and the two most notable findings are:

That said, I'm not quite sure how to validate the results of cross-val-metrics. I did read the wiki articles but still struggle to make sense of them. I do have parsing_errors (quite a lot, actually) but don't know how to improve the dataset based on them.
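One way to get something actionable out of a long parsing_errors list is to count which intent was expected versus which one was predicted. The sketch below is only an illustration: it assumes each error is a dict with `expected_output` and `nlu_output` entries, each holding an `intent` dict with an `intentName` key. That shape should be checked against the actual metrics output. The intent names in the sample are made up.

```python
from collections import Counter


def tally_intent_confusions(parsing_errors):
    """Count (expected intent, predicted intent) pairs among parsing errors.

    Assumption: each error is a dict with "expected_output" and "nlu_output"
    entries, and both carry an "intent" dict with an "intentName" key.
    """
    pairs = Counter()
    for error in parsing_errors:
        expected = (error.get("expected_output") or {}).get("intent") or {}
        predicted = (error.get("nlu_output") or {}).get("intent") or {}
        pairs[(expected.get("intentName"), predicted.get("intentName"))] += 1
    # Most frequent confusions first.
    return pairs.most_common()


# Hypothetical errors; the intent names are invented for illustration only.
sample_errors = [
    {"expected_output": {"intent": {"intentName": "lightsOn"}},
     "nlu_output": {"intent": {"intentName": "tvOn"}}},
    {"expected_output": {"intent": {"intentName": "lightsOn"}},
     "nlu_output": {"intent": {"intentName": "tvOn"}}},
    {"expected_output": {"intent": {"intentName": "lightsOn"}},
     "nlu_output": {"intent": {"intentName": None}}},
    {"expected_output": {"intent": {"intentName": "tvOn"}},
     "nlu_output": {"intent": {"intentName": "tvOn"}}},
]

confusions = tally_intent_confusions(sample_errors)
for (expected, predicted), count in confusions:
    print(f"{count}x expected={expected} predicted={predicted}")
```

Pairs where both names match would point at slot errors rather than intent errors, and a frequently confused pair suggests the two intents may share too many similar utterances.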

Removing sentences also did not help and I've been rather picky about what to add.

My dataset is an export from Dialogflow (I wrote a converter script which supports intents and entities). Within Dialogflow I made sure there are no validation errors, and the same query gives me a much higher confidence score than I get with Snips. I assume the calculation approach is very different (though that is probably hard to tell due to the closed-source nature of Dialogflow).
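For context, here is a minimal sketch of what such a conversion can look like. It is not the actual converter script. It assumes the Dialogflow export lists training phrases as `{"data": [...]}` objects whose annotated parts carry `meta` (for example `@room`) and `alias` keys, and that the target is the Snips JSON dataset layout with `intents` and `entities`. The function name, sample phrase and entity name are hypothetical.

```python
def convert_usersays(usersays):
    """Convert Dialogflow-style training phrases to Snips-style utterances.

    Assumption: annotated parts carry "meta" (e.g. "@room") and "alias".
    System entities such as "@sys.number" are NOT mapped to Snips builtin
    entities here and would need separate handling.
    """
    utterances = []
    for phrase in usersays:
        chunks = []
        for part in phrase["data"]:
            # Keep the text untouched, including leading/trailing spaces,
            # so that joining the chunks reproduces the original sentence.
            chunk = {"text": part["text"]}
            if part.get("meta"):
                entity = part["meta"].lstrip("@")
                chunk["entity"] = entity
                chunk["slot_name"] = part.get("alias") or entity
            chunks.append(chunk)
        utterances.append({"data": chunks})
    return utterances


# Hypothetical training phrase in the assumed Dialogflow export shape.
sample_usersays = [
    {"data": [
        {"text": "ceiling", "meta": "@room", "alias": "room"},
        {"text": " lights on"},
    ]}
]

snips_utterances = convert_usersays(sample_usersays)
dataset = {
    "language": "en",
    "intents": {"lightsOn": {"utterances": snips_utterances}},
    # Every entity referenced by an utterance needs an entry here.
    "entities": {
        "room": {
            "data": [{"value": "ceiling", "synonyms": []}],
            "use_synonyms": True,
            "automatically_extensible": True,
            "matching_strictness": 1.0,
        }
    },
}
```

A quick sanity check on a converted dataset is to join the chunk texts of each utterance and compare the result with the phrase shown in Dialogflow, since lost whitespace or misplaced entity boundaries would show up there.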

Here are some confidence results I get:

Dialogflow

Query	In dataset	Confidence
ceiling lights on	yes	1
tv lights on	no	0.79
tv on	no	0.47 *

* I only just now tried this query and I am not sure how to feel about that result ':D

Snips

Query	In dataset	Confidence
ceiling lights on	yes	1
tv lights on	no	0.45
tv on	no	1

The second entry worries me. A confidence below 0.5 is quite low for a query so similar to one in the dataset. After a previous fit it dropped as low as 0.32.
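To track this across fits, a small check like the one below could list every test query whose probability falls under a threshold. It is a sketch under assumptions: `parse` stands for any callable that returns a dict with an `intent` entry holding `intentName` and `probability`, which is the shape `engine.parse` of a fitted `SnipsNLUEngine` is expected to return. The stub parser simply replays the Snips numbers from the table above, and the intent name is a placeholder.

```python
def low_confidence_queries(parse, queries, threshold=0.5):
    """Return (query, intent name, probability) for results below threshold."""
    flagged = []
    for query in queries:
        intent = parse(query).get("intent") or {}
        probability = intent.get("probability") or 0.0
        if probability < threshold:
            flagged.append((query, intent.get("intentName"), probability))
    return flagged


# Stub standing in for engine.parse; it replays the Snips scores reported
# in the table above. "placeholderIntent" is a made-up name.
REPORTED_SCORES = {
    "ceiling lights on": 1.0,
    "tv lights on": 0.45,
    "tv on": 1.0,
}


def stub_parse(text):
    return {
        "input": text,
        "intent": {
            "intentName": "placeholderIntent",
            "probability": REPORTED_SCORES[text],
        },
        "slots": [],
    }


flagged = low_confidence_queries(stub_parse, list(REPORTED_SCORES))
for query, intent_name, probability in flagged:
    print(f"{query!r}: {intent_name} ({probability:.2f})")
```

Running the same query list after every fit makes it easier to see whether a low score is stable or just varies from one training run to the next.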

Here is:

I hope someone can help, and sorry if I forgot something important.

snipsco / snips-nlu

Low probability. How to debug/improve? #875
