Heading: Switch feedback score extraction from `PATTERN_INTEGER` to `PATTERN_NUMBER` to handle decimal scores
While benchmarking various feedback providers, I noticed that some models (e.g., a fine-tuned mixtral-8x7b) tend to return `10.0` instead of `10` in their feedback scores before normalization. In the current implementation, `PATTERN_INTEGER` extracts both `0` and `10` from `10.0` and eventually picks the lesser value.
Failing example when testing groundedness feedback functions, where the score accompanying the CoT reasoning was interpreted as `0` instead of the expected `10`:
```
0.0,
{'reasons': 'STATEMENT 0:\nCriteria: I am a man and I love fish,\nSupporting Evidence: The source states "All men love fish", and the statement contains "I am a man and I love fish". The source contains the information that a man loves fish, and the statement contains the information that the speaker is a man and loves fish.\nScore: 10.0\n'}
```
I'm switching to `PATTERN_NUMBER` to unblock for now.
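For illustration, a minimal sketch of the failure mode — note that `PATTERN_INTEGER` and `PATTERN_NUMBER` below are hypothetical stand-ins (a bare integer regex vs. one that allows a decimal part), not the actual definitions from the codebase:

```python
import re

# Hypothetical stand-ins for the library's patterns (assumptions, not the
# real source): an integer-only pattern vs. one allowing a decimal part.
PATTERN_INTEGER = re.compile(r"\d+")
PATTERN_NUMBER = re.compile(r"\d+(?:\.\d+)?")

text = "Score: 10.0"

# The integer pattern splits "10.0" into two matches, so taking the
# lesser value yields 0 instead of 10.
print(PATTERN_INTEGER.findall(text))                        # ['10', '0']
print(min(int(m) for m in PATTERN_INTEGER.findall(text)))   # 0

# The number pattern captures the full decimal, so the score survives.
print(PATTERN_NUMBER.findall(text))                         # ['10.0']
```

This is why a lone decimal score in the CoT output collapses to `0` under integer-only extraction.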
Other details that are good to know but need not be announced:
This is only a stopgap solution, and I might be missing some context as to why `PATTERN_INTEGER` was used over `PATTERN_NUMBER` in the previous PR. cc @sfc-gh-pmardziel to add more background if I'm missing something obvious.
I do believe we should move toward structured and systematic feedback score generation mechanisms with self-refining prompt iterations (e.g., via DSPy) ASAP for more robust score generation, ideally before integrating with the monitoring stack, even at the cost of slightly higher token usage/cost/latency (which can also be alleviated via better prompts and instruction tuning).
Items to add to release announcement: