impel-intelligence / dippy-bittensor-subnet


The models would be evaluated as incoherent. #76

Closed · apoplexyes closed this issue 1 month ago

apoplexyes commented 1 month ago

I found an issue where models can be wrongly evaluated as incoherent. There are two possible cases.

(screenshot)

The screenshot above shows examples of what I am describing.

Thanks

donaldknoller commented 1 month ago

Can you please provide specific details on how this differs from https://github.com/impel-intelligence/dippy-bittensor-subnet/issues/73? Also, there is no direct proof of these claims:

1. OpenAI API key expired -> if this were the case, all coherence evaluations would be failing, not only some models'.
2. The API call that gathers the dataset for coherence evaluation fails (due to a security issue, the call is stopped and an empty dataset is returned) -> please provide a specific code solution if this is the case; otherwise we can address it as part of https://github.com/impel-intelligence/dippy-bittensor-subnet/issues/64.

I have updated the title to better reflect the root cause.
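
For reference, a minimal sketch of the two claimed failure modes, assuming the validator fetches its coherence-evaluation dataset through the OpenAI Python SDK (the function and model names here are hypothetical, not the repo's actual code):

```python
import openai

def fetch_coherence_dataset(client: openai.OpenAI, prompt: str) -> list[str]:
    """Hypothetical dataset fetch used during coherence evaluation."""
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return [choice.message.content for choice in response.choices]
    except openai.AuthenticationError:
        # Claim 1: an expired/invalid key would raise here on *every*
        # evaluation, so only some models failing rules this out.
        raise
    except openai.APIError:
        # Claim 2: swallowing the error and returning an empty dataset
        # would make the downstream coherence score degenerate to 0.
        return []
```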

torquedrop commented 1 month ago

(screenshot) I don't think it's because of an OpenAI API key or API call expiration. As the image above shows, when many models are submitted within a very short window, the validators can glitch; in that case there is a chance that some models' evaluations are skipped.

apoplexyes commented 1 month ago

(screenshot) It could be. I also found another possibility: what if the validators' OpenAI requests exceed the rate limit? In that case, it is entirely possible to receive a coherence score of 0.
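
If rate limiting were the cause, a standard mitigation would be retrying with exponential backoff rather than letting the call fail silently. A minimal sketch (hypothetical, not the repo's code), again assuming the OpenAI Python SDK:

```python
import time
import openai

def call_with_backoff(client: openai.OpenAI, **request_kwargs):
    # Retry on rate limits so a transient 429 does not surface
    # downstream as a coherence score of 0.
    for attempt in range(5):
        try:
            return client.chat.completions.create(**request_kwargs)
        except openai.RateLimitError:
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, 8s, 16s
    raise RuntimeError("OpenAI rate limit persisted after retries")
```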

donaldknoller commented 1 month ago

Some clarification:

  1. I believe there is some confusion between receiving a real coherence score of 0 and what merely appears to be a coherence score of 0. In the code, there is a minimum threshold of 0.95: any coherence score that falls below it (e.g. 0.9444...) is marked down to 0.
  2. Given the above, the real core issue here is not a specific error in the API (which is monitored), but rather the variance of the score itself.
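
For illustration, a minimal sketch of the thresholding described in point 1 above (the constant and function names are hypothetical, not the repo's actual code):

```python
COHERENCE_THRESHOLD = 0.95  # minimum threshold mentioned above

def effective_coherence(raw_score: float) -> float:
    # A raw score of 0.9444... is floored to 0, so the model *appears*
    # incoherent even though the evaluation itself succeeded.
    return raw_score if raw_score >= COHERENCE_THRESHOLD else 0.0
```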
donaldknoller commented 1 month ago

Addressed via https://github.com/impel-intelligence/dippy-bittensor-subnet/issues/64. There have been no erroneous 'incoherent model' errors in the past 24 hours of scoring since the change was applied.