related-sciences / nxontology-ml

Machine learning to classify ontology nodes
Apache License 2.0
6 stars 0 forks source link

Log requests and responses to/from OpenAI's API #25

Closed yonromai closed 11 months ago

yonromai commented 11 months ago

This PR adds persistence on disk of all actual requests and responses to/from OpenAI's API by default. (I don't know why I didn't implement this earlier - it's been bothering me for a while)

@ravwojdyla

it would be worth to take a look at the actual explanations from the current CoT prompts and look for patterns that we could improve, at least that worked well for me in the past. Otherwise we would be doing a "shotgun prompt tuning" ™️ ?

I fetched 50 more samples with the CoT prompt and used the new logging code to exact the prompt and response choices (#1, #2)

Note: If you want the explanations for some specific nodes (e.g. based on worst errors), I can nuke the cache and rerun the tagging for all 500 nodes so we have the CoT explanation for all of them.

(cc: @eric-czech @dhimmel )

ravwojdyla commented 11 months ago

@yonromai 🔥 🙏 I was curious about that specific example in https://github.com/related-sciences/nxontology-ml/pull/24#issuecomment-1721354224

'efo_definition': 'A viral infectious disease that results_in infection in ' 'sheep and rarely humans, has_material_basis_in Louping ' 'ill virus, which is transmitted_by sheep tick, Ixodes ' 'ricinus. The infection has_symptom lethargy, has_symptom ' 'muscle pains, has_symptom fever, and has_symptom focal ' 'neurological signs.', 'efo_id': 'EFO:0007348',

But feel free to ignore this request if it's complicated to get that.

yonromai commented 11 months ago

@ravwojdyla

But feel free to ignore this request if it's complicated to get that.

No, just had to nuke that puppy from the cache.

(In case useful: link to prompt).

Here's completion 1:

The record describes the term "louping ill". The description indicates that this is a specific viral infectious disease that has defined symptoms and a specific mode of transmission. This disease appears to have a distinct clinical profile. Therefore, I would categorize it as a high precision term. High precision terms tend to represent specific, well-defined conditions with distinguishing clinical characteristics.

<END_OF_COT>
id|precision
EFO:0007348|high

and completion 2:

The disease term given is "louping ill", a viral infectious disease found in sheep and rarely humans. It is caused by the Louping ill virus, which is transmitted by the sheep tick, Ixodes ricinus. The infection can result in symptoms such as lethargy, muscle pains, fever, and focal neurological signs. Due to its specificity in terms of causing organism, transmission vector and symptoms, alongside its limited host range (primarily affecting sheep and very rarely humans), it represents a more definite and specific group of infected individuals. Thereby, it meets the criteria of a high precision term. 

<END_OF_COT>
id|precision
EFO:0007348|high

Obvious note: The outcome is potentially different than last execution since non-deterministic

ravwojdyla commented 11 months ago

@yonromai nice 🙏 ! Man, world post-GPT is going to be very hard to debug ... 🤣Now is this dramatic change from all low to all high due to non-determinism, CoT, or something else 🤷‍♂️

yonromai commented 11 months ago

😭

I wonder if it's a sign that we should lower the model temperature

ravwojdyla commented 11 months ago

[!IMPORTANT] please don't let my comments distract you from the #13. Feel free to ignore my comments/requests or respond in a week or two :)

@yonromai actually looking at the notebook where this example came from in https://github.com/related-sciences/nxontology-ml/pull/24, it was the original prompt that classified it as all low, we don't know what the CoT for this example there. At least here it seems like it did better.

I fetched 50 more samples with the CoT prompt and used the new logging code to exact the prompt and response choices (#1, #2)

It would be cool if this was sorted by the distance from the true label, such that we could focus on the problematic examples.

yonromai commented 11 months ago

Sounds good, I'd be quick to check but probably wise to postpone it until after the workshop - I'll merge this PR for now.