instructlab / sdg

Python library for Synthetic Data Generation
Apache License 2.0
5 stars 13 forks source link

updates to grounded flow #53

Open oindrillac opened 2 days ago

oindrillac commented 2 days ago

Stress testing was done on the SDG pipeline

markmc commented 8 hours ago

The commit log could be much improved. The logical changes I see, with the explanations I'd expect:

  1. Convert filter values to a float - does this fix a bug? details of how the bug manifested would help others learn
  2. Combine question and context in the grounded skills flow at the end - why? what's the before and after behavior
  3. template updates to get full pipeline working - these look like important, but subtle changes, what details can you provide to help a person in future understand what to look out for if they are making further changes?
oindrillac commented 3 hours ago

here are some explanations for the changes:

  1. Convert filter values to a float - does this fix a bug? details of how the bug manifested would help others learn

This is addressing an issue that we observed. In some of the cases where the model is being asked to provide a total score, it provides floating point sums like 1.5/2.5 and to accommodate those cases where we want to filter those values out it made sense to keep it float.

  1. Combine question and context in the grounded skills flow at the end - why? what's the before and after behavior

That is a step towards converting it to the final format (messages) required for training.

  1. template updates to get full pipeline working - these look like important, but subtle changes, what details can you provide to help a person in future understand what to look out for if they are making further changes?

The template changes stem from model behavior on the templates and stress testing across various leaf nodes. The changes were mainly made to make sure the responses from the model align best with the expected behavior

markmc commented 2 hours ago

here are some explanations for the changes:

It would be great to get those explanations into the commit messages. See also "Information in commit messages" in https://www.berrange.com/tags/commit-message/

The commit message must contain all the information required to fully understand & review the patch for correctness. Less is not more. More is more.

shivchander commented 2 hours ago

here are some explanations for the changes:

It would be great to get those explanations into the commit messages. See also "Information in commit messages" in https://www.berrange.com/tags/commit-message/

The commit message must contain all the information required to fully understand & review the patch for correctness. Less is not more. More is more.

Thanks for the recommendation! We could start doing this here on, this PR needs to go in by today. We are on a very tight deadline

oindrillac commented 1 hour ago

a few minor changes, otherwise lgtm

done, thank you!