instructlab / sdg

Python library for Synthetic Data Generation
Apache License 2.0

Let go of legacy simple SDG self instruct leaf node keys #102

Open oindrillac opened 1 month ago

oindrillac commented 1 month ago

Now that simple SDG follows the pipeline approach instead of the legacy self-instruct loop, we no longer need to convert the leaf node keys to the format self-instruct required here: https://github.com/instructlab/sdg/blob/7efbbee54357f1d101ee0600faae867010885dcf/src/instructlab/sdg/utils/taxonomy.py#L350-L356 and then convert them back here: https://github.com/instructlab/sdg/blob/7efbbee54357f1d101ee0600faae867010885dcf/src/instructlab/sdg/utils/taxonomy.py#L474-L486. We can simply use keys consistent with the latest structure.
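
To illustrate the kind of round trip described above, here is a minimal sketch. The key names (`instruction`/`input`/`output` versus `question`/`context`/`answer`) and the helpers `to_legacy` and `from_legacy` are assumptions made for illustration only; they are not copied from `taxonomy.py`, and the actual code at the linked lines may differ.

```python
# Hypothetical mapping between legacy self-instruct style keys and the
# newer keys. These names are illustrative assumptions, not the real ones.
LEGACY_TO_CURRENT = {
    "instruction": "question",
    "input": "context",
    "output": "answer",
}
CURRENT_TO_LEGACY = {new: old for old, new in LEGACY_TO_CURRENT.items()}


def to_legacy(example: dict) -> dict:
    """Rename current keys to the legacy format (the first conversion)."""
    return {CURRENT_TO_LEGACY.get(key, key): value for key, value in example.items()}


def from_legacy(example: dict) -> dict:
    """Rename legacy keys back to the current format (the second conversion)."""
    return {LEGACY_TO_CURRENT.get(key, key): value for key, value in example.items()}


seed = {
    "question": "What is SDG?",
    "context": "",
    "answer": "Synthetic data generation.",
    "taxonomy_path": "example->path",  # keys outside the mapping pass through unchanged
}

# Converting to the legacy format and back yields the original example,
# so both steps can be dropped if the current keys are used end to end.
round_tripped = from_legacy(to_legacy(seed))
```

If both conversions are pure renames like this, removing them should leave the data that reaches the pipeline unchanged.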

russellb commented 1 month ago

related to #59