instructlab / sdg

Python library for Synthetic Data Generation
https://pypi.org/project/instructlab-sdg/
Apache License 2.0
19 stars 33 forks source link

Remove system prompt from data generation #96

Open oindrillac opened 3 months ago

oindrillac commented 3 months ago

Remove system prompt from data generation and will be re-introduced in the mixing phase.

https://github.com/instructlab/sdg/blob/b28a12bb647ef72f2b152051fc73d55c5a30da98/src/instructlab/sdg/generate_data.py#L38

shivchander commented 3 months ago

+1, would be good to introduce system role during the data mixing phase which prepares the dataset for training - this makes it a tad bit cleaner to understand - as the system role is only applicable to training