instructlab / sdg

Python library for Synthetic Data Generation
Apache License 2.0

Add the uuid to each generated sample while saving to messages #111

Closed: aakankshaduggal closed this 1 month ago

aakankshaduggal commented 2 months ago

Add the uuid to each generated sample while saving to messages

```python
import uuid


def generate_uuid():
    # Return a random (version 4) UUID as a string
    return str(uuid.uuid4())
```

Add this field to each final sample and save it together with the messages and metadata.

The final structure should look like: {'messages': '', 'metadata': '', 'id': ''}
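A minimal sketch of the proposal, assuming each sample's messages and metadata are already built and that records are saved as JSON Lines (one JSON object per line). `to_record`, `dumps_jsonl`, and `save_jsonl` are hypothetical helpers for illustration, not functions from the sdg library:

```python
import json
import uuid
from typing import Any


def generate_uuid() -> str:
    """Return a random (version 4) UUID as a string, as proposed in the issue."""
    return str(uuid.uuid4())


def to_record(messages: Any, metadata: Any) -> dict[str, Any]:
    """Hypothetical helper: bundle one sample with its metadata and a fresh id."""
    return {"messages": messages, "metadata": metadata, "id": generate_uuid()}


def dumps_jsonl(records: list[dict[str, Any]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def save_jsonl(records: list[dict[str, Any]], path: str) -> None:
    """Hypothetical helper: write the records to `path` in JSON Lines format."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(dumps_jsonl(records))
```

For example, given a hypothetical list `samples` of (messages, metadata) pairs, `save_jsonl([to_record(m, md) for m, md in samples], "messages.jsonl")` would give every saved sample its own id. Because `uuid.uuid4()` is random, ids differ between runs; if reproducible ids were ever needed, `uuid.uuid5` over some stable sample content would be a deterministic alternative.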

markmc commented 2 months ago

Could you add more context, please? What is the use case for a UUID?

bbrowning commented 1 month ago

The newer-format files output as part of the data mixing work include a UUID id field in every message. I don't know the reasoning behind the id field myself, but I'm noting that we now have one.
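A quick way to confirm that a file in the newer format carries an id everywhere is sketched below. It assumes JSON Lines input with the id stored as a top-level "id" string on each record, as in the structure proposed in the issue; `has_valid_uuid_id` and `records_missing_ids` are hypothetical helpers, not part of the sdg library:

```python
import json
import uuid
from typing import Any


def has_valid_uuid_id(record: dict[str, Any]) -> bool:
    """Return True if record["id"] is a string that parses as a UUID."""
    value = record.get("id")
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)  # raises ValueError on malformed input
    except ValueError:
        return False
    return True


def records_missing_ids(jsonl_text: str) -> list[int]:
    """Return the 1-based line numbers of records lacking a valid UUID id."""
    bad = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        if not has_valid_uuid_id(json.loads(line)):
            bad.append(lineno)
    return bad
```

An empty result means every non-blank record has an id that parses as a UUID.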

markmc commented 1 month ago

So we can close this as resolved?