instructlab / sdg

Python library for Synthetic Data Generation
Apache License 2.0

Import functions from the ilab repository #11

Open aakankshaduggal opened 3 weeks ago

aakankshaduggal commented 3 weeks ago

Some functions called in generate_data.py are defined in this file: https://github.com/instructlab/instructlab/blob/main/src/instructlab/utils.py


to-do list:

From utils, we import the following:

russellb commented 3 weeks ago

related to #6

nathan-weinberg commented 3 weeks ago

@aakankshaduggal @russellb is this something we want to try and preserve commits for? if not I'm happy to take this (and the other issue I guess, seems one PR can resolve both)

russellb commented 3 weeks ago

> @aakankshaduggal @russellb is this something we want to try and preserve commits for? if not I'm happy to take this (and the other issue I guess, seems one PR can resolve both)

In general, yes, I would always prefer keeping history. I can also walk you through how I would do it.

I haven't looked at these specifics. If the code in question is only used here, then moving it here seems like an easy decision. If it's used in other places, too, we should discuss further.

nathan-weinberg commented 3 weeks ago

ack @russellb - fine either way - AFAIK the code in question is only used in the CLI currently

russellb commented 3 weeks ago

> ack @russellb - fine either way - AFAIK the code in question is only used in the CLI currently

The question is whether it is called from any code other than what got moved here.

nathan-weinberg commented 3 weeks ago

Okay looked into this a bit - we import from two locations within instructlab - config and utils

From config, we import the following:

From utils, we import the following:

tiran commented 1 week ago

You can write rules for Ruff or Pylint to detect these types of imports and raise an error. InstructLab uses Ruff for that:

[tool.ruff.lint.flake8-tidy-imports.banned-api]
"yamllint".msg = "yamllint is for CLI usage only."
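
To illustrate what a banned-import check does, here is a rough sketch using only Python's standard `ast` module. It is not how Ruff implements the rule, and it is simpler than Ruff's matching. The function name `find_banned_imports`, the sample source, and the imported name `something` are all hypothetical and only serve the example.

```python
import ast


def find_banned_imports(source: str, banned: set[str]) -> list[tuple[int, str]]:
    """Return (line number, module) pairs for imports of a banned module or its submodules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b, c" yields one alias per module
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # absolute "from a.b import c"; relative imports are skipped
            modules = [node.module]
        else:
            continue
        for module in modules:
            if any(module == b or module.startswith(b + ".") for b in banned):
                hits.append((node.lineno, module))
    return sorted(hits)


# Hypothetical sample source; "something" is a made-up name.
sample = """\
import os
from instructlab.utils import something
import instructlab.config
"""

for lineno, module in find_banned_imports(sample, {"instructlab"}):
    print(f"line {lineno}: import of {module} is banned")
```

In practice the Ruff rule above is the better choice, since it runs with the rest of the lint checks; the sketch only shows the idea of flagging imports by module prefix.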