Open aakankshaduggal opened 3 weeks ago
related to #6
@aakankshaduggal @russellb is this something we want to try to preserve commits for? If not, I'm happy to take this (and the other issue, I guess; it seems one PR can resolve both).
In general, yes, I would always prefer keeping history. I can also walk you through how I would do it.
I haven't looked at these specifics. If the code in question is only used here, then moving it here seems like an easy decision. If it's used in other places, too, we should discuss further.
ack @russellb - fine either way - AFAIK the code in question is only used in the CLI currently
The question is whether it is called from any code other than what got moved here.
Okay, looked into this a bit - we import from two locations within instructlab: config and utils.

From config, we import the following:
- DEFAULT_MULTIPROCESSING_START_METHOD: lab.py, but also this is just a variable
- DEFAULT_API_KEY
- DEFAULT_MODEL_OLD: lab.py, but also this is just a variable
- get_model_family(): generate_data.py, but also server.py

From utils, we import the following:
- chunk_document(): generate_data.py
- max_seed_example_tokens(): generate_data.py
- num_chars_from_tokens(): generate_data.py
- read_taxonomy(): generate_data.py, but also lab.py
- get_sysprompt(): generate_data.py, but also lab.py, chat.py and make_data.py
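
An inventory like the one above can be reproduced mechanically. The following is a minimal sketch, not the approach actually used in this thread: it parses a module's source with Python's `ast` and collects the names imported from a watched set of modules. The helper name `imported_names` and the sample source string are hypothetical, and the fully qualified module names `instructlab.config` and `instructlab.utils` are assumed from the discussion. Relative imports (`from . import x`) are deliberately ignored.

```python
import ast
from collections import defaultdict


def imported_names(source: str, modules: set[str]) -> dict[str, list[str]]:
    """Map each watched module to the names imported from it in `source`."""
    found: dict[str, list[str]] = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        # Only absolute `from <module> import <name>` statements are considered.
        if (
            isinstance(node, ast.ImportFrom)
            and node.level == 0
            and node.module in modules
        ):
            found[node.module].extend(alias.name for alias in node.names)
    return {module: sorted(names) for module, names in found.items()}


# Hypothetical stand-in for the contents of a file such as generate_data.py.
SAMPLE = """
from instructlab.config import get_model_family
from instructlab.utils import chunk_document, read_taxonomy
import os
"""

WATCHED = {"instructlab.config", "instructlab.utils"}
report = imported_names(SAMPLE, WATCHED)
for module, names in sorted(report.items()):
    print(module, names)
```

Running the same function over each file in the moved package would give a per-file list of what still has to come from the CLI repository.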
You can write rules for Ruff or PyLint to detect these types of imports and raise an error. InstructLab uses Ruff for that:
[tool.ruff.lint.flake8-tidy-imports.banned-api]
"yamllint".msg = "yamllint is for CLI usage only."
Some functions that are being called in the generate_data.py file are in this file - https://github.com/instructlab/instructlab/blob/main/src/instructlab/utils.py

to-do list:

From config, we import the following:
- DEFAULT_MULTIPROCESSING_START_METHOD: lab.py, but also this is just a variable
- DEFAULT_API_KEY
- DEFAULT_MODEL_OLD: lab.py, but also this is just a variable
- get_model_family(): generate_data.py, but also server.py

From utils, we import the following:
- chunk_document(): generate_data.py (35)
- max_seed_example_tokens(): generate_data.py (35)
- num_chars_from_tokens(): generate_data.py (35)
- read_taxonomy(): generate_data.py, but also lab.py
- get_sysprompt(): generate_data.py, but also lab.py, chat.py and make_data.py