[BUG] autotrain unable to run in shell script

jackswl commented 9 months ago

Prerequisites

[X] I have read the documentation.
[X] I have checked other issues for similar problems.

Backend

Local

Interface Used

CLI

CLI Command

I need some help here, really desperately.

As we know, the fine-tuning process can be executed on Colab this way:

!autotrain llm \
--train \
--model repo/some-model \
--project-name 'project-name' \
--data-path . \
--valid-split valid \
--use-peft \
--quantization int4 \
--lr 1e-5 \
--batch-size 16 \
--token hf_XXXX

However, I intend to run this locally, via a Python script. This Python script will be submitted as a PBS job to a High Performance Computer (HPC). As such, I cannot use the '!' before autotrain.

However, I am unable to run the script simply by using:

autotrain llm \
--train \
--model repo/some-model \
--project-name 'project-name' \
--data-path . \
--valid-split valid \
--use-peft \
--quantization int4 \
--lr 1e-5 \
--batch-size 16 \
--token hf_XXXX

It says that llm is invalid syntax, or that the autotrain command is not found.
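
For context, the "invalid syntax" message is what Python itself reports when shell command text is pasted directly into a .py file. The leading `!` is notebook (IPython/Colab) syntax for running a shell command, not part of Python. A minimal sketch that reproduces the error, using a shortened command string purely as an illustration (the helper name `is_valid_python` is made up for this example):

```python
# Shell command text is not valid Python, so compiling it as a script fails.
shell_text = "autotrain llm --train --model repo/some-model"


def is_valid_python(source: str) -> bool:
    """Return True if `source` compiles as Python code, False on SyntaxError."""
    try:
        compile(source, "<script>", "exec")
        return True
    except SyntaxError:
        return False


print(is_valid_python(shell_text))           # the bare shell command
print(is_valid_python("import subprocess"))  # ordinary Python
```

This is why the command has to be launched through something like `subprocess` when it is run from a Python script.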

Do you have any idea how to solve this @abhishekkrthakur? I do not understand the solutions that other people have posted, as they are quite different. To sum up my question: how do I run the fine-tuning process from a Python script instead of Colab?

UI Screenshots & Parameters

No response

Error Logs

-

Additional Information

No response

abhishekkrthakur commented 9 months ago

you need to install autotrain-advanced first (pip install autotrain-advanced) before running the command

jackswl commented 9 months ago

@abhishekkrthakur , hey thanks a ton for the fast reply. just clarifying with you, you mean this as the python script?:

pip install autotrain-advanced

autotrain llm \
--train \
--model deepseek-ai/deepseek-coder-6.7b-instruct \
--project-name 'project-name' \
--data-path . \
--valid-split valid \
--use-peft \
--quantization int4 \
--lr 1e-5 \
--batch-size 16 \
--epochs 4 \
--trainer sft \
--model_max_length 16384 \
--block-size -1 \
--lora-r 256 \
--lora-alpha 64 \
--scheduler cosine \
--lora-dropout 0 \
--weight-decay 0.01 \
--gradient-accumulation 1 \
--merge-adapter \
--token hf_xxx

Also, autotrain-advanced is already pip-installed in my conda environment. Or should I install it from within the Python script? I am not sure what you meant.

Sorry, I am really new to this. If it is not too troublesome, could you give me a very brief python script sample for fine-tuning? I keep getting errors, such as invalid syntax, autotrain command not found, etc. If necessary, I can provide some screenshots.

jackswl commented 9 months ago

@abhishekkrthakur could you kindly please help to test a working python script for fine-tuning purposes?

Basically, instead of running on colab, I would want to run the fine-tuning on my .py script in my conda environment. Could you please help me with this? I have been searching for the solutions for hours to no avail :(

autotrain llm \
--train \
--model deepseek-ai/deepseek-coder-6.7b-instruct \
--project-name 'deepseek-coder-6.7b-instruct-finetuned' \
--data-path . \
--valid-split valid \
--use-peft \
--quantization int4 \
--lr 1e-5 \
--batch-size 16 \
--epochs 4 \
--trainer sft \
--model_max_length 16384 \
--block-size -1 \
--lora-r 256 \
--lora-alpha 64 \
--scheduler cosine \
--lora-dropout 0 \
--weight-decay 0.01 \
--gradient-accumulation 1 \
--merge-adapter \
--token hf_xxx

I just need to get this Python script to work in my conda environment (which already has autotrain-advanced installed via pip install -U autotrain-advanced)

abhishekkrthakur commented 9 months ago

you need to install autotrain-advanced in the environment you are running the script.

here is a sample .py script based on your command:


import subprocess

# Define the command as a list of arguments
command = [
    "autotrain",
    "llm",
    "--train",
    "--model",
    "deepseek-ai/deepseek-coder-6.7b-instruct",
    "--project-name",
    "project-name",
    "--data-path",
    ".",
    "--valid-split",
    "valid",
    "--use-peft",
    "--quantization",
    "int4",
    "--lr",
    "1e-5",
    "--batch-size",
    "16",
    "--epochs",
    "4",
    "--trainer",
    "sft",
    "--model_max_length",
    "16384",
    "--block-size",
    "-1",
    "--lora-r",
    "256",
    "--lora-alpha",
    "64",
    "--scheduler",
    "cosine",
    "--lora-dropout",
    "0",
    "--weight-decay",
    "0.01",
    "--gradient-accumulation",
    "1",
    "--merge-adapter",
    "--token",
    "hf_xxx"
]

# Run the command and wait for it to finish
try:
    subprocess.run(command, check=True, text=True, shell=True)
    print("Command executed successfully.")
except subprocess.CalledProcessError as e:
    print(f"Error executing command: {e}")

jackswl commented 9 months ago

@abhishekkrthakur thank you again for your swift reply, I have tested it. But here is what I get:

llm: autotrain: command not found
Error executing command: Command '['autotrain', 'llm', '--train', '--model', 'deepseek-ai/deepseek-coder-6.7b-instruct', '--project-name', 'project-name', '--data-path', '.', '--valid-split', 'valid', '--use-peft', '--quantization', 'int4', '--lr', '2e-5', '--batch-size', '2', '--epochs', '1', '--trainer', 'sft', '--model_max_length', '256', '--block-size', '-1', '--lora-r', '256', '--lora-alpha', '64', '--scheduler', 'cosine', '--lora-dropout', '0', '--weight-decay', '0.01', '--gradient-accumulation', '1', '--merge-adapter', '--token', 'hf_xxx']' returned non-zero exit status 127.

I installed it in my conda environment with:

pip install -U autotrain-advanced huggingface_hub

Any ideas?

I am executing the script (testtt.py) this way:

~/.conda/miniconda/4.9/envs/fyp/bin/python /home/svu/xxxxxxxx/folder2024/testtt.py
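
A note on the output above: exit status 127 is the shell's code for "command not found", and the `llm:` prefix suggests the shell received only `autotrain` as its command. On POSIX systems, when `subprocess.run` is given a list together with `shell=True`, only the first element is used as the command string, and the remaining elements become arguments to the shell itself. A small sketch of that behavior, using `echo` as a stand-in and assuming a POSIX shell is available (the missing command name is deliberately made up):

```python
import subprocess

# List + shell=True: the shell runs just "echo"; "hello" is not passed to it.
with_shell = subprocess.run(
    ["echo", "hello"], shell=True, capture_output=True, text=True
)

# List without shell=True: every element reaches the program as an argument.
without_shell = subprocess.run(
    ["echo", "hello"], capture_output=True, text=True
)

# A command the shell cannot find yields exit status 127.
missing = subprocess.run(
    "hypothetical-missing-command-xyz", shell=True, capture_output=True, text=True
)

print(repr(with_shell.stdout))
print(repr(without_shell.stdout))
print(missing.returncode)
```

So there may be two separate things going on: the arguments after `autotrain` never reach the program, and `autotrain` itself is not on the PATH that the script's shell sees.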

abhishekkrthakur commented 9 months ago

either the script is not using the environment that has autotrain-advanced installed in it or it is installing a much older version of autotrain-advanced. make sure you have the latest version from pip and that the script is using the environment.
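
One way to check both possibilities from inside the script is sketched below. It only reports what the running interpreter sees, and the helper name `describe_environment` is made up for this example:

```python
import shutil
import sys
from importlib import metadata


def describe_environment(package: str = "autotrain-advanced",
                         command: str = "autotrain") -> dict:
    """Collect the interpreter path, the installed package version, and where
    (if anywhere) the command is found on PATH."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "python": sys.executable,
        "package_version": version,
        "command_on_path": shutil.which(command),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `package_version` is set but `command_on_path` is None, the package is installed for this interpreter yet its command-line entry point is not on the PATH seen by the script, which would match the symptoms described in this thread.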

jackswl commented 9 months ago

@abhishekkrthakur

I installed it via

pip install -U autotrain-advanced huggingface_hub

My autotrain-advanced version is 0.6.80, which seems to be very up to date. I have appended the entire conda list below. Also, the script is indeed using the environment, because I am able to import modules such as transformers and AutoTokenizer etc., and check their versions.

Should I have done pip install in another way? Will 'pip install autotrain-advanced' suffice?

_libgcc_mutex             0.1                        main    defaults
_openmp_mutex             5.1                       1_gnu    defaults
absl-py                   2.1.0                    pypi_0    pypi
accelerate                0.25.0                   pypi_0    pypi
aiofiles                  23.2.1                   pypi_0    pypi
aiohttp                   3.9.1                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
albumentations            1.3.1                    pypi_0    pypi
alembic                   1.13.1                   pypi_0    pypi
altair                    5.2.0                    pypi_0    pypi
annotated-types           0.6.0                    pypi_0    pypi
anyio                     3.7.1                    pypi_0    pypi
arrow                     1.3.0                    pypi_0    pypi
async-timeout             4.0.3                    pypi_0    pypi
attrs                     23.2.0                   pypi_0    pypi
autotrain-advanced        0.6.80                   pypi_0    pypi
bitsandbytes              0.41.0                   pypi_0    pypi
blas                      1.0                         mkl    defaults
brotli                    1.1.0                    pypi_0    pypi
brotli-python             1.0.9           py310h6a678d5_7    defaults
bzip2                     1.0.8                h7b6447c_0    defaults
ca-certificates           2023.12.12           h06a4308_0    defaults
cachetools                5.3.2                    pypi_0    pypi
certifi                   2023.11.17      py310h06a4308_0    defaults
cffi                      1.16.0          py310h5eee18b_0    defaults
chardet                   4.0.0           py310h06a4308_1003    defaults
charset-normalizer        3.3.2                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cmaes                     0.10.0                   pypi_0    pypi
codecarbon                2.2.3                    pypi_0    pypi
colorlog                  6.8.0                    pypi_0    pypi
contourpy                 1.2.0                    pypi_0    pypi
cryptography              41.0.7          py310hdda0065_0    defaults
cycler                    0.12.1                   pypi_0    pypi
datasets                  2.14.7                   pypi_0    pypi
diffusers                 0.21.4                   pypi_0    pypi
dill                      0.3.7                    pypi_0    pypi
docstring-parser          0.15                     pypi_0    pypi
einops                    0.6.1                    pypi_0    pypi
evaluate                  0.3.0                    pypi_0    pypi
exceptiongroup            1.2.0                    pypi_0    pypi
fastapi                   0.104.1                  pypi_0    pypi
ffmpeg                    4.2.2                h20bf706_0    defaults
ffmpy                     0.3.1                    pypi_0    pypi
filelock                  3.13.1          py310h06a4308_0    defaults
fonttools                 4.47.2                   pypi_0    pypi
freetype                  2.12.1               h4a9f257_0    defaults
frozenlist                1.4.1                    pypi_0    pypi
fsspec                    2023.10.0                pypi_0    pypi
fuzzywuzzy                0.18.0                   pypi_0    pypi
giflib                    5.2.1                h5eee18b_3    defaults
gmp                       6.2.1                h295c915_3    defaults
gmpy2                     2.1.2           py310heeb90bb_0    defaults
gnutls                    3.6.15               he1e5248_0    defaults
google-auth               2.26.2                   pypi_0    pypi
google-auth-oauthlib      1.2.0                    pypi_0    pypi
gradio                    3.41.0                   pypi_0    pypi
gradio-client             0.5.0                    pypi_0    pypi
greenlet                  3.0.3                    pypi_0    pypi
grpcio                    1.60.0                   pypi_0    pypi
h11                       0.14.0                   pypi_0    pypi
hf-transfer               0.1.4                    pypi_0    pypi
httpcore                  1.0.2                    pypi_0    pypi
httpx                     0.26.0                   pypi_0    pypi
huggingface-hub           0.20.2                   pypi_0    pypi
idna                      3.6                      pypi_0    pypi
imageio                   2.33.1                   pypi_0    pypi
importlib-metadata        7.0.1                    pypi_0    pypi
importlib-resources       6.1.1                    pypi_0    pypi
inflate64                 1.0.0                    pypi_0    pypi
intel-openmp              2023.1.0         hdb19cb5_46306    defaults
invisible-watermark       0.2.0                    pypi_0    pypi
ipadic                    1.0.0                    pypi_0    pypi
jinja2                    3.1.3                    pypi_0    pypi
jiwer                     3.0.2                    pypi_0    pypi
joblib                    1.3.1                    pypi_0    pypi
jpeg                      9e                   h5eee18b_1    defaults
jsonschema                4.21.1                   pypi_0    pypi
jsonschema-specifications 2023.12.1                pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
lame                      3.100                h7b6447c_0    defaults
lazy-loader               0.3                      pypi_0    pypi
lcms2                     2.12                 h3be6417_0    defaults
ld_impl_linux-64          2.38                 h1181459_1    defaults
lerc                      3.0                  h295c915_0    defaults
libdeflate                1.17                 h5eee18b_1    defaults
libffi                    3.4.4                h6a678d5_0    defaults
libgcc-ng                 11.2.0               h1234567_1    defaults
libgomp                   11.2.0               h1234567_1    defaults
libidn2                   2.3.4                h5eee18b_0    defaults
libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
libopus                   1.3.1                h7b6447c_0    defaults
libpng                    1.6.39               h5eee18b_0    defaults
libstdcxx-ng              11.2.0               h1234567_1    defaults
libtasn1                  4.19.0               h5eee18b_0    defaults
libtiff                   4.5.1                h6a678d5_0    defaults
libunistring              0.9.10               h27cfd23_0    defaults
libuuid                   1.41.5               h5eee18b_0    defaults
libvpx                    1.7.0                h439df22_0    defaults
libwebp                   1.3.2                h11a3e52_0    defaults
libwebp-base              1.3.2                h5eee18b_0    defaults
llvm-openmp               14.0.6               h9e868ea_0    defaults
loguru                    0.7.0                    pypi_0    pypi
lz4-c                     1.9.4                h6a678d5_0    defaults
mako                      1.3.0                    pypi_0    pypi
markdown                  3.5.2                    pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.4                    pypi_0    pypi
matplotlib                3.8.2                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
mkl                       2023.1.0         h213fc3f_46344    defaults
mkl-service               2.4.0           py310h5eee18b_1    defaults
mkl_fft                   1.3.8           py310h5eee18b_0    defaults
mkl_random                1.2.4           py310hdb19cb5_0    defaults
mpc                       1.1.0                h10f8cd9_1    defaults
mpfr                      4.0.2                hb69a4c5_1    defaults
mpmath                    1.3.0           py310h06a4308_0    defaults
multidict                 6.0.4                    pypi_0    pypi
multiprocess              0.70.15                  pypi_0    pypi
multivolumefile           0.2.3                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0    defaults
nettle                    3.7.3                hbbd107a_1    defaults
networkx                  3.2.1                    pypi_0    pypi
nltk                      3.8.1                    pypi_0    pypi
numpy                     1.26.3          py310h5f9d8c6_0    defaults
numpy-base                1.26.3          py310hb5e798b_0    defaults
nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.2.26                 pypi_0    pypi
nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
nvidia-nccl-cu12          2.18.1                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.3.101                 pypi_0    pypi
nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
opencv-python             4.9.0.80                 pypi_0    pypi
opencv-python-headless    4.9.0.80                 pypi_0    pypi
openh264                  2.1.1                h4ff587b_0    defaults
openjpeg                  2.4.0                h3ad879b_0    defaults
openssl                   3.0.12               h7f8727e_0    defaults
optuna                    3.3.0                    pypi_0    pypi
orjson                    3.9.12                   pypi_0    pypi
packaging                 23.1                     pypi_0    pypi
pandas                    2.2.0                    pypi_0    pypi
peft                      0.7.1                    pypi_0    pypi
pillow                    10.0.0                   pypi_0    pypi
pip                       23.3.1          py310h06a4308_0    defaults
protobuf                  4.23.4                   pypi_0    pypi
psutil                    5.9.8                    pypi_0    pypi
py-cpuinfo                9.0.0                    pypi_0    pypi
py7zr                     0.20.6                   pypi_0    pypi
pyarrow                   14.0.2                   pypi_0    pypi
pyarrow-hotfix            0.6                      pypi_0    pypi
pyasn1                    0.5.1                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pybcj                     1.0.2                    pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0    defaults
pycryptodomex             3.20.0                   pypi_0    pypi
pydantic                  2.4.2                    pypi_0    pypi
pydantic-core             2.10.1                   pypi_0    pypi
pydub                     0.25.1                   pypi_0    pypi
pygments                  2.17.2                   pypi_0    pypi
pyngrok                   7.0.3                    pypi_0    pypi
pynvml                    11.5.0                   pypi_0    pypi
pyopenssl                 23.2.0          py310h06a4308_0    defaults
pyparsing                 3.1.1                    pypi_0    pypi
pyppmd                    1.0.0                    pypi_0    pypi
pysocks                   1.7.1           py310h06a4308_0    defaults
python                    3.10.12              h955ad1f_0    defaults
python-dateutil           2.8.2                    pypi_0    pypi
python-multipart          0.0.6                    pypi_0    pypi
pytorch                   2.1.2              py3.10_cpu_0    pytorch
pytorch-mutex             1.0                         cpu    pytorch
pytz                      2023.3.post1             pypi_0    pypi
pywavelets                1.5.0                    pypi_0    pypi
pyyaml                    6.0.1           py310h5eee18b_0    defaults
pyzstd                    0.15.9                   pypi_0    pypi
qudida                    0.0.4                    pypi_0    pypi
rapidfuzz                 2.13.7                   pypi_0    pypi
readline                  8.2                  h5eee18b_0    defaults
referencing               0.32.1                   pypi_0    pypi
regex                     2023.12.25               pypi_0    pypi
requests                  2.31.0          py310h06a4308_0    defaults
requests-oauthlib         1.3.1                    pypi_0    pypi
responses                 0.18.0                   pypi_0    pypi
rich                      13.7.0                   pypi_0    pypi
rouge-score               0.1.2                    pypi_0    pypi
rpds-py                   0.17.1                   pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
sacremoses                0.0.53                   pypi_0    pypi
safetensors               0.4.1                    pypi_0    pypi
scikit-image              0.22.0                   pypi_0    pypi
scikit-learn              1.3.0                    pypi_0    pypi
scipy                     1.12.0                   pypi_0    pypi
semantic-version          2.10.0                   pypi_0    pypi
sentencepiece             0.1.99                   pypi_0    pypi
setuptools                68.2.2          py310h06a4308_0    defaults
shtab                     1.6.5                    pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sniffio                   1.3.0                    pypi_0    pypi
sqlalchemy                2.0.25                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0    defaults
starlette                 0.27.0                   pypi_0    pypi
sympy                     1.12            py310h06a4308_0    defaults
tbb                       2021.8.0             hdb19cb5_0    defaults
tensorboard               2.15.1                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
texttable                 1.7.0                    pypi_0    pypi
threadpoolctl             3.2.0                    pypi_0    pypi
tifffile                  2023.12.9                pypi_0    pypi
tiktoken                  0.5.1                    pypi_0    pypi
tk                        8.6.12               h1ccaba5_0    defaults
tokenizers                0.15.0                   pypi_0    pypi
toolz                     0.12.0                   pypi_0    pypi
torch                     2.1.2                    pypi_0    pypi
torchvision               0.16.2                py310_cpu    pytorch
tqdm                      4.65.0                   pypi_0    pypi
transformers              4.36.1                   pypi_0    pypi
triton                    2.1.0                    pypi_0    pypi
trl                       0.7.4                    pypi_0    pypi
types-python-dateutil     2.8.19.20240106          pypi_0    pypi
typing_extensions         4.9.0           py310h06a4308_1    defaults
tyro                      0.6.6                    pypi_0    pypi
tzdata                    2023.4                   pypi_0    pypi
urllib3                   2.1.0                    pypi_0    pypi
uvicorn                   0.22.0                   pypi_0    pypi
websockets                11.0.3                   pypi_0    pypi
werkzeug                  2.3.6                    pypi_0    pypi
wheel                     0.41.2          py310h06a4308_0    defaults
x264                      1!157.20191217       h7b6447c_0    defaults
xgboost                   1.7.6                    pypi_0    pypi
xxhash                    3.4.1                    pypi_0    pypi
xz                        5.4.5                h5eee18b_0    defaults
yaml                      0.2.5                h7b6447c_0    defaults
yarl                      1.9.4                    pypi_0    pypi
zipp                      3.17.0                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0    defaults
zstd                      1.5.5                hc292b87_0    defaults

abhishekkrthakur commented 9 months ago

that seems right. the script is then probably not using the environment with autotrain in it. sorry, this is something you need to check yourself.
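
One possible cause, offered as an assumption rather than a confirmed diagnosis: calling the environment's python by its full path does not activate the conda environment, so the environment's scripts directory may be missing from PATH even though imports work. A sketch that looks for the `autotrain` executable in the scripts directory of the running interpreter and falls back to PATH (the helper name `find_autotrain` is made up for this example):

```python
import shutil
import sysconfig
from typing import Optional


def find_autotrain(command: str = "autotrain") -> Optional[str]:
    """Return a path to `command`, preferring the scripts directory of the
    interpreter running this script, else whatever PATH provides."""
    scripts_dir = sysconfig.get_path("scripts")
    return shutil.which(command, path=scripts_dir) or shutil.which(command)


executable = find_autotrain()
if executable is None:
    print("autotrain was not found; check that the right environment is in use")
else:
    print(f"autotrain found at {executable}")
```

The returned path could replace the first element of the `command` list, so that the subprocess does not depend on PATH being set up by environment activation.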

jackswl commented 9 months ago

@abhishekkrthakur please don't close this thread yet. I can make this experiment replicable, you can give it a try, and it should still throw up the same error for you too. Give me a while to type it up from scratch.

jackswl commented 9 months ago

You are right. I have made a slight addition to your code, and this should work. My conda env was not linked properly, which was a problem on my side.

For anyone else looking for how to run the fine-tuning process from a Python script:

import subprocess

# Define the command as a list of arguments
command = [
    "autotrain",
    "llm",
    "--train",
    "--model",
    "deepseek-ai/deepseek-coder-6.7b-instruct",
    "--project-name",
    "insert-project-name-here",
    "--data-path",
    ".",
    "--valid-split",
    "valid",
    "--use-peft",
    "--quantization",
    "int4",
    "--lr",
    "1e-5",
    "--batch-size",
    "16",
    "--epochs",
    "4",
    "--trainer",
    "sft",
    "--model_max_length",
    "16384",
    "--block-size",
    "-1",
    "--lora-r",
    "256",
    "--lora-alpha",
    "64",
    "--scheduler",
    "cosine",
    "--lora-dropout",
    "0",
    "--weight-decay",
    "0.01",
    "--gradient-accumulation",
    "1",
    "--merge-adapter",
    "--token",
    "hf_xxx"
]

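# With shell=True, pass a single command string: given a list, the shell on
# POSIX runs only the first element ("autotrain") and drops the other arguments.
command_str = " ".join(command)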

# Run the command and wait for it to finish
try:
    subprocess.run(command_str, check=True, text=True, shell=True)
    print("Command executed successfully.")
except subprocess.CalledProcessError as e:
    print(f"Error executing command: {e}")

Thanks a ton.
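
A variant of the script above, as a sketch only: it passes the argument list directly (no shell=True, so no quoting concerns) and keeps the token out of the source file. The helper names `build_command` and `run_training` are made up for this example, only a subset of the flags from the thread is shown, and the block itself just prints the command rather than starting a training run:

```python
import shlex
import subprocess


def build_command(token: str, project_name: str = "insert-project-name-here") -> list:
    """Assemble the autotrain CLI call as a list, one item per argument."""
    return [
        "autotrain", "llm", "--train",
        "--model", "deepseek-ai/deepseek-coder-6.7b-instruct",
        "--project-name", project_name,
        "--data-path", ".",
        "--valid-split", "valid",
        "--use-peft",
        "--quantization", "int4",
        "--lr", "1e-5",
        "--batch-size", "16",
        "--epochs", "4",
        "--trainer", "sft",
        "--token", token,
    ]


def run_training(command: list) -> int:
    """Run the command without a shell and return its exit status."""
    try:
        subprocess.run(command, check=True)
    except FileNotFoundError:
        # Raised when the executable is not on PATH (no shell is involved).
        print("autotrain not found; is the right environment active?")
        return 127
    except subprocess.CalledProcessError as e:
        print(f"Error executing command: {e}")
        return e.returncode
    return 0


# Show the command that would run, using a placeholder token.
print(shlex.join(build_command("hf_xxx")))
```

To launch the job, one would call `run_training(build_command(token))`, with `token` read from somewhere outside the script, for example an environment variable via `os.environ`.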

abhishekkrthakur commented 9 months ago

in future, kindly do your best to figure things out yourself first before writing things like "I need some help here, really desperately", "it should still throw up the same error for you too", etc., as it creates unnecessary urgency.

huggingface / autotrain-advanced