mistralai / mistral-finetune

[BUG]: The _parse_available_tools method does not return all the defined tools. #77

Closed matheus-prandini closed 3 months ago

matheus-prandini commented 3 months ago

Python Version

3.11.9

Pip Freeze

absl-py==2.1.0
accelerate==0.31.0
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
attrs==23.2.0
blosc2==2.7.0
build==1.2.1
CacheControl==0.14.0
certifi==2024.6.2
cffi==1.16.0
charset-normalizer==3.3.2
cleo==2.1.0
click==8.1.7
coloredlogs==15.0.1
cortex==0.1.0
crashtest==0.4.1
cryptography==42.0.8
datasets==2.20.0
dill==0.3.8
distlib==0.3.8
dnspython==2.6.1
docker-pycreds==0.4.0
docstring_parser==0.16
dulwich==0.21.7
einops==0.8.0
email_validator==2.2.0
fastapi-cli==0.0.4
fastjsonschema==2.20.0
filelock==3.14.0
fire==0.6.0
flash-attn==2.5.9.post1
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.5.0
gitdb==4.0.11
GitPython==3.1.43
grpcio==1.64.1
grpcio-tools==1.64.1
h11==0.14.0
h5py==3.11.0
hf_transfer==0.1.6
hjson==3.1.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.4
humanfriendly==10.0
idna==3.7
importlib_metadata==7.2.0
iniconfig==2.0.0
inquirerpy==0.3.4
installer==0.7.0
jaraco.classes==3.4.0
jeepney==0.8.0
Jinja2==3.1.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
keyring==24.3.1
lightning-utilities==0.11.2
loralib==0.1.2
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mistral_common==1.1.0
-e git+ssh://git@github.com/mistralai/mistral-inference.git@e3a64e48250b41d012570234e228c6da4853567e#egg=mistral_inference
more-itertools==10.3.0
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
multiprocess==0.70.16
mypy==1.10.0
mypy-extensions==1.0.0
mypy-protobuf==3.6.0
ndindex==1.8
networkx==3.2.1
ninja==1.11.1.1
numexpr==2.10.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.555.43
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
onnx==1.16.1
onnxruntime==1.18.0
orjson==3.10.5
packaging==24.0
pandas==2.2.2
peft==0.11.1
pexpect==4.9.0
pfzy==0.3.4
pkginfo==1.11.1
platformdirs==4.2.2
pluggy==1.5.0
poetry-core==1.9.0
prompt_toolkit==3.0.47
protobuf==5.27.0
psutil==6.0.0
ptyprocess==0.7.0
py-cpuinfo==9.0.0
pyarrow==16.1.0
pyarrow-hotfix==0.6
pycparser==2.22
pydantic==2.6.1
pydantic_core==2.16.2
pydash==8.0.1
Pygments==2.18.0
pyproject_hooks==1.1.0
pytest==7.4.4
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
rapidfuzz==3.9.3
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
requests-toolbelt==1.0.0
rich==13.7.1
rpds-py==0.18.1
ruff==0.2.2
safetensors==0.4.3
SecretStorage==3.3.3
sentencepiece==0.2.0
sentry-sdk==2.6.0
setproctitle==1.3.3
shellingham==1.5.4
shtab==1.7.1
simple_parsing==0.1.5
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
sseclient==0.0.27
starlette==0.37.2
sympy==1.12
tables==3.9.2
tensorboard==2.16.2
tensorboard-data-server==0.7.2
termcolor==2.4.0
tokenizers==0.19.1
tomlkit==0.12.5
torch==2.3.0
torchmetrics==1.4.0.post0
tqdm==4.66.4
transformers==4.41.2
triton==2.3.0
trl==0.9.4
trove-classifiers==2024.5.22
typer==0.12.3
types-protobuf==4.24.0.20240129
typing_extensions==4.12.0
tyro==0.8.4
tzdata==2024.1
ujson==5.10.0
urllib3==2.2.2
uvicorn==0.30.1
uvloop==0.19.0
virtualenv==20.26.2
wandb==0.17.3
watchfiles==0.22.0
wcwidth==0.2.13
websockets==12.0
Werkzeug==3.0.3
xformers==0.0.26.post1
xxhash==3.4.1
yarl==1.9.4
zipp==3.19.2

Reproduction Steps

During training, when the training/validation examples are created, the _parse_available_tools method does not include all of the defined tools. Currently, only the last tool is added to the available_tools list.

Expected Behavior

All tools should be added to the available tools list.
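
To make the symptom concrete, here is a minimal sketch of the same pattern, reduced to plain dicts and strings. The function name collect_names_buggy and the sample tool names are hypothetical and used only for illustration; they are not part of mistral-finetune.

```python
from typing import Any, Dict, List


def collect_names_buggy(tools: List[Dict[str, Any]]) -> List[str]:
    """Mimics the reported indentation: the append runs once, after the loop."""
    collected = []
    for tool in tools:
        name = tool["name"]
    # Dedented out of the loop, so only the value from the last iteration is kept.
    collected.append(name)
    return collected


# Hypothetical sample input with three tools.
tools = [{"name": "get_weather"}, {"name": "get_time"}, {"name": "search"}]
print(collect_names_buggy(tools))  # prints ['search']
```

With an empty list, the same pattern raises UnboundLocalError, because the variable used in the append is never assigned.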

Additional Context

Method:

def _parse_available_tools(tools: List[Dict[str, Any]]) -> List[Tool]:
    available_tools = []
    for tool in tools:
        if "name" in tool:
            tool = {"type": "function", "function": tool}

        if "function" not in tool:
            raise FunctionFormatError(
                "A tool dict does not have a 'function' key.", str(tool)
            )

        func_data = tool["function"]

        for key in ["name", "description", "parameters"]:
            if key not in func_data:
                raise FunctionFormatError(
                    f"A function dict does not have a {key} key.", str(func_data)
                )

        if not isinstance(func_data["parameters"], dict):
            raise FunctionFormatError(
                f"A function 'parameters' key has to be of type dict, but is {type(func_data['parameters'])}. If the function has no parameters pass an empty dict ", str(func_data)
            )

        description = func_data["description"]
        function = Function(
            name=func_data["name"],
            description=description,
            parameters=func_data["parameters"],
        )

    available_tools.append(Tool(function=function))
    return available_tools

The line available_tools.append(Tool(function=function)) is where the issue is: it sits outside the for loop, so it runs only once, after the last iteration. I think it should be inside the for loop.

Suggested Solutions

I've already created a PR to fix this bug.

PR: https://github.com/mistralai/mistral-finetune/pull/76

pandora-s-git commented 3 months ago

Thanks a lot and nice catch! We merged your fix, closing the issue now ;>