huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

`losses = losses * self.args.rpo_alpha + policy_nll_loss` raises TypeError: only integer tensors of a single element can be converted to an index #1924

Open yiyepiaoling0715 opened 1 month ago

yiyepiaoling0715 commented 1 month ago

[image not preserved]

trl version: 0.9.6

yiyepiaoling0715 commented 1 month ago

absl-py 2.1.0
accelerate 0.33.0
aiohttp 3.8.6
aiosignal 1.3.1
anaconda-anon-usage 0.4.4
annotated-types 0.7.0
anyio 3.7.1
APScheduler 3.10.4
archspec 0.2.3
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
asttokens 2.0.5
astunparse 1.6.3
async-timeout 4.0.3
attrs 23.1.0
Babel 2.15.0
bce-python-sdk 0.8.99
beautifulsoup4 4.12.3
bitsandbytes 0.43.3
bleach 6.1.0
boltons 23.0.0
boto3 1.34.154
botocore 1.34.154
Brotli 1.0.9
certifi 2024.7.4
cffi 1.16.0
chardet 4.0.0
charset-normalizer 2.0.4
click 8.1.7
cloudpickle 3.0.0
cmake 3.30.2
cn2an 0.5.22
colorama 0.4.6
comm 0.2.2
conda 24.5.0
conda-build 24.5.1
conda-content-trust 0.2.0
conda_index 0.5.0
conda-libmamba-solver 24.1.0
conda-package-handling 2.3.0
conda_package_streaming 0.10.0
contourpy 1.2.1
cryptography 42.0.5
ctranslate2 4.3.1
cycler 0.12.1
datasets 2.19.2
debugpy 1.8.5
decorator 5.1.1
deepspeed 0.14.4
defusedxml 0.7.1
Deprecated 1.2.14
detect-secrets 1.5.0
dill 0.3.8
diskcache 5.6.3
distro 1.9.0
dnspython 2.6.1
docker-pycreds 0.4.0
docstring_parser 0.16
einops 0.8.0
et-xmlfile 1.1.0
evaluate 0.4.2
executing 0.8.3
expecttest 0.2.1
fastapi 0.112.0
fastjsonschema 2.20.0
filelock 3.13.1
fire 0.6.0
flash-attn 2.6.3
fonttools 4.53.1
frozendict 2.4.2
frozenlist 1.4.1
fsspec 2024.3.1
ftfy 6.2.3
future 1.0.0
fuzzywuzzy 0.18.0
gibberish-detector 0.1.1
gitdb 4.0.11
GitPython 3.1.43
gmpy2 2.1.2
greenlet 3.0.3
grpcio 1.65.4
h11 0.14.0
hf_hub_ctranslate2 2.13.1
hjson 3.1.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.24.5
humanize 4.10.0
hypothesis 6.108.4
icetk 0.0.7
idaas-sdk 0.4.2
idna 3.7
interegular 0.3.3
ipykernel 6.29.5
ipython 8.25.0
ipython-genutils 0.2.0
jedi 0.19.1
jieba 0.42.1
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
json5 0.9.25
jsonlines 4.0.0
jsonpatch 1.33
jsonpointer 2.1
jsonschema 4.19.2
jsonschema-specifications 2023.7.1
jupyter_client 8.6.2
jupyter_core 5.7.2
jupyter-server 1.23.3
jupyterlab 3.3.2
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
jwcrypto 1.5.6
kiwisolver 1.4.5
lark 1.1.9
libarchive-c 2.9
libmambapy 1.5.8
lintrunner 0.12.5
lit 18.1.8
llvmlite 0.43.0
lm-dataformat 0.0.20
lm-format-enforcer 0.10.3
loguru 0.7.2
lpai 0.61.0
lpai-asset 2.1.9
lxml 5.2.2
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.9.0
matplotlib-inline 0.1.6
mdurl 0.1.2
menuinst 2.1.1
minio 7.2.7
mistune 3.0.2
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
more-itertools 10.1.0
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.16
mysql-connector 2.2.9
nbclassic 0.5.6
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.3
nibabel 5.2.1
ninja 1.11.1.1
nltk 3.8.1
notebook_shim 0.2.4
numba 0.60.0
numpy 1.26.4
numpyencoder 0.3.0
nvidia-ml-py 12.555.43
openai 1.39.0
openpyxl 3.1.5
optree 0.12.1
outlines 0.0.46
packaging 24.1
pandas 2.2.2
pandocfilters 1.5.1
parso 0.8.3
peft 0.12.0
pexpect 4.8.0
pillow 10.4.0
pip 24.0
pkginfo 1.10.0
platformdirs 3.10.0
pluggy 1.0.0
portalocker 2.10.1
proces 0.1.7
progress 1.6
prometheus_client 0.20.0
prometheus-fastapi-instrumentator 7.0.0
prompt-toolkit 3.0.43
protobuf 5.27.3
protocol 0.37
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyairports 2.1.1
pyarrow 12.0.1
pyarrow-hotfix 0.6
pybind11 2.13.1
pycosat 0.6.6
pycountry 24.6.1
pycparser 2.21
pycryptodome 3.20.0
pydantic 2.8.2
pydantic_core 2.20.1
pydicom 2.4.4
Pygments 2.15.1
pyparsing 3.1.2
pypng 0.20220715.0
PySocks 1.7.1
python-box 7.2.0
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-etcd 0.4.5
pytz 2024.1
PyYAML 6.0.1
pyzmq 26.1.0
qrcode 7.4.2
rank-bm25 0.2.2
ray 2.34.0
referencing 0.30.2
regex 2024.7.24
requests 2.32.3
rich 13.7.1
rjieba 0.1.11
rpds-py 0.10.6
ruamel.yaml 0.17.21
s3transfer 0.10.2
sacrebleu 2.4.2
safetensors 0.4.4
scikit-learn 1.5.1
scipy 1.14.0
Send2Trash 1.8.3
sentence-transformers 3.0.1
sentencepiece 0.2.0
sentry-sdk 2.12.0
setproctitle 1.3.3
setuptools 69.5.1
shellingham 1.5.4
shtab 1.7.1
six 1.16.0
smmap 5.0.1
sniffio 1.3.1
sortedcontainers 2.4.0
soupsieve 2.5
SQLAlchemy 2.0.32
sqlitedict 2.1.0
stack-data 0.2.0
starlette 0.37.2
sympy 1.13.1
tabulate 0.9.0
tblib 3.0.0
tensorboard 2.17.0
tensorboard-data-server 0.7.2
termcolor 2.4.0
terminado 0.18.1
threadpoolctl 3.5.0
tiktoken 0.7.0
timeout-decorator 0.5.0
tinycss2 1.3.0
tokenizers 0.19.1
torch 2.4.0
torchaudio 2.4.0
torchelastic 0.2.2
torchvision 0.19.0
tornado 6.4.1
tqdm 4.66.4
traitlets 5.14.3
transformers 4.43.4
transformers-stream-generator 0.0.5
tree-sitter 0.21.3
triton 3.0.0
trl 0.9.6
truststore 0.8.0
typer 0.12.3
types-dataclasses 0.6.6
typing_extensions 4.11.0
tyro 0.8.5
tzdata 2024.1
tzlocal 5.2
ujson 5.10.0
urllib3 2.2.2
utils 1.0.2
uvicorn 0.30.5
uvloop 0.19.0
vllm 0.5.4
vllm-flash-attn 2.6.1
wandb 0.17.5
watchfiles 0.22.0
wcwidth 0.2.12
webencodings 0.5.1
websocket-client 1.8.0
websockets 12.0
Werkzeug 3.0.3
wheel 0.43.0
wrapt 1.16.0
xformers 0.0.27.post2
xorbits 0.7.2
xoscar 0.3.2
xxhash 3.4.1
yarl 1.9.4
zstandard 0.22.0