modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0
4.17k stars, 369 forks

An error when fine-tuning llava-1.6-vicuna-7b-instruct #2091

Open xdaiycl opened 1 month ago

xdaiycl commented 1 month ago

Describe the bug
The channel dimension is ambiguous. Got image shape (3, 672, 3). Assuming channels are the first dimension.
RuntimeError: split_with_sizes expects split_sizes to sum exactly to 4 (input tensor's size at dimension 0), but got split_sizes=[3]
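The shape in the warning, (3, 672, 3), has a size-3 axis at both ends, so the channel axis cannot be told apart from a 3-pixel-high or 3-pixel-wide image by shape alone. The following is a minimal sketch of that check, not the library's own code: `is_ambiguous_shape` is a hypothetical helper, and treating sizes 1 and 3 as "channel-like" is an assumption about the image processor's heuristic that this thread does not confirm.

```python
from typing import Sequence

# Assumption: axis sizes that could plausibly be a channel count
# (1 for grayscale, 3 for RGB).
CHANNEL_LIKE = (1, 3)


def is_ambiguous_shape(shape: Sequence[int]) -> bool:
    """Return True if a 3-D shape has a channel-like size on both its first and last axis."""
    if len(shape) != 3:
        return False
    return shape[0] in CHANNEL_LIKE and shape[-1] in CHANNEL_LIKE


print(is_ambiguous_shape((3, 672, 3)))    # the shape reported in the warning
print(is_ambiguous_shape((336, 672, 3)))  # an ordinary height x width x channels image
```

If the check flags an image, the image itself (for example a crop only 3 pixels high) is worth inspecting before training.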

Your hardware and system info
Package Version Editable project location
absl-py 2.1.0
accelerate 0.33.0
addict 2.4.0
aiofiles 23.2.1
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiosignal 1.3.1
aliyun-python-sdk-core 2.15.1
aliyun-python-sdk-kms 2.16.4
annotated-types 0.7.0
anyio 4.4.0
apted 1.0.3
async-timeout 4.0.3
attrdict 2.0.1
attrs 24.2.0
beautifulsoup4 4.12.3
binpacking 1.5.2
bs4 0.0.2
certifi 2024.7.4
cffi 1.17.0
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
cmake 3.30.3
contourpy 1.2.1
cpm-kernels 1.0.11
crcmod 1.7
cryptography 43.0.0
cycler 0.12.1
dacite 1.8.1
datasets 2.21.0
decord 0.6.0
dill 0.3.8
diskcache 5.6.3
Distance 0.1.3
distro 1.9.0
docstring_parser 0.16
einops 0.8.0
exceptiongroup 1.2.2
fastapi 0.115.0
ffmpy 0.4.0
filelock 3.15.4
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2024.6.1
future 1.0.0
gguf 0.9.1
gradio 4.42.0
gradio_client 1.3.0
grpcio 1.65.5
h11 0.14.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.24.6
idna 3.7
importlib_metadata 8.4.0
importlib_resources 6.4.4
interegular 0.3.3
jieba 0.42.1
Jinja2 3.1.4
jiter 0.5.0
jmespath 0.10.0
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lark 1.2.2
llvmlite 0.43.0
lm-format-enforcer 0.10.3
lxml 5.3.0
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
mdurl 0.1.2
mistral_common 1.4.2
modelscope 1.17.1
mpmath 1.3.0
ms-swift 2.4.0.dev0
msgpack 1.1.0
msgspec 0.18.6
multidict 6.0.5
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.3
ninja 1.11.1.1
nltk 3.9.1
numba 0.60.0
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.20
nvidia-nvtx-cu12 12.1.105
openai 1.42.0
opencv-python-headless 4.10.0.84
orjson 3.10.7
oss2 2.18.6
outlines 0.0.46
packaging 24.1
pandas 2.2.2
partial-json-parser 0.2.1.1.post4
peft 0.12.0
pillow 10.4.0
pip 24.2
prometheus_client 0.20.0
prometheus-fastapi-instrumentator 7.0.0
protobuf 5.27.3
psutil 6.0.0
py-cpuinfo 9.0.0
pyairports 2.1.1
pyarrow 17.0.0
pycountry 24.6.1
pycparser 2.22
pycryptodome 3.20.0
pydantic 2.9.2
pydantic_core 2.23.4
pydub 0.25.1
Pygments 2.18.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.2
pyzmq 26.2.0
ray 2.36.0
referencing 0.35.1
regex 2024.7.24
requests 2.32.3
rich 13.7.1
rouge 1.0.1
rpds-py 0.20.0
ruff 0.6.1
safetensors 0.4.4
scipy 1.14.1
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 69.5.1
shellingham 1.5.4
shtab 1.7.1
simplejson 3.19.3
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
soupsieve 2.6
starlette 0.38.2
sympy 1.13.2
tensorboard 2.17.1
tensorboard-data-server 0.7.2
tiktoken 0.7.0
timm 1.0.8
tokenizers 0.19.1
tomlkit 0.12.0
torch 2.4.0
torchvision 0.19.0
tqdm 4.66.5
transformers 4.44.1
transformers-stream-generator 0.0.5
triton 3.0.0
trl 0.9.6
typer 0.12.4
typing_extensions 4.12.2
tyro 0.8.8
tzdata 2024.1
urllib3 2.2.2
uvicorn 0.30.6
uvloop 0.20.0
vllm 0.5.4
vllm-flash-attn 2.6.1
watchfiles 0.24.0
websockets 12.0
Werkzeug 3.0.4
wheel 0.43.0
xformers 0.0.27.post2
xxhash 3.5.0
yarl 1.9.4
zipp 3.20.0

Additional context
{ "query": "Please recognize the table in the input image and output its HTML sequence representation without any extra explanation.", "response": "

<td colspan=\"3\">Year Ended December 31,<td colspan=\"3\">(In thousands)
2005 2004 2003
U.S. federal $547,547 $471,181 $327,039
Foreign 167,912 (8,088) (16,610)
$715,459 $463,093 $310,429
", "images": [ "image_path" ] }
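Given records shaped like the sample above, one way to narrow down the failing input is to scan the dataset for images with a very small side, since a height or width of 1 or 3 pixels can be mistaken for a channel axis. The sketch below is a hedged illustration and not part of ms-swift: `find_thin_images`, `load_records`, and `pil_size` are hypothetical helpers, the dataset file is assumed to be a single JSON list of such records, and the `min_side=4` threshold is an arbitrary choice that covers sides of 1 and 3.

```python
import json
from typing import Callable, Dict, List, Tuple


def pil_size(path: str) -> Tuple[int, int]:
    """Return (width, height) via Pillow; imported lazily so the scan logic works without it."""
    from PIL import Image

    with Image.open(path) as img:
        return img.size


def load_records(json_path: str) -> List[Dict]:
    """Load the dataset, assuming the file holds one JSON list of records."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def find_thin_images(
    records: List[Dict],
    get_size: Callable[[str], Tuple[int, int]] = pil_size,
    min_side: int = 4,
) -> List[Tuple[int, str, Tuple[int, int]]]:
    """Return (record index, image path, (width, height)) for images with a side below min_side."""
    flagged = []
    for index, record in enumerate(records):
        # Records without an "images" key are skipped rather than treated as errors.
        for path in record.get("images", []):
            width, height = get_size(path)
            if min(width, height) < min_side:
                flagged.append((index, path, (width, height)))
    return flagged
```

A call such as `find_thin_images(load_records("./train_llava_fintabnet.json"))` would then list the candidate images to inspect or filter out.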

Jintao-Huang commented 1 month ago

Could you please show me a screenshot of the error?

xdaiycl commented 1 month ago

[Attached screenshot: "Capture"]

Could you please show me a screenshot of the error?

Jintao-Huang commented 1 month ago

Send the shell script please.

xdaiycl commented 1 month ago

Send the shell script please.

OK, the shell script is:

CUDA_VISIBLE_DEVICES=0,1 swift sft \
    --model_type llava1_6-vicuna-7b-instruct \
    --sft_type lora \
    --output_dir ./output/ \
    --dataset ./train_llava_fintabnet.json \
    --val_dataset ./val_llava_fintabnet.json \
    --eval_steps 2000 \
    --save_steps 2000 \
    --logging_steps 5 \
    --save_total_limit 5 \
    --num_train_epochs 1 \
    --dataloader_num_workers 1