Closed sunzx8 closed 4 months ago
I checked: this batch returns only the two image-related elements and contains no input_ids.
What is causing this?
The device map may have problems with eight GPUs; try 2 or 4 GPUs.
Hello, my testing points to max_length as the problem. Why do I get the following error after changing max_length from 2048 to 4096?
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
I would also like to ask how to configure fine-tuning across two machines with 16 GPUs in total.
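A CUDA error like this may be caused by OOM or by a problem with the CUDA environment.
The README has an example for multi-node, multi-GPU training.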
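To narrow down the CUDA error, the run can be repeated with CUDA_LAUNCH_BLOCKING=1 (as the error message suggests) and on fewer GPUs (as suggested earlier in this thread). The following is a minimal sketch in Python; `debug_env` is a hypothetical helper written for this illustration, not part of swift, and the command is copied unchanged from the issue description.

```python
import os
import shlex


def debug_env(gpu_ids, base_env=None):
    """Return a copy of the environment set up for a debugging run."""
    env = dict(os.environ if base_env is None else base_env)
    # Make CUDA kernel launches synchronous so the stack trace
    # points at the call that actually failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # Restrict the run to the chosen GPUs, e.g. 4 instead of 8.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


# Command from this thread, split into an argument list.
cmd = shlex.split(
    "swift sft --model_type internvl-chat-v1_5 "
    "--model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 "
    "--dataset /dev/shm/shawn/data/ftoy.jsonl --sft_type full"
)
env = debug_env([0, 1, 2, 3])
print(env["CUDA_LAUNCH_BLOCKING"], env["CUDA_VISIBLE_DEVICES"])
# To actually launch: subprocess.run(cmd, env=env, check=True)
```

Synchronous launches slow training down considerably, so this setting is only meant for reproducing the failure, not for a full run.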
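One more question: with the LoRA fine-tuning method you provided, the param output shows that only a small number of parameters are trained, yet GPU memory consumption is identical to full-parameter fine-tuning. Does this mean the conversion to LoRA did not actually take effect?
Actual memory usage is 241 GB, the same as full-parameter fine-tuning on coco-mini.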
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft --model_type internvl-chat-v1_5 --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 --dataset coco-mini-en-2 --sft_type lora
The device map may have problems with eight GPUs; try 2 or 4 GPUs.
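On the LoRA memory question above, a back-of-the-envelope estimate can help judge whether identical usage is plausible. The sketch below is not taken from swift and counts only static memory (weights, gradients, optimizer state). The parameter count, adapter size, and byte sizes (bf16 weights and gradients, two fp32 Adam moments) are all illustrative assumptions, and activations are deliberately left out.

```python
def static_memory_gb(total_params, trainable_params,
                     weight_bytes=2, grad_bytes=2, optim_bytes=8):
    """Rough static memory in GB.

    Weights are stored for every parameter; gradients and optimizer
    state are stored only for trainable parameters. Activations are
    not included.
    """
    weights = total_params * weight_bytes
    grads = trainable_params * grad_bytes
    optim = trainable_params * optim_bytes
    return (weights + grads + optim) / 1e9


TOTAL = 26e9           # assumed total parameter count (illustrative)
LORA_TRAINABLE = 50e6  # assumed LoRA adapter size (illustrative)

full = static_memory_gb(TOTAL, TOTAL)
lora = static_memory_gb(TOTAL, LORA_TRAINABLE)
print(f"full: {full:.1f} GB, lora: {lora:.1f} GB")
```

Under these assumptions the static footprint of LoRA is far below that of full fine-tuning, so identical totals are not what the estimate predicts. If the two runs really match, it may be worth checking whether activations (which grow with max_length and image resolution) dominate, or how the memory figure was measured.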
Describe the bug
Command run: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft --model_type internvl-chat-v1_5 --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 --dataset /dev/shm/shawn/data/ftoy.jsonl --sft_type full
The data format is as follows (the query means "Output the image content as markdown; if there is a table, output it as HTML"): {"query": "输出图片内容的markdown内容,如果有表格,则输出为html格式", "response": "```markdown\nAdaptive Quotient Filters\n\nConference '17, July 2017, Washington, DC, USA\n\n[34] Russell Housley, Warwick Ford, William Polk, and David Solo. 1999. Internet X.509 public key infrastructure certificate and CRL profile. Technical Report. M. Frans Kaashoek. 2002. The case for application-specific protocols. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP).", "images": ["/dev/shm/shawn/data/input/2405.10253v1/2405.10253v1-p16.png"]}
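Since the first comment reports a batch without input_ids, it can be worth ruling out malformed rows in the JSONL file before training. The sketch below checks one line for the three keys shown in the sample row (query, response, images). `check_row` and `REQUIRED_KEYS` are hypothetical names used only for this illustration; they are not part of swift, and the checks reflect the sample above rather than any documented schema.

```python
import json

REQUIRED_KEYS = ("query", "response", "images")


def check_row(line):
    """Parse one JSONL line and return a list of problems (empty if none)."""
    problems = []
    row = json.loads(line)
    for key in REQUIRED_KEYS:
        if key not in row:
            problems.append(f"missing key: {key}")
    # When present, images should be a list of path strings.
    images = row.get("images", [])
    if not isinstance(images, list) or not all(isinstance(p, str) for p in images):
        problems.append("images must be a list of path strings")
    return problems


sample = json.dumps({
    "query": "Describe the image.",
    "response": "A page of references.",
    "images": ["/path/to/page.png"],
})
print(check_row(sample))
print(check_row('{"query": "x"}'))
```

Running `check_row` over every line of the dataset file and printing only the non-empty results gives a quick list of rows to inspect.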
Your hardware and system info
8*L20