Jintao-Huang opened this issue 1 month ago
Hello, does swift support vLLM deployment? Something like the following command: CUDA_VISIBLE_DEVICES=0 swift deploy --model_type llava1_6-vicuna-13b-instruct --infer_backend vllm
When will vLLM inference be supported?
I am trying to fine-tune GOT in Hindi. The dataset I am using is from HuggingFace Datasets (damerajee/hindi-ocr). It contains only two columns: one is the image, and the other is the text present in the image.
I have prepared a .json file in the following format (taken from the official GOT-OCR2.0 repo):
```
{"query": "<image>OCR: ", "response": "text present in the image", "images": ["image_path"]}
```
Is the above .json file right? Or should I place the image object (a PIL Image) instead of the image path? In "response", I have given the text (ground truth) that I expect from the model; is that right?
Now the issue is how do I use this fine-tuned model. I went through the documentation; unlike your official GOT online demo, which directly accepts an image, this fine-tuned version requires entering a prompt containing the "<image>" tag.
I am doing all this as a part of a project to build a basic application using Streamlit. The GitHub repository of the same is given below- https://github.com/AISpaceXDragon/GOT-OCR2.0.git
Thank you for taking the time to read my queries; I hope to receive your response as soon as possible.
@AISpaceXDragon I see you have successfully fine-tuned the model in another language, Hindi. Can you show me how to build a training dataset for the new language? I'd be very grateful.
As I mentioned, I am using a dataset from HuggingFace Datasets (linked above) and I didn't build it myself. But I think you meant building the ".json file" for a given dataset, is that right? Please let me know so that I can assist you.
@AISpaceXDragon That's right, I mean how to build the ".json file" from a standard dataset.
@AISpaceXDragon Can you tell me at what stage do you do it when fine tuning? And are the results after fine-tuning similar to the original results published by the author, at least approximately?
> @AISpaceXDragon That's right, I mean how to build ".json file" from a standard data set
I wrote a Python script to prepare the .json file for a dataset. The format of the entries is the same as mentioned in the comment above. The script takes the images and stores them in a folder, while the "response" part of each json entry contains the ground truth (the text present in the image, in my case). This is what we want the model to give as a reply when given the image path specified in the "images" part of the entry.
This is what I have done, but I was not able to evaluate the model with the same format, which is why I posted a comment in this issue.
Format - {"query": "55555", "response": "66666", "images": ["image_path"]}
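For reference, here is a minimal sketch of such a preparation script. It assumes the dataset yields (image_bytes, text) pairs; the "<image>OCR: " prompt string, the folder name, and the file names are my own placeholders for illustration, not an official format:

```python
import json
import os

def build_swift_jsonl(records, image_dir, out_path, prompt="<image>OCR: "):
    """Save each image to image_dir and write one swift-format JSON line
    per record; "response" holds the text the model should produce."""
    os.makedirs(image_dir, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        for i, (image_bytes, text) in enumerate(records):
            image_path = os.path.join(image_dir, f"{i:06d}.png")
            with open(image_path, "wb") as img:
                img.write(image_bytes)
            entry = {"query": prompt, "response": text, "images": [image_path]}
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")

# Two fake records: raw bytes stand in for real PNG data.
build_swift_jsonl([(b"fake-png-1", "Text 1"), (b"fake-png-2", "Text 2")],
                  image_dir="hindi_ocr_images", out_path="train.jsonl")
```

With a real HuggingFace dataset you would iterate over its rows and save each PIL image with `image.save(image_path)` instead of writing raw bytes.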
> @AISpaceXDragon Can you tell me at what stage do you do it when fine tuning? And are the results after fine tuning similar to the original results published by the author? I mean is it approximately?
What do you mean by "Can you tell me at what stage do you do it when fine tuning?"? I didn't get you. Please try to be clear.
Answer for "And are the results after fine tuning similar to the original results published by the author? I mean is it approximately?": the thing is that I fine-tuned the model on Google Colab, which means limited compute resources. From my observation, if fine-tuned for more epochs and on more data, the results would be excellent (as reported in the research paper).
@AISpaceXDragon Reply to "Can you tell me at what stage do you do it when fine tuning?". I see the author mentioned the following in the README.md section:
0. Train sample can be found here. Note that the '<image>' in the 'conversations'-'human'-'value' is necessary! 1. This codebase only supports post-training (stage-2/stage-3) upon our GOT weights.
@minhduc01168 Reply to "I see the author mentioned the following in the README.md section: 0. Train sample can be found here. Note that the '<image>' in the 'conversations'-'human'-'value' is necessary! 1. This codebase only supports post-training (stage-2/stage-3) upon our GOT weights."
I see that you are referring to training the model, but I am referring to fine-tuning it. This means I am working only at stage 2 or 3.
Note that training is different from fine-tuning. Training means taking the defined model architecture with random weights and passing in all the inputs until the model gives the corresponding correct outputs. Fine-tuning means taking those pretrained weights (the learnings of the model) and using them for a specific variation of the same task. In this case I want to perform OCR, which is the main aim of the model, but since the training data was mostly English and Chinese, the model is efficient only in those languages. I want to extend these capabilities to another language, in my case Hindi, so I took the pretrained weights (the model's ability to extract text from images) and trained it on a different language, alongside the languages it was already trained on.
I hope you understand what I am trying to convey. Let me know if any part of the explanation is unclear.
@Jintao-Huang Could you answer my question?
@AISpaceXDragon Has anyone explained the data format below to you? Can you explain it to me? I'd be very grateful.
```
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}
```
Hey, can someone please help and tell me how I can train this model on the MNIST dataset?
@AISpaceXDragon HELP please
@minhduc01168 Reply to "Have you had anyone explain the data format below? Can you explain it to me? I'm very grateful for that. {"query": "55555", "response": "66666", "images": ["image_path"]} {"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]} {"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}"
Answer to the first part of the question: I understood them myself; no one explained them to me.
Answer to the second part: there are three data formats, as mentioned. The first one contains the query, i.e., the prompt together with the image tag `<image>`, plus the response and the image path.
The second data format is similar to the first, except it contains a new entry, history, which records all the previous queries and responses of the model for the given images.
The third data format is similar to the above; here, history contains the list of all the query/response pairs that you would otherwise give separately in data format one.
I hope my explanation is clear; if not, let me know. Thank you.
> hey , can someone please help and tell me how can i train this model on MNIST dataset?
Follow the instructions given in modelscope's ms-swift documentation.
Let me know if you didn't get it, thank you.
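In case it helps, here is a hedged sketch of how MNIST labels could be turned into the same swift JSON-lines format discussed above. It assumes the MNIST images have already been exported to PNG files (e.g. with torchvision) under a folder; the "<image>OCR: " prompt string and all paths are placeholders of mine, not an official convention:

```python
import json

def mnist_to_swift_jsonl(labels, out_path, image_dir="mnist_images",
                         prompt="<image>OCR: "):
    """Write one swift-format JSON line per MNIST sample, pointing at a
    pre-exported image file; the digit label becomes the response text."""
    with open(out_path, "w", encoding="utf-8") as f:
        for i, label in enumerate(labels):
            entry = {"query": prompt,
                     "response": str(label),
                     "images": [f"{image_dir}/{i:05d}.png"]}
            f.write(json.dumps(entry) + "\n")

# The first three MNIST training labels are 5, 0, 4.
mnist_to_swift_jsonl([5, 0, 4], "mnist_train.jsonl")
```

Each digit label becomes the expected "response" string for its image, so the model is fine-tuned to read the digit off the picture.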
I tried it on Google Colab and got the error I sent above.
@AISpaceXDragon Did you train successfully and is everything working well? Thank you very much for your answer.
Yes, training works fine, but testing doesn't work at all.
Do you mean vLLM inference?
@Jintao-Huang Could you answer my question?
Hello, the holiday just ended, and I didn’t reply in time. What was the issue? 😊
@Jintao-Huang Can you explain it to me? I'm very grateful for that.

```
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}
```
This format might be clearer.
```
{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "<image><image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}
```
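To make the role of `history` concrete, here is a small illustrative sketch (my own code, not part of ms-swift) that expands one sample into ordered conversation turns:

```python
def to_turns(sample):
    """Expand one swift-format sample into ordered (user, assistant) turns:
    the "history" pairs come first, then the current query/response."""
    turns = [tuple(pair) for pair in sample.get("history", [])]
    turns.append((sample["query"], sample["response"]))
    return turns

sample = {"query": "EEEEE", "response": "FFFFF",
          "history": [["query1", "response1"], ["query2", "response2"]]}
print(to_turns(sample))
# → [('query1', 'response1'), ('query2', 'response2'), ('EEEEE', 'FFFFF')]
```

So a sample with a non-empty `history` is just a multi-turn conversation, and a sample with `"history": []` or no history key is a single turn.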
@Jintao-Huang Thank you for explaining it to me. My GPU resources are limited; can you tell me how I can load the model weights to continue training? Thank you.
@AISpaceXDragon Sorry, which OCR did you use to produce the "response" text in the data format? Pytesseract, GOT-OCR, or something else? Thank you.
I didn't get you. Please try to be clear.
@Jintao-Huang I want to fine-tune to OCR table images in another language. I don't understand what the content of "response" should be: the table structure line by line, or a LaTeX tabular? Can you explain? Thank you.
```
{"query": "<image>OCR: ", "response": "...", "images": ["image_path"]}
```
After fine-tuning, calling the fine-tuned model throws an error. How do I fix this? The model directory contents are as follows:
You need to merge the LoRA first; only then will there be a config.json file.
At which step should merge lora be done? I'm not sure, thank you!
Solved. cd into the fine-tuned model directory and run: swift merge-lora --ckpt_dir xxx
@Jintao-Huang When I load the fine-tuned model from the checkpoint, shouldn't it use the model weights from the checkpoint? When I load the model for inference after fine-tuning, the output shows "Downloading the model from modelscope hub". Is this the expected behavior?
I ran the following command from the CLI:

```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/got-ocr2/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
One more thing: what additional arguments can I pass in the above command? And how can I test the model on a large test set instead of manually feeding each input to the model?
I would be thankful if you could answer as soon as possible.
Inference:
Fine-tuning:
Inference after fine-tuning: