microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

[tuna] Libraries are conflicting and/or very aged #171

Open batawfic opened 7 months ago

batawfic commented 7 months ago

So disappointed by what is released here. These are just non-working pieces. Funny that in `train.py`, for example, you have `from custom import CustomTrainer`, but `custom` actually has only `TunaTrainer`. Also, where in the code is `gpt_eval` called? The README never describes it. Environment and library installation is another joke!

I'm sure that none of the authors will read these comments. Such a waste of the 3 days I spent here.

donglixp commented 7 months ago

@batawfic Could you be more specific about which project you were trying to fix?

donglixp commented 7 months ago

I just searched for TunaTrainer and found this folder: https://github.com/microsoft/LMOps/tree/main/tuna .

@XingxingZhang and @haorannlp can help with this issue.

haorannlp commented 7 months ago

> So disappointed by what is released here. These are just non-working pieces. Funny that in `train.py`, for example, you have `from custom import CustomTrainer`, but `custom` actually has only `TunaTrainer`. Also, where in the code is `gpt_eval` called? The README never describes it. Environment and library installation is another joke!
>
> I'm sure that none of the authors will read these comments. Such a waste of the 3 days I spent here.

Got it, I will look into this today.

haorannlp commented 7 months ago

> So disappointed by what is released here. These are just non-working pieces. Funny that in `train.py`, for example, you have `from custom import CustomTrainer`, but `custom` actually has only `TunaTrainer`. Also, where in the code is `gpt_eval` called? The README never describes it. Environment and library installation is another joke!
>
> I'm sure that none of the authors will read these comments. Such a waste of the 3 days I spent here.

For `train.py`, I've removed the `from custom import CustomTrainer` line, as it does not affect the training process. I forgot to clean this script up in the first commit, sorry for the confusion. `train.py` is used for supervised fine-tuning (SFT) and is borrowed from https://github.com/AetherCortex/Llama-X; please refer to the Llama-X repo for a more comprehensive explanation/discussion. `train_tuna.py` is used for learning from the rankings. `gpt_eval.py` is used for querying GPT-4 models to generate contextual ranking data. This script was only for illustration purposes and is not called anywhere in this repo. We've provided the GPT-4 ranking data in the `./gpt_data` folder.
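As a rough illustration of what "learning from rankings" means here (this is a generic pairwise objective, not Tuna's actual loss), one can penalize every pair of candidates whose model scores disagree with the given ranking:

```python
import math

def pairwise_ranking_loss(scores, ranking):
    """Generic pairwise ranking loss (illustration only, not Tuna's
    objective): for every pair (i, j) that `ranking` places i above j,
    add -log(sigmoid(score_i - score_j))."""
    loss, pairs = 0.0, 0
    for a in range(len(ranking)):
        for b in range(a + 1, len(ranking)):
            i, j = ranking[a], ranking[b]  # i is ranked above j
            loss -= math.log(1.0 / (1.0 + math.exp(-(scores[i] - scores[j]))))
            pairs += 1
    return loss / pairs
```

Scores that agree with the ranking yield a lower loss than scores that invert it, which is the signal a ranking-based fine-tuning stage optimizes.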

For the Python environment installation, could you be more specific about the problems/errors you've encountered, so that I can guide you through the installation process? Alternatively, you can search the Llama-X repo for similar issues in case we are not able to respond promptly.

Thanks.

batawfic commented 7 months ago

> So disappointed by what is released here. These are just non-working pieces. Funny that in `train.py`, for example, you have `from custom import CustomTrainer`, but `custom` actually has only `TunaTrainer`. Also, where in the code is `gpt_eval` called? The README never describes it. Environment and library installation is another joke! I'm sure that none of the authors will read these comments. Such a waste of the 3 days I spent here.
>
> For `train.py`, I've removed the `from custom import CustomTrainer` line, as it does not affect the training process. I forgot to clean this script up in the first commit, sorry for the confusion. `train.py` is used for supervised fine-tuning (SFT) and is borrowed from https://github.com/AetherCortex/Llama-X; please refer to the Llama-X repo for a more comprehensive explanation/discussion. `train_tuna.py` is used for learning from the rankings. `gpt_eval.py` is used for querying GPT-4 models to generate contextual ranking data. This script was only for illustration purposes and is not called anywhere in this repo. We've provided the GPT-4 ranking data in the `./gpt_data` folder.
>
> For the Python environment installation, could you be more specific about the problems/errors you've encountered, so that I can guide you through the installation process? Alternatively, you can search the Llama-X repo for similar issues in case we are not able to respond promptly.
>
> Thanks.

@haorannlp Thanks for getting back to me. I honestly wasn't expecting that. Here is a summary of some of the issues: in `raw_dataset = load_dataset("json", data_files=data_args.data_path, split="train")`, the data is a list of JSON records, not JSON keyed by "train". To fix that, I had to modify the code, install datasets==2.10 and pyarrow==15, and instruct the code to read jsonl instead of json.
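A minimal sketch of that kind of workaround (file paths here are placeholders, not the repo's actual data files): converting a top-level JSON array file into JSON Lines so the loader sees one record per line:

```python
import json

def json_array_to_jsonl(src_path, dst_path):
    """Rewrite a file containing one JSON array of records as JSON Lines
    (one JSON object per line). Paths are placeholders for illustration."""
    with open(src_path) as f:
        records = json.load(f)  # expects a top-level JSON list
    with open(dst_path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return len(records)
```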

Also, I had to upgrade to deepspeed==0.13.

After all this, when running and reading the data, it hangs forever. I'm unclear what the issue is.

Also, `gpt_eval` never runs; the line `if __name__ == "__main__": fire.Fire(GroupEval)` in `gpt_eval` fails with the error: `OSError: source code not available`.
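For context, `fire.Fire` builds a CLI by inspecting the target's source code, which is why it can raise `OSError: source code not available` in environments (e.g. some notebook setups) where the source isn't readable from disk. A rough stdlib sketch of the same entry-point idea (`GroupEval` here is a toy stand-in, not the real class):

```python
import sys

class GroupEval:
    """Toy stand-in for the class gpt_eval.py hands to fire.Fire."""
    def __init__(self, model="gpt-4"):
        self.model = model

    def ping(self):
        return f"ok:{self.model}"

def dispatch(cls, argv):
    # Rough equivalent of fire.Fire(cls): the first CLI argument names
    # a method on a default-constructed instance; the rest are its args.
    obj = cls()
    return getattr(obj, argv[0])(*argv[1:])

if __name__ == "__main__":
    print(dispatch(GroupEval, sys.argv[1:] or ["ping"]))
```

Running `gpt_eval.py` as a plain script from a shell, rather than inside a notebook cell, may avoid the source-inspection failure.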

I honestly gave up on getting the examples to work. I was using Databricks with a single A100 GPU on 1 node, and Python 3.10. Please note that the requirements for Llama-X (CUDA 11.6: `conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge`) are very old, and I was unclear whether updating them would break anything.