shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental (continued) pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0
3.24k stars, 492 forks

Hi, why does rl_training.py only take the prompt text at even indices? #219

Closed: laiqinghan closed this issue 1 year ago

laiqinghan commented 1 year ago

Describe the Question


    for prompt in get_prompt(examples):
        for i in range(len(prompt) // 2):
            source_txt = prompt[2 * i]  # even index: the question (query) turn
            tokenized_question = tokenizer(
                source_txt, truncation=True, max_length=max_source_length, padding="max_length",
                return_tensors="pt"
            )
            new_examples["query"].append(source_txt)
            new_examples["input_ids"].append(tokenized_question["input_ids"])
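The index selection in the loop above can be isolated from the tokenizer call. The following is a minimal sketch, not code from MedicalGPT: `select_queries` is a hypothetical helper, and it assumes `prompt` is a plain list of strings alternating question and answer. It also shows one subtlety: because the loop runs over `range(len(prompt) // 2)`, a trailing question without an answer is skipped, whereas the slice `prompt[0::2]` would keep it.

```python
def select_queries(prompt: list[str]) -> list[str]:
    """Hypothetical helper mirroring the loop: keep prompt[2 * i] for i in range(len(prompt) // 2)."""
    queries = []
    for i in range(len(prompt) // 2):
        queries.append(prompt[2 * i])  # even index -> question turn
    return queries


# Alternating turns: index 0 question, 1 answer, 2 question, 3 answer.
turns = ["q1", "a1", "q2", "a2"]
print(select_queries(turns))  # ['q1', 'q2']

# Odd length: the unanswered trailing question is dropped by the loop,
# while a plain even-index slice keeps it.
odd = ["q1", "a1", "q2"]
print(select_queries(odd))  # ['q1']
print(odd[0::2])            # ['q1', 'q2']
```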
laiqinghan commented 1 year ago
shibing624 commented 1 year ago

It takes the query. The format is: 0 question, 1 answer, 2 question, 3 answer.

So it takes the even indices.
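Under the layout described in the reply (0 question, 1 answer, 2 question, 3 answer), even indices hold questions and odd indices hold answers. The sketch below assumes that layout and groups the turns into (question, answer) pairs. `pair_turns` is a hypothetical name used for illustration, not a MedicalGPT function.

```python
def pair_turns(prompt: list[str]) -> list[tuple[str, str]]:
    """Hypothetical helper: group an alternating [question, answer, ...] list into pairs."""
    questions = prompt[0::2]  # indices 0, 2, 4, ...
    answers = prompt[1::2]    # indices 1, 3, 5, ...
    # zip stops at the shorter list, so an unanswered trailing question is dropped.
    return list(zip(questions, answers))


conversation = ["q1", "a1", "q2", "a2"]
for question, answer in pair_turns(conversation):
    print("Q:", question, "| A:", answer)
```

Only the question half of each pair is used as the RL `query`; the answer half is not needed there because the policy model generates its own response during training.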