haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
18.2k stars 1.99k forks

[Question] Is firefly multi-dialog training method used in LLaVA? #685

Open CthulhuAIFrenzy opened 8 months ago

CthulhuAIFrenzy commented 8 months ago

Question

Is the Firefly multi-turn dialogue training method used in LLaVA? Is the following the current usage?

def _mask_targets(target, tokenized_lens, speakers):
    # tokenized_lens[0] is the length of the system prompt; skip past it
    cur_idx = tokenized_lens[0]
    tokenized_lens = tokenized_lens[1:]
    # mask the system prompt so it does not contribute to the loss
    target[:cur_idx] = IGNORE_INDEX
    for tokenized_len, speaker in zip(tokenized_lens, speakers):
        if speaker == "human":
            # mask each human turn; only the assistant turns are supervised
            target[cur_idx + 2:cur_idx + tokenized_len] = IGNORE_INDEX
        cur_idx += tokenized_len
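The masking above hides the system prompt and every human turn, so the loss is computed on all assistant replies in a multi-turn conversation, which is the same idea as Firefly-style multi-turn training. Below is a minimal, self-contained sketch on toy data; the list-based implementation, the token lengths, and the conversation layout are illustrative, not taken from LLaVA's code:

```python
IGNORE_INDEX = -100  # the value PyTorch's CrossEntropyLoss ignores by default

def mask_targets(target, tokenized_lens, speakers):
    """List-based sketch of LLaVA's _mask_targets (which operates on tensors)."""
    cur_idx = tokenized_lens[0]                    # skip past the system prompt
    tokenized_lens = tokenized_lens[1:]
    target[:cur_idx] = [IGNORE_INDEX] * cur_idx    # mask the system prompt
    for tokenized_len, speaker in zip(tokenized_lens, speakers):
        if speaker == "human":
            # mask the human turn; the +2 offset mirrors the original code,
            # leaving the first two tokens of the human span unmasked
            target[cur_idx + 2:cur_idx + tokenized_len] = \
                [IGNORE_INDEX] * (tokenized_len - 2)
        cur_idx += tokenized_len
    return target

# Toy conversation: system prompt (3 tokens), human turn (4), assistant turn (3)
target = list(range(10))
mask_targets(target, [3, 4, 3], ["human", "gpt"])
# → [-100, -100, -100, 3, 4, -100, -100, 7, 8, 9]
```

Note that the whole assistant turn (tokens 7-9 here) stays supervised, so every assistant reply in the conversation contributes to the loss in a single forward pass, rather than replaying the conversation once per reply.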
fisher75 commented 3 months ago

I wish they would add a multi-turn dialogue feature to SGLang for batch inference.