About source_max_length and target_max_len in rlhf

lyogavin / Anima

33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU

Apache License 2.0

3.48k stars 291 forks

Open jiahuanluo opened 1 year ago

jiahuanluo commented 1 year ago

In qlora_dpo.py, I see that chosen is tokenized with max_length=self.source_max_len while rejected is tokenized with max_length=self.target_max_len. Why is that? https://github.com/lyogavin/Anima/blob/dc691b2958f50a6d73a239b0e13c341ce6b2d60f/rlhf/qlora_dpo.py#L491 We assumed that source_max_len is the length of the instruction + query and target_max_len is the length of the response.
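For comparison, here is a minimal sketch of the behaviour the question expects, where source_max_len budgets the prompt (instruction + query) and target_max_len budgets each response, so chosen and rejected are truncated the same way. This is not the code in qlora_dpo.py: build_pair and toy_tokenize are hypothetical, and the whitespace "tokenizer" stands in for a real one (with a Hugging Face tokenizer you would instead pass max_length=... and truncation=True).

```python
def toy_tokenize(text):
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return text.split()


def build_pair(prompt, chosen, rejected, source_max_len, target_max_len):
    """Hypothetical helper: truncate the prompt to source_max_len and each
    response to target_max_len, then prepend the same prompt to both."""
    prompt_ids = toy_tokenize(prompt)[:source_max_len]
    chosen_ids = toy_tokenize(chosen)[:target_max_len]
    rejected_ids = toy_tokenize(rejected)[:target_max_len]
    # Both sequences share an identical prompt prefix; only the response differs.
    return prompt_ids + chosen_ids, prompt_ids + rejected_ids, len(prompt_ids)


chosen_seq, rejected_seq, prompt_len = build_pair(
    "Instruction: translate hello", "bonjour", "hola amigo mio",
    source_max_len=3, target_max_len=2,
)
```

Returning prompt_len alongside the sequences makes it easy to mask the prompt positions later if the loss should cover only the response.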

lyogavin commented 1 year ago

Yes, these two parameter names are a bit confusing. I'll rename them later.

jiahuanluo commented 1 year ago

Also, I noticed that the loss is computed over prompt + query + response. Why is that? We assumed the loss should be computed on the response only.

lyogavin commented 1 year ago


In most cases this has little impact on performance. I can add a parameter later to disable the instruction part, i.e. exclude it from the loss.
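A response-only loss is usually done by setting the labels of prompt positions to -100, the value PyTorch's CrossEntropyLoss ignores by default, and summing log-probs only over unmasked positions before plugging them into the standard DPO loss, -log sigmoid(beta * margin). The sketch below assumes that convention; train_on_prompt is a hypothetical flag of the kind the reply mentions, not an existing Anima parameter, and token_logprobs is assumed to be already shifted so that entry i is log p(token i | tokens before i).

```python
import math

IGNORE_INDEX = -100  # label value ignored by PyTorch's CrossEntropyLoss by default


def make_labels(input_ids, prompt_len, train_on_prompt=False):
    """Copy input_ids into labels; unless train_on_prompt (hypothetical flag)
    is set, mask the prompt positions so they contribute no loss."""
    labels = list(input_ids)
    if not train_on_prompt:
        for i in range(min(prompt_len, len(labels))):
            labels[i] = IGNORE_INDEX
    return labels


def sequence_logprob(token_logprobs, labels):
    """Sum per-token log-probs over positions whose label is not masked."""
    return sum(lp for lp, lab in zip(token_logprobs, labels) if lab != IGNORE_INDEX)


def neg_log_sigmoid(x):
    """-log(sigmoid(x)) == log(1 + exp(-x)), computed without overflow."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair of sequence log-probs."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return neg_log_sigmoid(beta * margin)


ids = [11, 12, 13, 21, 22]             # 3 prompt tokens + 2 response tokens
logps = [-1.0, -2.0, -3.0, -0.5, -0.25]
response_only = sequence_logprob(logps, make_labels(ids, prompt_len=3))
whole_sequence = sequence_logprob(logps, make_labels(ids, 3, train_on_prompt=True))
```

Because chosen and rejected share the same prompt, the prompt terms largely cancel in the margin for the policy and reference models, which may be one reason including them often matters little in practice.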