redotvideo / mamba-chat

Mamba-Chat: A chat LLM based on the state-space model architecture 🐍
Apache License 2.0

Question about how mamba chat training is done #35

Open aravindkoti opened 1 month ago

aravindkoti commented 1 month ago

Hi,

First off, thanks for providing training code for mamba use cases.

I was looking at how the training for mamba-chat is done, and one thing I'm unclear on is the `preprocess` function used in the `ChatDataset` class (in `trainer/data.py`). Why does it return a dictionary where the labels are just a copy of the input ids, rather than separate label targets?

dict(input_ids=all_input_ids, labels=all_input_ids)
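For context, my understanding of the usual causal-LM convention (e.g. HuggingFace-style models) is that passing labels identical to the input ids still trains next-token prediction, because the loss function shifts the labels internally: the logits at position t are scored against the token at t+1. A minimal sketch of that shift, purely illustrative (the `shifted_pairs` helper is mine, not from the repo):

```python
def shifted_pairs(input_ids):
    """Pair each position with its next-token prediction target.

    When labels == input_ids, causal LMs shift internally:
    logits[:-1] are scored against labels[1:], so the "label"
    for each position is simply the next token in the sequence.
    """
    return list(zip(input_ids[:-1], input_ids[1:]))

tokens = [101, 7592, 2088, 102]  # e.g. [BOS, "hello", "world", EOS]
pairs = shifted_pairs(tokens)
# each pair is (context token, target token); the target is the next token
```

So if that convention applies here too, `labels=all_input_ids` would make sense for plain next-token training.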

I'm a little confused: wouldn't we need the tokens from both the user and the assistant turns to train a chatbot? I noticed this same pattern a few other times in the training code, so I wanted to ask.
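What I half-expected to see instead was something like the pattern below, where user-turn tokens stay in `input_ids` (so the model still conditions on the full conversation) but are masked out of the loss with the usual -100 ignore index. This is just a hypothetical sketch of that alternative (`mask_user_turns` and `assistant_mask` are names I made up), not code from this repo:

```python
IGNORE_INDEX = -100  # convention: positions with this label are excluded from the loss

def mask_user_turns(input_ids, assistant_mask):
    """Hypothetical alternative: compute loss only on assistant tokens.

    assistant_mask[i] is True where token i belongs to an assistant turn.
    User-turn tokens still appear in input_ids, but contribute no loss.
    """
    return [tok if is_asst else IGNORE_INDEX
            for tok, is_asst in zip(input_ids, assistant_mask)]

ids  = [10, 11, 12, 13, 14]
mask = [False, False, True, True, True]  # last three tokens are the assistant reply
labels = mask_user_turns(ids, mask)
# → [-100, -100, 12, 13, 14]
```

Is skipping this masking (i.e. training on the user turns as well) a deliberate choice here?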

Thanks!