I find this project extremely interesting and I'm eager to follow its progress. I have a question about the training process described in the paper. The paper refers to "toy shikra/toy model" many times. I'm curious how the toy Shikra was trained, particularly for the results reported in Table 2. Was it trained only on REC datasets and initialized from the LLaMA model?