Is there any pre-training data (or a recommendation) for the distilled Qwen model? I've verified the approach on the LLaMA model, but performance on Qwen is very poor. I suspect the problem is the pre-training data.
I think the Dolly dataset is enough for Qwen. For more high-quality data, you can try ShareGPT. Note: Qwen's vocabulary is much larger than LLaMA's, so the processed token IDs should be stored as int32, not uint16 as in the codebase.
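A minimal sketch of why the dtype matters: Qwen's vocabulary (~152k tokens) exceeds uint16's maximum of 65535, so storing its token IDs as uint16 would silently overflow. The model name and output path below are just illustrative; adapt them to the codebase's preprocessing script.

```python
import numpy as np
from transformers import AutoTokenizer

# "Qwen/Qwen-7B" is an example checkpoint; substitute whichever Qwen model you use.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

ids = tokenizer.encode("Hello, world!")

# LLaMA's ~32k vocab fits in uint16 (max 65535), but Qwen's does not,
# so use int32 (or uint32) when serializing the tokenized data.
assert max(ids) < np.iinfo(np.int32).max
tokens = np.array(ids, dtype=np.int32)
tokens.tofile("data.bin")  # "data.bin" is a placeholder output path
```

The same fix applies wherever the codebase reads the binary back: the load side must use the matching int32 dtype, or the IDs will be misinterpreted.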