hahuyhoang411 opened this issue 6 days ago
Structure:
- Title
- Abstract
1. Introduction
   1. Multimodal models
   2. Early fusion
   3. Latency
2. Related Work
3. Model Architecture
   1. Architecture: TypeD
   2. Tokenization: WhisperVQ
4. Pre-training
   1. Data Source: Multilingual (7 languages)
   2. Training Technique: Stabilizing training
   3. Training Stages and Hyper-parameters
5. Post-training
   1. Instruction data
      1. Data Format
   2. Data Mixture: Tackling catastrophic forgetting and recovering knowledge
   3. Training Stages and Hyper-parameters
6. Evaluation
   1. Text benchmarks
   2. Audio benchmarks
7. Conclusion

Appendix:
1. Inference
2. Failed experiments
Key points:
1. State the key points
2. Takeaways?
Task 1: Paper reference table. Description: gather related papers for reference: https://www.notion.so/jan-ai/748e90f9a29a4a49b49cc07ebf4bc03a?v=d245885fa1c24492837cd7b6439709e6
Task 2: Introduction: https://www.notion.so/jan-ai/Ichigo-Paper-0f9b351e9bfe4517be816bbf4c4d6cbd
Goal
Release an academic paper describing our effort to train the sound modality.
Description
We are publishing this paper to claim our results and secure a better position in the research community. We are one of the very first teams to build a sound model using Tokenized Early Fusion.