homebrewltd / ichigo

Llama3.1 learns to Listen

epic: Paper for Ichigo model #74

Open hahuyhoang411 opened 6 days ago

hahuyhoang411 commented 6 days ago

Goal

Release an academic paper for our effort of training the sound modality.

Description

We are publishing this paper to claim our results and establish a stronger position in the research community. We are one of the very first teams to build a sound model using Tokenized Early Fusion.

Tasklist

hahuyhoang411 commented 6 days ago

Structure:

Title
Abstract
1. Introduction
    1. Multimodal
    2. Early fusion
    3. Latency
2. Related work
3. Model Architecture
    1. Architecture: TypeD
    2. Tokenization: WhisperVQ
4. Pre-training
    1. Data Source: Multilingual (7 languages)
    2. Training Technique: Stabilize training
    3. Training Stages and Hyper-parameters
5. Post-training
    1. Instruction data
        1. Data Format
    2. Data Mixture: Tackling catastrophic forgetting and recovering knowledge
    3. Training Stages and Hyper-parameters
6. Evaluation
    1. Text benchmarks
    2. Audio benchmarks
7. Conclusion

Appendix: 
    1. Inference
    2. Failed experiments

Key:
    1. State the key points
    2. Takeaways?

hahuyhoang411 commented 2 days ago

Task 1: Table of papers. Description: gather related papers for reference: https://www.notion.so/jan-ai/748e90f9a29a4a49b49cc07ebf4bc03a?v=d245885fa1c24492837cd7b6439709e6

hahuyhoang411 commented 1 day ago

Task 2: Introduction: https://www.notion.so/jan-ai/Ichigo-Paper-0f9b351e9bfe4517be816bbf4c4d6cbd