junxnone / Eureka

Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models"
https://eureka-research.github.io/
MIT License

Sequence #5

Open junxnone opened 11 months ago

junxnone commented 11 months ago

sequenceDiagram
    participant Eureka
    participant GPT4 as LLMs GPT4
    participant Gym as Isaac Gym

    loop Iterations: Get best reward
        loop Samples: Generate reward
            note over Eureka: Build Prompt
            Eureka->>GPT4: Query reward code sample
            GPT4-->>Eureka: Return reward code
        end

        loop Samples: Training with generated reward
            note over Eureka: Build RL Envs
            Eureka->>Gym: Training with reward code
            Gym-->>Eureka: Return Results
        end

        note over Eureka: Update the best reward
        note over Eureka: Rebuild Prompt
    end

    Eureka->>Gym: Evaluate with best reward
    note over Gym: Generate the final weights

junxnone commented 11 months ago

flowchart TB
    subgraph S1[Eureka iteration]
    subgraph S2[Generate Sample]
        A(Build Prompt)
        C[Build RL Envs]
    end
        B[LLMs GPT4]
        D[Analyze Results]
    end
    A -->|OpenAI API| B
    B-.->|Reward Function| C
    subgraph S3[IsaacGym]
        S3B[Evaluate]
        S3A[Training]
    end
    C -->|Start Training| S3A
    S3A -.->|Return Results| D
    D --> |Evaluate the best reward code| S3B
    D -.-> A
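
The dashed edge from the results-analysis node back to "Build Prompt" is the feedback step. A minimal sketch of what that step might look like is below, assuming the training results arrive as named lists of numbers. `summarize_results`, `rebuild_prompt`, the metric names, and the prompt wording are all hypothetical and are not taken from the repository.

```python
from typing import Dict, List


def summarize_results(results: Dict[str, List[float]]) -> str:
    """Turn per-metric training curves into a short text summary."""
    lines = []
    for name, values in results.items():
        if not values:
            continue  # skip metrics with no recorded values
        mean = sum(values) / len(values)
        lines.append(
            f"{name}: max={max(values):.2f}, mean={mean:.2f}, min={min(values):.2f}"
        )
    return "\n".join(lines)


def rebuild_prompt(base_prompt: str, reward_code: str,
                   results: Dict[str, List[float]]) -> str:
    """Append the previous reward code and its results to the base prompt."""
    return (
        f"{base_prompt}\n\n"
        f"Previous reward function:\n{reward_code}\n\n"
        f"Training results:\n{summarize_results(results)}\n\n"
        "Please propose an improved reward function."
    )
```

For example, `summarize_results({"success": [0.0, 0.5, 1.0]})` yields `success: max=1.00, mean=0.50, min=0.00`, which `rebuild_prompt` places under "Training results" for the next LLM query.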