lucidrains / CALM-pytorch

Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind

Possible to load huggingface's pretrained models in anchor_llm & augment_llm? #6

Open prashantkodali opened 8 months ago

prashantkodali commented 8 months ago

In the code snippet below, is it possible to load the Decoder/Encoder with pretrained models from the Hugging Face hub?

from x_transformers import TransformerWrapper, Decoder

augment_llm = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Decoder(
        dim = 512,
        depth = 12,
        heads = 8
    )
)

anchor_llm = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Decoder(
        dim = 512,
        depth = 2,
        heads = 8
    )
)
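
For context, here is a minimal sketch of the pretrained side of the question, assuming the Hugging Face `transformers` library and the `gpt2` checkpoint purely as illustrative stand-ins: it loads a decoder from the hub and exposes its per-layer hidden states, which is the part that would have to plug into `anchor_llm` / `augment_llm`.

```python
# A minimal sketch (illustrative assumption, not part of CALM-pytorch's API):
# load a pretrained decoder from the Hugging Face hub and expose its
# intermediate hidden states.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative checkpoint, any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
pretrained_llm = AutoModelForCausalLM.from_pretrained(model_name)
pretrained_llm.eval()

with torch.no_grad():
    batch = tokenizer("composition of LLMs", return_tensors = "pt")
    out = pretrained_llm(**batch, output_hidden_states = True)

# tuple of (num_layers + 1) tensors, each of shape (batch, seq_len, hidden_dim)
print(len(out.hidden_states), out.hidden_states[-1].shape)
```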
Mangoho commented 6 months ago

Hi, did you solve the problem?

OmarMohammed88 commented 5 months ago

@lucidrains any solution for this issue?

LitterBrother-Xiao commented 1 month ago

@prashantkodali did you find any solutions?

prashantkodali commented 1 month ago

Hello @LitterBrother-Xiao - I implemented this a while back, specifically for encoder-based models, using PyTorch's forward hooks to pull out the intermediate hidden states (a rough sketch of the hook idea is below).

The approach didn't work for me - I didn't clean up and upload the code - but I can share it if it helps you.

Also, the authors of the paper released their codebase a couple of months back - https://github.com/google-deepmind/calm. Hope this helps.
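
A minimal sketch of the forward-hook idea (not the actual code mentioned above; the checkpoint name and layer indices are assumptions for illustration): register hooks on a few transformer blocks of a pretrained Hugging Face encoder and capture their hidden states, which could then be fed to CALM-style cross-attention.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any encoder-only checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

captured_hiddens = {}

def make_hook(layer_idx):
    # BERT-style layers return a tuple; the first element is the hidden states
    def hook(module, inputs, output):
        captured_hiddens[layer_idx] = output[0].detach()
    return hook

# register hooks on a few transformer blocks (layer indices are an assumption)
hook_layers = [3, 7, 11]
handles = [
    model.encoder.layer[i].register_forward_hook(make_hook(i))
    for i in hook_layers
]

with torch.no_grad():
    batch = tokenizer("hello world", return_tensors = "pt")
    model(**batch)

for i in hook_layers:
    print(i, captured_hiddens[i].shape)  # (batch, seq_len, hidden_dim)

# remove the hooks when done
for h in handles:
    h.remove()
```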

LitterBrother-Xiao commented 1 month ago

@prashantkodali thanks so much!