stanford-crfm / levanter

Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
https://levanter.readthedocs.io/en/latest/
Apache License 2.0

use hf config from checkpoint by default #715

Closed · dlwh closed this 2 months ago

dlwh commented 2 months ago

So we should actually respect the config in the HF checkpoint directory, since that's usually right. A bunch of tests were broken, but we didn't notice because they all need torch and we don't install torch in CI... (I guess we should.)

Fixes #681 (I think)
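(For context, the HF checkpoint directory ships its own `config.json`, which is what this PR now prefers by default. A minimal illustration, using the plain transformers API rather than Levanter's own loader; the path is a placeholder:)

```python
# Illustrative only: an HF-format checkpoint directory contains config.json,
# which transformers reads directly from the local path. That stored config
# is what the PR respects by default instead of a user-supplied model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("/path/to/hf/checkpoint")  # reads config.json in that directory
print(config.hidden_size, config.num_hidden_layers)
```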

mhmaqbool commented 2 months ago

It would be great if a sample configuration for mistral-7b were made available for loading from a local checkpoint. I still see HF being contacted to load the checkpoint, but that may be something to do with my configuration. What did I try? In the configuration (mistral-7b):

  1. `initialize_from: /path/to/local/checkpoint`, which results in the following error: `draccus.utils.DecodingError: The fields 'initialize_from' are not valid for TrainLmConfig`

  2. `initialize_from_hf: /path/to/local/checkpoint`, which tries to fetch the checkpoint from HF

Any advice, please?

dlwh commented 2 months ago

Does `/path/to/local/checkpoint/model.safetensors.index.json` not exist?


mhmaqbool commented 2 months ago

Ahhh, the checkpoint I have doesn't contain model.safetensors.index.json; rather, it's a plain PyTorch checkpoint. I assumed there might be a conversion procedure in place. Could you please point me to the checkpoint format Mistral expects locally, if possible?
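(For reference, a minimal conversion sketch using the standard transformers API, in case a safetensors layout is still wanted; the paths are placeholders and this is not a Levanter-provided procedure:)

```python
# Sketch: load the plain PyTorch checkpoint with transformers and re-save it,
# which writes config.json plus safetensors shards (and, for sharded models,
# model.safetensors.index.json). Paths below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "/path/to/local/checkpoint"
dst = "/path/to/local/checkpoint-safetensors"

model = AutoModelForCausalLM.from_pretrained(src)
model.save_pretrained(dst, safe_serialization=True)

tokenizer = AutoTokenizer.from_pretrained(src)
tokenizer.save_pretrained(dst)
```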

dlwh commented 2 months ago

PyTorch should work too, as long as PyTorch is installed and `pytorch_model.bin.index.json` or `pytorch_model.bin` exists.
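(A quick sanity-check sketch, not Levanter code, for confirming which of those files the local directory actually contains; the path is a placeholder:)

```python
# Sketch: check which of the weight/index files mentioned above exist in the
# local checkpoint directory.
import os

ckpt_dir = "/path/to/local/checkpoint"
for name in ("model.safetensors.index.json",
             "pytorch_model.bin.index.json",
             "pytorch_model.bin"):
    print(name, "->", os.path.exists(os.path.join(ckpt_dir, name)))
```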


mhmaqbool commented 2 months ago

Yeah, I have PyTorch installed and tried again. The following files are available in the checkpoint:

It still tries to fetch the checkpoint from the HF website, though. This is my configuration:

```yaml
data:
  train_urls: ["/pth/to/train/json"]
  validation_urls: ["/pth/to/val/json/"]
  tokenizer: "/local/mistral-7b/checkpoint"
model:
  type: mistral
initialize_from_hf: "/local/mistral-7b/checkpoint"
use_hf_model_config: true
trainer:
  wandb:
    project: "levanter"
    tags: ["openwebtext", "mistral"]
  mp: p=f32,c=bfloat16
  train_batch_size: 256  # set for v4-64 TPU
  num_train_steps: 1000
  steps_per_eval: 50
  tensor_parallel_axes: ["mlp", "heads"]
  fsdp_axis: "embed"
  batch_axis: "batch"
optimizer:
  learning_rate: 1.2E-5  # set low for fine-tuning
  weight_decay: 0.1
  min_lr_ratio: 0.1
```