mlfoundations / open_lm

A repository for research on medium-sized language models.

NotImplementedError running HF model "mlfoundations/dclm-7b-it" for inference #303

Open neginraoof opened 2 months ago

neginraoof commented 2 months ago

I am trying to use the HF model "mlfoundations/dclm-7b-it" for inference, simply using the code below:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mlfoundations/dclm-7b-it")
gen_kwargs = {"max_new_tokens": 500, "temperature": 0}
# inputs: tokenized prompt ids (from the model's tokenizer)
output = model.generate(inputs['input_ids'], **gen_kwargs)

I see this warning when loading the model: Some weights of OpenLMForCausalLM were not initialized from the model checkpoint at mlfoundations/dclm-7b-it and are newly initialized: [...]

And I get NotImplementedError:

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 3, 32, 128) (torch.float32)
     key         : shape=(1, 3, 32, 128) (torch.float32)
     value       : shape=(1, 3, 32, 128) (torch.float32)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
     p           : 0.0

I have also tried model = AutoModel.from_pretrained("mlfoundations/dclm-7b-it"), but this model class also fails with ValueError: Unrecognized configuration class.

Which model class should I use here?

sedrick-keh-tri commented 2 months ago

This is usually an xformers issue. I think the main problem is that xformers doesn't run on CPU, so the quick short-term fix is to make sure you move the model and all input tensors to the GPU. That should resolve the error.
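For reference, a minimal sketch of that short-term fix, assuming a CUDA GPU is available; the prompt string is hypothetical and the model/tokenizer names follow the snippet above:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # xformers' memory_efficient_attention kernels require a GPU

tokenizer = AutoTokenizer.from_pretrained("mlfoundations/dclm-7b-it")
model = AutoModelForCausalLM.from_pretrained("mlfoundations/dclm-7b-it").to(device)

# Hypothetical prompt, just for illustration
inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(device)
output = model.generate(inputs["input_ids"], max_new_tokens=500)
print(tokenizer.decode(output[0], skip_special_tokens=True))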

I think the long-term solution here would probably be to get rid of xformers entirely. You can do this locally by setting "attn_name": "torch_attn" and "ffn_type": "swiglu_torch" in the model config. I know the Apple models and the TRI models do this, but I guess the mlfoundations one wasn't updated accordingly. I'm putting in a PR now.
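For anyone who wants to try that override locally before the PR lands, here is a rough sketch, assuming the OpenLM HF config exposes attn_name and ffn_type as top-level attributes (the exact field layout in the config class may differ):

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("mlfoundations/dclm-7b-it")
# Assumed attribute names, taken from the keys mentioned above
config.attn_name = "torch_attn"    # plain PyTorch attention instead of xformers
config.ffn_type = "swiglu_torch"   # PyTorch SwiGLU FFN instead of the xformers one
model = AutoModelForCausalLM.from_pretrained("mlfoundations/dclm-7b-it", config=config)

If those keys also appear in the checkpoint's config.json, editing them there should have the same effect.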