moment-timeseries-foundation-model / moment

MOMENT: A Family of Open Time-series Foundation Models
MIT License

Non-compatibility of the classification head for gpu on zeroshot classification task #33

Open sandhyat opened 3 months ago

sandhyat commented 3 months ago


Thank you for providing the code for your work. When I use a pre-trained MOMENT model for zero-shot or fine-tuned classification, my code fails with an error whose trace points to tensors being on two different devices. I confirmed that the inputs are on CUDA. I found the exact lines (66 and 68 in ) where this happens: the linear layer there (in the classification head) appears to stay on the CPU. If I explicitly move this linear layer to the 'cuda' device, the code works fine. The following is the code snippet I have been using.

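import numpy as np
import torch
from tqdm import tqdm
from momentfm import MOMENTPipeline

model = MOMENTPipeline.from_pretrained(
    ...,  # pre-trained checkpoint name
    model_kwargs={
        'task_name': 'classification',
        'n_channels': 69,
        'num_class': 2
    },  # We are loading the model in classification mode
)

def get_logits(model, dataloader):
    logits_list = []
    with torch.no_grad():
        for batch_x, batch_masks, _ in tqdm(dataloader, total=len(dataloader)):
            # Move the inputs to the GPU
            batch_x = batch_x.to("cuda").float()
            batch_masks = batch_masks.to("cuda")

            output = model(batch_x, input_mask=batch_masks)  # [batch_size x d_model (=1024)]
            logit = output.logits
            # Collect per-batch logits on the CPU so they can be concatenated
            logits_list.append(logit.detach().cpu().numpy())
    logits_list = np.concatenate(logits_list)
    return logits_list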

output_flow_logit_test = get_logits(model, dataloader_flow_test)
Loading data...
> /pre_wkdir/
-> for batch_x, batch_masks, _ in tqdm(dataloader, total=len(dataloader)):
(Pdb) n
  0%|          | 0/19 [00:00<?, ?it/s]
> /pre_wkdir/
-> batch_x = batch_x.to("cuda").float()
(Pdb) n
> /pre_wkdir/
-> batch_masks = batch_masks.to("cuda")
(Pdb) n
> /pre_wkdir/
-> output = model(batch_x, input_mask=batch_masks)  # [batch_size x d_model (=1024)]
(Pdb) batch_x.is_cuda
(Pdb) batch_masks.is_cuda
(Pdb) n
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
> /pre_wkdir/
-> output = model(batch_x, input_mask=batch_masks)  # [batch_size x d_model (=1024)]
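
One way to see which parameters sit on which device is to group the parameter names by `param.device`. The sketch below is only an illustration on a toy module, not MOMENT code: `parameters_by_device` and the `backbone`/`head` names are hypothetical, and the `meta` device is used purely as a stand-in for a second device so that the example runs without a GPU.

```python
from collections import defaultdict

import torch
from torch import nn


def parameters_by_device(module: nn.Module) -> dict:
    """Group parameter names by the device each parameter lives on."""
    groups = defaultdict(list)
    for name, param in module.named_parameters():
        groups[str(param.device)].append(name)
    return dict(groups)


# Toy stand-in for a backbone plus a classification head (hypothetical names).
toy = nn.Sequential()
toy.add_module("backbone", nn.Linear(8, 4))
# "meta" simulates a layer left on a different device than the rest of the model.
toy.add_module("head", nn.Linear(4, 2, device="meta"))

report = parameters_by_device(toy)
for device_name, names in report.items():
    print(device_name, names)
```

Running the same helper on the loaded model right before the forward pass should show whether any parameters are still listed under `cpu` while the inputs are on `cuda:0`.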

I would appreciate any comments on this, based on your experience developing the code.
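
For context, the workaround mentioned above amounts to making sure every submodule ends up on the same device as the inputs. A minimal sketch in plain PyTorch, assuming a toy model (`ToyClassifier` is hypothetical and is not the MOMENT implementation): calling `.to(device)` on the parent module after all layers are created moves the head as well, and reading the device from the model's own parameters keeps the inputs consistent with it.

```python
import torch
from torch import nn


class ToyClassifier(nn.Module):
    """Hypothetical stand-in: a backbone followed by a linear classification head."""

    def __init__(self, d_model: int = 16, n_classes: int = 2):
        super().__init__()
        self.backbone = nn.Linear(8, d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):
        return self.head(torch.relu(self.backbone(x)))


# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = ToyClassifier()
model.to(device)  # moves every registered submodule, including the head
model.eval()

# Take the device from the model itself so the inputs always match it.
model_device = next(model.parameters()).device
batch_x = torch.randn(4, 8).to(model_device).float()

with torch.no_grad():
    logits = model(batch_x)

print(logits.shape, logits.device)
```

If a layer is created or replaced after the model has already been moved, the `.to(device)` call has to be repeated (or applied to that layer directly), which is essentially what I am doing by hand for the classification head.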

Thanks, Sandhya