kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.
https://pytorch-crf.readthedocs.io
MIT License

Mask only supports padding masks #122

Open Wendysigh opened 2 months ago

Wendysigh commented 2 months ago

I encountered an issue with the CRF layer when using a random mask: the loss becomes negative after several training rounds. I found this is due to the definition at https://github.com/kmkurn/pytorch-crf/blob/623e3402d00a2728e99d6e8486010d67c754267b/torchcrf/__init__.py#L203.

The code works only when the mask is a padding mask. When the mask is a random mask, we may need a function like the one below:

def find_last_nonzero_indices(matrix):
    # matrix: (seq_len, batch_size) mask; returns, per column, the index
    # of the last nonzero entry, i.e. the true sequence end.
    matrix = matrix.T
    non_zero_mask = matrix != 0
    # Replace each True entry with its position and each False entry with -1
    non_zero_indices = torch.where(
        non_zero_mask,
        torch.arange(matrix.size(1), device=matrix.device),
        torch.tensor(-1, device=matrix.device),
    )
    last_nonzero_indices = torch.max(non_zero_indices, dim=1).values
    return last_nonzero_indices

seq_ends = find_last_nonzero_indices(mask)
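To illustrate the difference (a minimal self-contained sketch; the mask tensors and the sum-based expression here are my own reconstruction of what the linked line appears to compute): for a genuine padding mask, counting the ones per column gives the last valid index, but a mask with a "hole" in the middle breaks that assumption while the helper above still finds the true last timestep.

```python
import torch

def find_last_nonzero_indices(matrix):
    # matrix: (seq_len, batch_size) mask; returns, per column, the index
    # of the last nonzero entry.
    matrix = matrix.T
    non_zero_mask = matrix != 0
    non_zero_indices = torch.where(
        non_zero_mask,
        torch.arange(matrix.size(1), device=matrix.device),
        torch.tensor(-1, device=matrix.device),
    )
    return torch.max(non_zero_indices, dim=1).values

# Padding mask: ones are contiguous from the start of each column.
padding_mask = torch.tensor([[1, 1],
                             [1, 1],
                             [1, 0]])
# Both computations agree: last valid indices are [2, 1].
assert find_last_nonzero_indices(padding_mask).tolist() == [2, 1]
assert (padding_mask.long().sum(dim=0) - 1).tolist() == [2, 1]

# Random mask: column 0 has a hole at timestep 1.
random_mask = torch.tensor([[1, 1],
                            [0, 1],
                            [1, 0]])
# The helper finds the true last nonzero index [2, 1],
# while the count-based expression gives [1, 1] for the same mask.
assert find_last_nonzero_indices(random_mask).tolist() == [2, 1]
assert (random_mask.long().sum(dim=0) - 1).tolist() == [1, 1]
```

The count-based expression only coincides with the last nonzero index when every column's ones form a contiguous prefix, which is exactly the padding-mask case.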
kmkurn commented 2 months ago

Yes, you're right. The mask was never intended to be anything other than a padding mask. A random mask for a sequence tagging problem seems neither standard nor trivial, so I'm less inclined to implement it in the library. But I'm happy to be proven wrong! :-)

Wendysigh commented 2 months ago

Thanks for the reply! I'm new to sequence tagging and happened to want to mask out less important tokens so the model would focus on the other, more important labels. That was the initial motivation for this issue. As you said, maybe it's not a standard approach lol