Open sven-nm opened 3 months ago
@sven-nm
Hi there, I apologize for the delayed response - I totally missed the notification for this issue.
Unfortunately it has been a long time since I wrote this code, so I can't answer very confidently, but it looks like you're right. That said, I double-checked, and this logic is used in all three masking methods (including the one we trained with); if our masking were genuinely completely broken, I'm not sure how the model would have learned anything. Given that, I'm honestly not sure what to tell you.
Edit: If you come to a conclusion you are confident about and fix something, please feel free to open a PR.
@sven-nm does this affect training? We are planning to start training on our English corpus, and I was wondering if this bug has a major effect on the training loss.
@ganeshkrishnan1 If this is actually broken, it should pretty much break training entirely (which may or may not be reflected in the loss). It should be fairly easy to figure out whether there's a problem, and if there is, the fix would just be the change suggested in the original issue.
I haven't started the training for this yet. But how do you figure out if there is a problem if it's not evident in the loss? Do you mean running it on real-world classification (or similar) tasks?
@ganeshkrishnan1 Easiest ways are either to step through the training code with a debugger and just look at what is actually being masked, or if you prefer, train on an extremely small simplified dataset (like strings of consecutive letters or something) and see if the model can learn that properly.
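As a rough illustration of the first option, here is a minimal sketch of what you'd look for when stepping through with a debugger: which positions are actually eligible for masking. The helper name and the numpy tensors are my own assumptions for illustration, not the repo's API (the real code uses torch tensors).

```python
import numpy as np

def inspect_mask_candidates(special_tokens_mask, attention_mask):
    """Return a boolean array of positions eligible for MLM masking.

    A position is a candidate only if it is neither a special token
    nor a padding token. Both inputs are 0/1 integer arrays of shape
    (batch, seq_len).
    """
    special = special_tokens_mask.astype(bool)
    padding = ~attention_mask.astype(bool)
    return ~(special | padding)

# Toy batch: [CLS] a b c [SEP] <pad> <pad>
special_tokens_mask = np.array([[1, 0, 0, 0, 1, 0, 0]])
attention_mask      = np.array([[1, 1, 1, 1, 1, 0, 0]])
candidates = inspect_mask_candidates(special_tokens_mask, attention_mask)
print(candidates.astype(int))  # only the three real content tokens remain eligible
```

If the candidate set you see in the debugger includes padding positions (or is empty), that's the bug.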
@ganeshkrishnan1 Btw, if you do figure this out one way or the other please let me know. I can update the code in the main branch (or you can open a PR) if changes are necessary.
Our team is going to start testing work on this early next week. I will keep you updated
Hey guys, thanks for this awesome adaptation of CANINE 😊 I've been working on adapting it to any language and I came across weird empty masks. I think the problem is in `training/masking.py`, in the function `random_mask`. We have the following (starting at line 42) — I've added my comments with a ⚠️. Am I missing something here? My hunch is that line 43 should be:

```python
special_tokens_mask = special_tokens_mask | ~attention_mask.bool()
```
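For anyone following along, the effect of that one-line change can be shown in isolation. This is a sketch under my own assumptions (toy tensors in numpy rather than the repo's actual torch code in `training/masking.py`): without folding `~attention_mask` into `special_tokens_mask`, padding positions stay eligible for masking, which would corrupt the MLM targets.

```python
import numpy as np

# Toy sequence: [CLS] a b [SEP] followed by two padding tokens.
special_tokens_mask = np.array([1, 0, 0, 1, 0, 0], dtype=bool)
attention_mask      = np.array([1, 1, 1, 1, 0, 0])

# Without the fix: padding positions (attention_mask == 0) are NOT
# flagged as special, so they remain candidates for random masking.
maskable_before = ~special_tokens_mask
print(maskable_before.astype(int))  # padding wrongly maskable

# With the suggested fix: fold padding into the special-tokens mask,
# mirroring `special_tokens_mask | ~attention_mask.bool()` from above.
special_with_pad = special_tokens_mask | ~attention_mask.astype(bool)
maskable_after = ~special_with_pad
print(maskable_after.astype(int))  # only real tokens remain maskable
```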