ghost opened this issue 3 years ago
Hi, I just pushed handling for a corner case (commit e0d9aee90cd774fff3cb244701cbcf323359f4ee) which may be related to your problem. In some cases (especially strings containing only control characters) the tokenizer we are using maps tokens to empty sequences, which can lead to zero divisions (and NaN values) further down the line. Could you please check whether the commit fixes your problem? Even if it does, it would still be better to remove any control characters from your dataset beforehand.
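For reference, a minimal sketch of the failure mode and a possible pre-filter, assuming a standard Hugging Face BertTokenizer (the cleaning helper below is illustrative, not code from spERT):

```python
import re
from transformers import BertTokenizer

# BERT's basic tokenizer strips control characters, so a token that consists
# only of control characters is mapped to an empty wordpiece sequence.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
print(tokenizer.tokenize("\x01\x02"))  # -> []

# A possible pre-filter: strip control characters from every token when
# building the dataset and drop tokens that become empty.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")

def clean_token(token: str) -> str:
    return CONTROL_CHARS.sub("", token)

tokens = ["Hello", "\x01", "world"]
tokens = [t for t in (clean_token(t) for t in tokens) if t]
print(tokens)  # -> ['Hello', 'world']
```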
If this does not fix your issue, could you please send me the dataset (or a representative part of it) by email (markus.eberts@hs-rm.de)? I can have a look at it then.
Thank you, I'll check asap and let you know
Hi @markus-eberts, thanks for sharing your great work.
I was playing around with a variation of spERT in which the relations are extracted using a softmax instead of a sigmoid. To verify the correctness of the overall system I first trained it on the version of CoNLL04 that you provide with the model, and everything seemed fine. The issues arose when training it on a different dataset, converted to a format compatible with spERT. Training went smoothly, but the model didn't make any predictions at all, neither entities nor relations. I am surely missing something; could you perhaps point me in a direction to start working from?
Here is a single sample from the training dataset:
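It follows spERT's JSON input format (a list of documents with `tokens`, token-level `entities` spans and `relations` that reference entity indices); below is a simplified placeholder with made-up tokens and types rather than the raw sample, just to show the structure:

```python
import json

# Simplified, made-up document in spERT's input format. Entity spans are
# token-level with an exclusive end index; relations reference entities by
# their position in the "entities" list. Type names must also be listed in
# the types file passed to spERT.
sample = {
    "tokens": ["Alice", "works", "with", "Bob", "."],
    "entities": [
        {"type": "Person", "start": 0, "end": 1},
        {"type": "Person", "start": 3, "end": 4},
    ],
    "relations": [
        {"type": "CollaboratesWith", "head": 0, "tail": 1},
    ],
    "orig_id": 0,
}

# spERT reads a JSON list of such documents.
with open("train.json", "w") as f:
    json.dump([sample], f)
```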
On this dataset a softmax is preferable, since all the relations are symmetric and at most one relation exists between any two entities.
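To make the modification concrete, here is a minimal sketch of the loss-side change (simplified, not my actual diff): the multi-label relation loss is replaced by a single-label cross-entropy over mutually exclusive classes, with index 0 reserved for "no relation".

```python
import torch
import torch.nn as nn

# Hypothetical shapes: logits for 8 candidate entity pairs over 5 classes,
# where index 0 is an explicit "no relation" class. With a softmax this
# class is needed, since there is no per-type threshold any more.
rel_logits = torch.randn(8, 5, requires_grad=True)
rel_targets = torch.tensor([0, 2, 0, 1, 0, 0, 3, 4])  # gold class per pair

# Original spERT treats relations as multi-label (BCEWithLogitsLoss over
# relation types); the softmax variant uses a single-label cross-entropy.
rel_criterion = nn.CrossEntropyLoss(reduction="none")
rel_loss = rel_criterion(rel_logits, rel_targets).mean()
rel_loss.backward()
```

The reserved "none" class is the important detail: with a sigmoid, "no relation" simply means all scores stay below the threshold, whereas with a softmax the model must be able to predict it explicitly, otherwise every candidate pair is forced onto some relation type.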
Here is the log of the training run:
The following are the major changes that I applied to the original model:
spert/spert_trainer.py
spert/loss.py
spert/sampling.py
spert/models.py
spert/predictions.py
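The prediction step is where the two variants behave most differently; here is a simplified sketch of the filtering logic (variable names are mine, not the actual code):

```python
import torch

rel_logits = torch.randn(8, 5)  # 8 candidate pairs; in the softmax variant class 0 = "no relation"

# Sigmoid variant (original behaviour): keep every (pair, type) whose score
# exceeds the relation filter threshold.
rel_filter_threshold = 0.4                     # value from my config
scores = torch.sigmoid(rel_logits[:, 1:])      # per-type scores (the original model has no "none" column)
keep_sigmoid = scores > rel_filter_threshold   # possibly several types per pair

# Softmax variant: take the single most likely class per pair and keep the
# pair only if that class is not "no relation"; no threshold is applied.
probs = torch.softmax(rel_logits, dim=-1)
pred_classes = probs.argmax(dim=-1)
keep_softmax = pred_classes != 0
```

If the softmax path still applies the old threshold to the softmax probabilities, or if the "none" class wins for every pair, the output ends up empty, which might be related to the missing predictions I am seeing.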
This is not related to the previous topic, but I thought I'd add it here since the same dataset is involved. While experimenting with the original spERT I switched BERT to SciBERT. With 1 training epoch I had no issues whatsoever; when I increased it to 5, the procedure that stores the predictions started to pick up relations that should have been filtered out by the previous processing step (if I interpreted everything correctly). Here is the log:
Best regards