Closed hantingge closed 4 years ago
Thanks for your interest in our work.
Q1 and Q4: Thanks for your suggestions. During parameter tuning we wrote some code and changed some parameters, and did not change them back in this final version. From my perspective they do not affect the results or execution in any visible way, but we will incorporate your suggestions in the next version.
Q2: It is a sentence-level shuffle.
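To make "sentence-level shuffle" concrete, here is a minimal sketch (plain Python, illustrative toy data — not code from the repo) contrasting shuffling at the sentence level with shuffling at the document level:

```python
import random

# Toy corpus: each document is a list of sentences (illustrative data only).
docs = [["d1-s1", "d1-s2"], ["d2-s1", "d2-s2", "d2-s3"]]

# Sentence-level shuffle: flatten to individual sentences, then permute them,
# so sentences from different documents can be interleaved in any order.
sentences = [s for doc in docs for s in doc]
random.shuffle(sentences)

# Document-level shuffle (for contrast): permute whole documents while
# keeping the sentence order inside each document intact.
doc_order = docs[:]
random.shuffle(doc_order)
```

With `shuffle=True` on a sentence-level dataloader, each training epoch sees the sentences in a fresh random order, as in the first case above.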
Q3: This follows common practice in some previous studies. Yes, we have experimented with training on just train.tsv, and the results are lower.
A few other questions:
1. For the batch sizes of the dataloaders, why do you use `10` for training and `50` for evaluating the devel and test sets?
2. In your train_wc.py, you set `shuffle=True` for the `dataset_loader` of each dataset. When I shuffle each dataset's dataloader, does it shuffle only at the document level, or also the sentences within each document?
3. Why did you merge the `train` and `devel` biomedical datasets for training? Doesn't the model overfit? I assume you have experimented with training on just train.tsv, and the F1s on the test set are lower than with merge.tsv?
4. On lines 246 and 251 in train_wc.py, what is the point of `sample_num` and the for loop with `range(1)`, if it is always 1?
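On the batch-size question, a common pattern (an illustrative sketch, not the repo's actual code) is a small batch for training, where gradients and optimizer state are held per batch, and a larger batch for evaluation, where no gradients are stored so more examples fit in memory at once:

```python
def batches(data, batch_size):
    """Yield consecutive slices of `data` of length `batch_size`; the last may be shorter."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

data = list(range(100))                      # stand-in for 100 sentences
train_batches = list(batches(data, 10))      # smaller batches during training
eval_batches = list(batches(data, 50))       # larger batches during evaluation
```

As for `range(1)`: a loop over `range(1)` executes its body exactly once, so it behaves the same as no loop at all; per the reply above, it is a leftover from parameter tuning.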