How to output only one result file during prediction?

Yes. When you have N training datasets, there will be N output files, one for each dataset. This is because we are doing multi-task learning with each dataset treated as a separate task. Note that these N output files may conflict with each other (e.g., the same token may be predicted as S-GENE in output 1 but as S-CHEMICAL in output 2). Producing a single output file with these conflicts resolved is beyond the scope of this project.
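To make "conflict" concrete, here is a minimal sketch that flags tokens receiving different entity labels across the N output files. It assumes (hypothetically; check the actual prediction format) that each file is CoNLL-style with the token in the first column and the label in the last, blank lines between sentences, and identical tokenization across files. `parse_output` and `find_conflicts` are illustrative names, not part of the project. It only detects conflicts; deciding which label wins is the part that remains out of scope.

```python
from typing import List, Tuple


def parse_output(text: str) -> List[Tuple[str, str]]:
    """Parse CoNLL-style lines into (token, label) pairs (assumed format)."""
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # blank lines only separate sentences
        pairs.append((parts[0], parts[-1]))  # first column: token, last: label
    return pairs


def find_conflicts(outputs: List[List[Tuple[str, str]]]) -> List[Tuple[int, str, List[str]]]:
    """Return (position, token, labels) where outputs disagree on a non-O label."""
    if len({len(o) for o in outputs}) != 1:
        raise ValueError("output files have different numbers of tokens")
    conflicts = []
    for i, column in enumerate(zip(*outputs)):
        tokens = {tok for tok, _ in column}
        if len(tokens) != 1:
            raise ValueError(f"token mismatch at position {i}: {tokens}")
        # "O" is ignored: a task that does not cover this type has no opinion.
        entity_labels = sorted({lab for _, lab in column if lab != "O"})
        if len(entity_labels) > 1:
            conflicts.append((i, column[0][0], entity_labels))
    return conflicts


if __name__ == "__main__":
    out1 = parse_output("p53\tS-GENE\nbinds\tO\n\ncisplatin\tO\n")
    out2 = parse_output("p53\tS-CHEMICAL\nbinds\tO\n\ncisplatin\tS-CHEMICAL\n")
    print(find_conflicts([out1, out2]))
```

Here "cisplatin" is not reported, because one task predicting "O" and another predicting an entity is the expected multi-task behavior rather than a disagreement between two entity types.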

Merging all training sets into one does not work because it introduces many false-negative training samples. For example, if the first training set only annotates GENE entities, then all CHEMICAL entities in that set are labeled as "O".
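A toy illustration of the problem, with two made-up datasets (hypothetical data, not from the project): a GENE-only set in which "aspirin" is left as "O", and a CHEMICAL-only set in which "COX-2" is left as "O". After naive merging, the same surface token receives contradictory targets. The sketch below only surfaces tokens seen both as "O" and as an entity; it is a rough signal, since a token can legitimately be "O" in one context, and it misses false negatives like "COX-2" that never appear with an entity label at all.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical datasets with disjoint label sets.
gene_only = [("BRCA1", "S-GENE"), ("and", "O"), ("aspirin", "O")]  # chemical left as O
chemical_only = [("aspirin", "S-CHEMICAL"), ("inhibits", "O"), ("COX-2", "O")]  # gene left as O


def contradictory_tokens(merged: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map tokens labeled both 'O' and an entity type to the set of labels they got."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for token, label in merged:
        seen[token].add(label)
    return {tok: labs for tok, labs in seen.items() if "O" in labs and len(labs) > 1}


if __name__ == "__main__":
    print(contradictory_tokens(gene_only + chemical_only))
```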

To get a single model (and a single output) covering all entity types, you may refer to the following paper, as far as I know:

Marginal Likelihood Training of BiLSTM-CRF for Biomedical Named Entity Recognition from Disjoint Label Sets.
Paper: https://aclanthology.org/D18-1306.pdf
Code: https://github.com/ngreenberg/em-crf
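As I understand the idea from its title, an "O" in a dataset only rules out the entity types that dataset annotates, so training maximizes the marginal likelihood log Σ p(y | x) over all label sequences y consistent with the partial annotation, instead of the likelihood of one fixed sequence. Below is a minimal pure-Python sketch of that objective for a linear-chain CRF using a constrained forward algorithm. It is not the em-crf code: the label set is simplified to O and S- tags, there are no start/end transitions, and `allowed_labels` and `marginal_log_likelihood` are hypothetical names.

```python
import math
from typing import List, Sequence, Set

NEG_INF = float("-inf")


def logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    if m == NEG_INF:
        return NEG_INF  # every path is disallowed
    return m + math.log(sum(math.exp(v - m) for v in values))


def allowed_labels(observed: List[str], annotated_types: Set[str], labels: List[str]) -> List[Set[int]]:
    """An observed 'O' may hide an entity of any type the dataset does not annotate."""
    result = []
    for tag in observed:
        if tag == "O":
            result.append({i for i, lab in enumerate(labels)
                           if lab == "O" or lab.split("-", 1)[1] not in annotated_types})
        else:
            result.append({labels.index(tag)})  # annotated entities are trusted as-is
    return result


def constrained_log_partition(emissions, transitions, allowed) -> float:
    """Forward algorithm summing only over paths with label t in allowed[t]."""
    k_labels = len(transitions)
    alpha = [emissions[0][k] if k in allowed[0] else NEG_INF for k in range(k_labels)]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[j] + transitions[j][k] for j in range(k_labels)]) + emissions[t][k]
            if k in allowed[t] else NEG_INF
            for k in range(k_labels)
        ]
    return logsumexp(alpha)


def marginal_log_likelihood(emissions, transitions, allowed) -> float:
    """log sum_{y consistent with annotation} p(y|x) = log Z_allowed - log Z."""
    everything = [set(range(len(transitions)))] * len(emissions)
    return (constrained_log_partition(emissions, transitions, allowed)
            - constrained_log_partition(emissions, transitions, everything))


if __name__ == "__main__":
    labels = ["O", "S-GENE", "S-CHEMICAL"]
    # Toy sentence from a GENE-only dataset: "BRCA1 binds aspirin".
    allowed = allowed_labels(["S-GENE", "O", "O"], {"GENE"}, labels)
    emissions = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.4, 0.1, 1.8]]  # made-up scores
    transitions = [[0.0] * 3 for _ in range(3)]
    print(allowed, marginal_log_likelihood(emissions, transitions, allowed))
```

The training loss would be the negative of this value; because the "O" tokens also admit S-CHEMICAL, the model is not penalized for tagging "aspirin" as a chemical even though that dataset never annotated chemicals.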

yuzhimanhua / Multi-BioNER, issue #22