Avoid collecting the same position in the text more than once when the same note label is repeated. For example, given "This note1, and this note2, but back to the first note1", we would collect the offset of the first 1 label twice.
Update `labels2notes` so that it uses the note identifier instead of the label.
Fix an NPE when a note's tokenization is empty: in that case `processShort` returns null, so we skip such notes.
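
To illustrate the duplicate-offset problem, here is a minimal sketch and not the actual code of this PR. The class `NoteCalloutOffsets` and the method `collectOffsets` are hypothetical names introduced only for this example. It assumes the labels are searched as plain substrings, and it records each text position at most once even when the same label appears several times in the list of notes. Null or empty labels are skipped, in the same spirit as skipping notes with an empty tokenization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; these names do not come from the code base.
class NoteCalloutOffsets {

    // Returns the distinct offsets at which any of the given labels occurs in the text.
    static List<Integer> collectOffsets(String text, List<String> labels) {
        // A set keeps each position once, in the order it was first found.
        Set<Integer> seen = new LinkedHashSet<>();
        for (String label : labels) {
            if (label == null || label.isEmpty()) {
                continue; // nothing to search for, skip this note
            }
            int from = 0;
            int pos;
            // Scan every occurrence of the label, not only the first one.
            while ((pos = text.indexOf(label, from)) != -1) {
                seen.add(pos); // no effect if this offset was already collected
                from = pos + label.length();
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String text = "This note1, and this note2, but back to the first note1";
        // The label "1" is listed twice, as in the example above.
        List<Integer> offsets = collectOffsets(text, Arrays.asList("1", "2", "1"));
        System.out.println(offsets);
    }
}
```

With a plain list instead of the set, the second "1" entry would add the offsets of the "1" callouts a second time, which is the behaviour this PR avoids.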
coverage: 40.791% (+0.004%) from 40.787%
when pulling 2f9e2114e05ad44740ab2e071e9345ae31f282f7 on bugfix/notes-same-label
into 694f0ed055e8c9a5d5efdc314ebef78e5e2640cf on master.
coverage: 40.792% (+0.005%) from 40.787%
when pulling a472cb1b80961a482a9710eeffc7ce2b93844854 on bugfix/notes-same-label
into 694f0ed055e8c9a5d5efdc314ebef78e5e2640cf on master.
More details are in #1113.
TLDR: given "This note1, and this note2, but back to the first note1", we would collect the offset of the first 1 label twice.