snjie209 closed this issue 4 years ago.
Please use a comma (",") to separate the labels. For example:
labels \t sentence
0_0,1_0,2_0,3_0,4_0,5_0,6_0,7_0,8_0,9_0 \t Assessment and Plan... <more notes here>
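A minimal sketch of building one row in the comma-separated format shown above. It assumes that the value at position i of a hypothetical `label_values` list is the value for aspect i, which matches the `0_0,1_0,...,9_0` example. The helper name is illustrative, not part of BlueBERT.

```python
def format_multilabel_row(label_values, sentence):
    """Build one train.tsv line: '<aspect>_<value>' labels joined by commas, a tab, then the text.

    Hypothetical helper. Assumes label_values[i] is the value for aspect i,
    so aspect indices run from 0 to len(label_values) - 1.
    """
    labels = ",".join(f"{aspect}_{value}" for aspect, value in enumerate(label_values))
    # A tab or newline inside the note would break the two-column layout, so collapse whitespace.
    clean_sentence = " ".join(sentence.split())
    return f"{labels}\t{clean_sentence}"


row = format_multilabel_row([0] * 10, "Assessment and Plan ...")
print(row)
```

Collapsing whitespace is a precaution: a clinical note that contains tabs or line breaks would otherwise be read as extra columns or extra rows.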
Thanks for the quick response. Are you also saying that we should have four columns in train.tsv?
Also, does each label have to be in “0_1” underscore format? What is this meant to illustrate?
And in your code snippet, are you illustrating one row of data?
Thanks for reading
Okay, thanks again. Just to clarify: if I only have a binary classification task (labels 0 and 1), I assume the format can be:
0 \t Assessment and Plan ...
1 \t Prognosis...
Above I am illustrating two rows of data: the first with a label of 0 and the second with a label of 1. There are no headers in the example.
For binary classification, please use run_bluebert.py.
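For the binary case discussed above, a minimal sketch of writing the header-less `label<TAB>text` file. The function name `write_binary_tsv` is hypothetical, and it is an assumption that the processor you pick in run_bluebert.py reads exactly two columns in this order; check its example-reading code before relying on this layout.

```python
import os
import tempfile


def write_binary_tsv(path, rows):
    """Write (label, text) pairs as 'label<TAB>text' lines with no header row.

    Hypothetical helper: labels must be 0 or 1 (int or str).
    """
    with open(path, "w", encoding="utf-8") as f:
        for label, text in rows:
            label = str(label)
            if label not in {"0", "1"}:
                raise ValueError(f"unexpected label {label!r}; expected 0 or 1")
            # Collapse tabs/newlines so each note stays on a single line.
            clean_text = " ".join(text.split())
            f.write(f"{label}\t{clean_text}\n")


# Usage: write the two example rows and read them back.
with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train.tsv")
    write_binary_tsv(train_path, [(0, "Assessment and Plan ..."), (1, "Prognosis ...")])
    with open(train_path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)
```

The rows are written without CSV quoting. A field containing a double quote is therefore written literally rather than wrapped in quotes, which is usually what a tab-split reader expects.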
Thanks Yifan. It seems to be running for me now with run_bluebert.py.
As a note to other readers: this KeyError seems to be a common issue with the original Google Research BERT GitHub repository. Many people (e.g. https://github.com/google-research/bert/issues/559) filed issues about a similar error, and they fixed it by editing the implementation of the get_labels method. In my case, I changed get_labels in run_bluebert.py to return ['0', '1'] so that it matches the labels of my binary classification task.
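The fix above works because the label strings returned by get_labels become the keys of a `label -> id` dictionary, as in Google's BERT run_classifier.py. The sketch below uses a hypothetical processor class, `BinaryNotesProcessor`, to show why any label in the data that get_labels does not list raises a KeyError. It is an illustration, not BlueBERT's actual code.

```python
class BinaryNotesProcessor:
    """Hypothetical processor; only get_labels matters for this KeyError."""

    def get_labels(self):
        # These strings must match the label column of train.tsv/dev.tsv exactly.
        return ["0", "1"]


def build_label_map(label_list):
    # Map each label string to an integer id, in list order.
    return {label: i for i, label in enumerate(label_list)}


label_map = build_label_map(BinaryNotesProcessor().get_labels())
print(label_map)  # {'0': 0, '1': 1}

try:
    label_map["positive"]  # a label that get_labels does not return
except KeyError as err:
    print("KeyError:", err)
```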
I want to use run_bluebert_multi_labels.py for MIMIC-IV. I have split the data into train.tsv and test.tsv. When I run the script, I receive an error. How should I format my labels? Right now they look like 1sda2,1s6w6,5fef,... Should they be in the 1_0,2_0,... format instead?
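One way to turn arbitrary codes into the `<index>_<value>` format is to fix an ordered vocabulary of codes and emit one presence/absence label per code. This sketch follows the maintainer's example, which lists a value for every aspect. It assumes, hypothetically, that `num_aspects` equals the vocabulary size and that `aspect_value_list` is `[0, 1]` (absent/present). `codes_to_aspect_labels` and `vocab` are illustrative names.

```python
def codes_to_aspect_labels(note_codes, code_vocab):
    """Turn a note's codes into '<index>_<0|1>' labels, one per vocabulary entry.

    Hypothetical assumptions: code_vocab is a fixed, ordered list of every code
    to predict (num_aspects == len(code_vocab)); values are 0 = absent, 1 = present.
    """
    present = set(note_codes)
    unknown = present - set(code_vocab)
    if unknown:
        raise ValueError(f"codes missing from vocabulary: {sorted(unknown)}")
    return ",".join(f"{i}_{int(code in present)}" for i, code in enumerate(code_vocab))


vocab = ["1sda2", "1s6w6", "5fef"]  # sample codes from the question above
print(codes_to_aspect_labels(["1sda2", "5fef"], vocab))  # 0_1,1_0,2_1
```

The vocabulary order must be identical for train, dev, and test. Otherwise, the same index refers to different codes in different files.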
Hi,
I am trying to fine-tune BlueBERT to classify a set of clinical notes (a binary task). I have set up my train.tsv and dev.tsv files as follows:
I was not sure whether this is the right format for BlueBERT, but for BERT, the following article (https://blog.insightdatascience.com/using-bert-for-state-of-the-art-pre-training-for-natural-language-processing-1d87142c29e7) suggests this format for the tsv input data:
However, when I run the following code:
I get the following error:
Looking into run_bluebert_multi_labels.py, it seems that the label_map variable is populated from the num_aspects and aspect_value_list flag arguments. On line 233 of this Python file, the get_labels method is used to create label_list, which is then fed into label_map:
which is fed into line 277:
The label_map keys would then be, for example, 0_-2, 0_-1, etc. I added a print right before the line that raises the error (line 365) and saw that:
So when we run label_id = label_map[example.label], we get a KeyError. So why is example.label being fed these underscored keys? Am I missing something here?
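A quick pre-flight check can catch this mismatch before training. The sketch assumes, as a guess consistent with the `0_-2`, `0_-1` keys quoted above, that get_labels produces one `<aspect>_<value>` key for each aspect in `range(num_aspects)` and each value in `aspect_value_list`. The `[-2, -1, 0, 1]` values and both helper names are illustrative only.

```python
def build_aspect_label_list(num_aspects, aspect_value_list):
    # Assumption: one key per (aspect, value) pair, e.g. '0_-2', '0_-1', ...
    return [f"{a}_{v}" for a in range(num_aspects) for v in aspect_value_list]


def find_unknown_labels(tsv_lines, label_map):
    """Return (line_number, label) pairs whose label is not a key of label_map."""
    problems = []
    for line_no, line in enumerate(tsv_lines, start=1):
        label_field = line.split("\t", 1)[0]
        for label in label_field.split(","):
            label = label.strip()
            if label not in label_map:
                problems.append((line_no, label))
    return problems


label_list = build_aspect_label_list(num_aspects=2, aspect_value_list=[-2, -1, 0, 1])
label_map = {label: i for i, label in enumerate(label_list)}

rows = ["0_0,1_-1\tAssessment and Plan ...", "1\tPrognosis ..."]
print(find_unknown_labels(rows, label_map))  # [(2, '1')]
```

A plain binary label such as `1` is not an `<aspect>_<value>` key, so it is exactly the kind of value that would fail at a `label_map[...]` lookup.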