Hello, thanks for creating this library. I am trying to reproduce the BERT results on i2b2 2010, i2b2 2012, and n2c2 2018. However, I am having trouble converting these datasets into the CoNLL-2003 style txt files shown in test_data. I assume the preprocessing scripts differ for each dataset, because i2b2 2010 (txt, con) and i2b2 2012 (txt, extent, tlink) use different file extensions.
Would it be possible to release the preprocessing scripts to make reproduction easier?
i2b2 2010 dataset: the currently released data is not the version originally released during the challenge. We used a version of the dataset preprocessed by our collaborator, and we do not have access to their source code.
i2b2 2012 dataset: you can convert the released data to brat standoff format (https://brat.nlplab.org/standoff.html), then follow our tutorial to convert the brat format to BIO format.
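For anyone who wants to see roughly what the brat-to-BIO step involves, here is a minimal sketch. It is not the library's own converter or the tutorial's code. The function names (`parse_brat_entities`, `brat_to_bio`, `to_conll`) are hypothetical, and it assumes whitespace tokenization, one sentence per line of the txt file, and a space between token and tag in the output. It reads only text-bound `T` lines from the .ann file and labels every token that overlaps an entity span.

```python
import re


def parse_brat_entities(ann_text):
    """Collect (label, [(start, end), ...]) from text-bound 'T' lines."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, attributes, and notes
        _id, info, *_ = line.split("\t")
        label, offsets = info.split(" ", 1)
        # Discontinuous entities list several fragments separated by ';'.
        spans = [tuple(int(n) for n in frag.split()) for frag in offsets.split(";")]
        entities.append((label, spans))
    return entities


def brat_to_bio(text, ann_text):
    """Return a list of sentences, each a list of (token, tag) pairs.

    Assumptions (not taken from the library): tokens are split on
    whitespace and every line of the txt file is one sentence.
    """
    entities = parse_brat_entities(ann_text)
    sentences = []
    offset = 0  # character offset of the current line within the document
    for line in text.split("\n"):
        rows = []
        for match in re.finditer(r"\S+", line):
            start, end = offset + match.start(), offset + match.end()
            tag = "O"
            for label, spans in entities:
                if any(start < e and end > s for s, e in spans):
                    # The token covering the entity's first character gets B-.
                    prefix = "B-" if start <= spans[0][0] else "I-"
                    tag = prefix + label
                    break
            rows.append((match.group(), tag))
        if rows:
            sentences.append(rows)
        offset += len(line) + 1  # +1 for the newline removed by split
    return sentences


def to_conll(sentences):
    """Render one 'token tag' pair per line, with a blank line between sentences."""
    return "\n\n".join(
        "\n".join(f"{token} {tag}" for token, tag in rows) for rows in sentences
    ) + "\n"


if __name__ == "__main__":
    # Made-up sample text and labels, for illustration only.
    sample_text = "Patient has pneumonia .\nStarted on IV antibiotics ."
    sample_ann = "T1\tproblem 12 21\tpneumonia\nT2\ttreatment 35 49\tIV antibiotics\n"
    print(to_conll(brat_to_bio(sample_text, sample_ann)))
```

Whitespace tokenization will attach punctuation to neighbouring words unless the text is already tokenized, so the result may not match the tokenization used for the published results.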