data format for labels

Hi Gabriben,

Apologies, only just saw your message.

There are two folders, labels and text.

The "text" folder contains files holding PubMed abstracts, split into one sentence per line (already tokenized). The file names are the PubMed IDs.

The "labels" folder contains the corresponding labels for each text file (both files are named with the same PubMed ID). Each label file can hold multiple labels per sentence.

The labels for different sentences are separated by "<", and the multiple labels for a single sentence are separated by "AND".
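In case it is useful, here is a minimal Python sketch of how a label file could be parsed under that description. The function name `parse_label_file` and the label names in the example are made up for illustration and are not part of the repo. It also assumes that the literal string "AND" never occurs inside a label name, and that the file does not start with a leading "<" (if it does, the first entry will come out empty and should be dropped).

```python
def parse_label_file(content: str) -> list[list[str]]:
    """Split the raw contents of one label file into per-sentence label lists.

    Assumes "<" separates sentences and "AND" separates the labels of a
    single sentence, as described above. This is a sketch, not repo code.
    """
    sentences = []
    for chunk in content.split("<"):
        # Split the sentence's labels and trim surrounding whitespace.
        labels = [label.strip() for label in chunk.split("AND")]
        # A chunk with nothing in it becomes an empty list (no labels).
        sentences.append([label for label in labels if label])
    return sentences


# Hypothetical example: the first sentence has two labels, the second has one.
example = "label_a AND label_b < label_c"
print(parse_label_file(example))
```

The i-th list returned should then line up with the i-th line of the text file that has the same PubMed ID.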

Hope that helps. Let me know otherwise.

sb895 / Hallmarks-of-Cancer

data format for labels #1