lasigeBioTM / IHP

Identification of Human Phenotype Entities
MIT License

GSC_v2 vs. GSC+ #1

Open ghost opened 5 years ago

ghost commented 5 years ago

Hello, what is the difference between GSC_v2 and GSC+? Both annotation folders seem to be the same.

dpavot commented 5 years ago

The original author created both versions. I'm not sure for what purpose; maybe GSC_v2 is an older version of GSC+ that wasn't properly replaced. The one used to build and run the models is GSC+.

ghost commented 5 years ago

Thanks @dpavot. I looked at the files and they seem the same. I thought that the v2 version might be the original Bio-LarK set, but as you said, it's not.

Were the training and testing of the model done on GSC+? Which code splits the samples into train/test?

dpavot commented 5 years ago

The training and testing were done on GSC+. When I added the Dockerfile option to the repo, I chose to randomly divide the corpus 30/70, as reported in the paper, to try to replicate the results. I do not know which code originally served this purpose. I hope to be able to give you clearer guidelines as soon as we have an answer from the first author.
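
For readers who want to reproduce a random 30/70 division like the one described above, here is a minimal sketch. It is not the repository's actual code: the function name `split_corpus`, the `first_fraction` and `seed` parameters, and the document IDs are all hypothetical, and the thread does not say which part is used for training and which for testing.

```python
import random


def split_corpus(doc_ids, first_fraction=0.3, seed=0):
    """Shuffle document IDs and split them into two disjoint parts.

    Assumption: the split is done per document. Which part serves as
    train and which as test is not specified in this thread.
    """
    ids = sorted(doc_ids)      # sort so the result does not depend on input order
    rng = random.Random(seed)  # local RNG with a fixed seed for reproducibility
    rng.shuffle(ids)
    cut = round(len(ids) * first_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical document IDs; the real corpus file names are not given here.
docs = [f"doc_{i:03d}" for i in range(10)]
part_a, part_b = split_corpus(docs)
print(len(part_a), len(part_b))  # 3 7
```

Fixing the seed makes the split repeatable across runs, which helps when comparing results against the numbers reported in the paper.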