jambo6 opened 3 years ago
I'd like to utilise these labels for another project. It seems the folder should also have snorkel_labels_train.xlsx to go along with its test and dev files. Does this exist, and if so, is there any chance of getting access?
So this folder only contains sentences that were hand labeled for this project. The train version isn't available because it is supposed to consist of all the remaining documents within PubTator. That output would be too big a file for GitHub to host on their LFS (the maximum file size is 2GB).
Currently, the main way to get those sentences is to download a snapshot of PubTator Central and extract the sentences into a database. Otherwise, I have a snapshot of the database used for this project that you could import (118GB); however, I would need to figure out how to transfer a file that large. My overall recommendation is the first option, as you would have the most current version for whichever project you are working on.
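The first option (download a PubTator Central snapshot and extract sentences into a database) could look roughly like this minimal sketch. It assumes the snapshot has been obtained in, or converted to, the tab-delimited PubTator text format: `PMID|t|title`, `PMID|a|abstract`, then one tab-separated line per entity mention whose offsets index into the title and abstract joined by a single space. The SQLite schema, the naive regex sentence splitter, and the helper names (`parse_pubtator`, `split_sentences`, `load_into_sqlite`, `co_mention_sentences`) are hypothetical, not this project's actual pipeline.

```python
import re
import sqlite3
from typing import Iterable, Iterator, List, Tuple

# Title/abstract lines look like "PMID|t|text" or "PMID|a|text".
TITLE_OR_ABSTRACT = re.compile(r"^(\d+)\|([ta])\|(.*)$")


def parse_pubtator(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per document from PubTator-formatted lines."""
    doc = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current document.
            if doc is not None:
                yield doc
                doc = None
            continue
        match = TITLE_OR_ABSTRACT.match(line)
        if match:
            pmid, kind, body = match.groups()
            if doc is None:
                doc = {"pmid": pmid, "title": "", "abstract": "", "annotations": []}
            doc["title" if kind == "t" else "abstract"] = body
            continue
        fields = line.split("\t")
        # Entity lines: PMID, start, end, mention, type[, concept id].
        # Relation lines have only four fields and are skipped.
        if doc is not None and len(fields) >= 5:
            concept = fields[5] if len(fields) > 5 else ""
            doc["annotations"].append(
                (int(fields[1]), int(fields[2]), fields[3], fields[4], concept)
            )
    if doc is not None:
        yield doc


def split_sentences(text: str) -> Iterator[Tuple[int, int]]:
    """Naive splitter: yield (start, end) offsets, breaking after . ! or ?"""
    start = 0
    for gap in re.finditer(r"(?<=[.!?])\s+", text):
        yield start, gap.start()
        start = gap.end()
    if start < len(text):
        yield start, len(text)


def load_into_sqlite(conn: sqlite3.Connection, lines: Iterable[str]) -> None:
    # Hypothetical schema, not the project's actual database layout.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS sentences (
            pmid TEXT, start_pos INTEGER, end_pos INTEGER, sent_text TEXT);
        CREATE TABLE IF NOT EXISTS annotations (
            pmid TEXT, start_pos INTEGER, end_pos INTEGER,
            mention TEXT, entity_type TEXT, concept_id TEXT);
        """
    )
    for doc in parse_pubtator(lines):
        # PubTator offsets index into the title and abstract joined by one space.
        text = doc["title"] + " " + doc["abstract"]
        conn.executemany(
            "INSERT INTO sentences VALUES (?, ?, ?, ?)",
            [(doc["pmid"], s, e, text[s:e]) for s, e in split_sentences(text)],
        )
        conn.executemany(
            "INSERT INTO annotations VALUES (?, ?, ?, ?, ?, ?)",
            [(doc["pmid"], *ann) for ann in doc["annotations"]],
        )
    conn.commit()


def co_mention_sentences(
    conn: sqlite3.Connection, type_a: str, type_b: str
) -> List[Tuple[str, str]]:
    """Return (pmid, sentence) pairs whose sentence contains both entity types."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.pmid, s.start_pos, s.sent_text
        FROM sentences AS s
        JOIN annotations AS a ON a.pmid = s.pmid AND a.entity_type = ?
            AND a.start_pos >= s.start_pos AND a.end_pos <= s.end_pos
        JOIN annotations AS b ON b.pmid = s.pmid AND b.entity_type = ?
            AND b.start_pos >= s.start_pos AND b.end_pos <= s.end_pos
        ORDER BY s.pmid, s.start_pos
        """,
        (type_a, type_b),
    ).fetchall()
    return [(pmid, sentence) for pmid, _, sentence in rows]
```

For example, loading a document titled "Aspirin reduces fever." with a Chemical annotation on "Aspirin" and a Disease annotation on "fever" into an in-memory database (`sqlite3.connect(":memory:")`) and calling `co_mention_sentences(conn, "Chemical", "Disease")` returns that title sentence as a candidate. A real pipeline would swap the regex for a proper sentence splitter, since the regex breaks on abbreviations such as "e.g.".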
I was after the hand-labelled train/dev/test sentences to bolster my dataset for a similar RE project, not the entire PubTator DB. Would it be okay for me to use these, and if so, is there a straightforward way to download just the sentences with hand labels?
Sure. I can't guarantee that train.xlsx exists or has many annotated sentences, but here are quick links to the data available at the moment:
Compound Treats Disease: Train, Dev, Test
Disease Associates Gene: Dev, Test
Gene interacts Gene: Train, Dev, Test
Compound binds Gene would take me a bit longer to get to you, so let me know if you need that one.
So do hand-crafted labels not exist for Disease Associates Gene Train?
I forgot to upload it to this repository, but here is the file you requested: Disease Associates Gene Train
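For the use case raised in the thread (adding the hand-labelled train/dev/test sentences to an existing relation-extraction dataset), a minimal sketch of combining the files might look like this. It assumes each workbook has first been exported to CSV and that the files share a hypothetical ID column `candidate_id` and label column `label`; the real headers in these files may differ, so check them first. Since some files may have few annotated sentences, rows with a blank label are treated as unannotated, which is also an assumption about the file layout.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List


def merge_label_splits(
    paths: Iterable[Path],
    key_col: str = "candidate_id",  # hypothetical column name
    label_col: str = "label",  # hypothetical column name
) -> List[Dict[str, str]]:
    """Combine exported label CSVs, skipping blank labels and duplicate IDs."""
    merged: Dict[str, Dict[str, str]] = {}
    for path in paths:
        path = Path(path)
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Assumption: a blank label means the row was never hand annotated.
                if not (row.get(label_col) or "").strip():
                    continue
                # Keep the first copy if an ID appears in more than one split,
                # and record which file it came from.
                merged.setdefault(row[key_col], {**row, "source_file": path.name})
    return list(merged.values())
```

Because the first copy wins, the order of `paths` decides which split's label is kept when an ID appears twice; passing the train file first gives its labels precedence.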