knmnyn / ParsCit

An open-source CRF Reference String Parsing Package
http://wing.comp.nus.edu.sg/parsCit
GNU Lesser General Public License v3.0

About the dataset #32

yassouali closed this issue 5 years ago

yassouali commented 5 years ago

Hi, sorry if this is not the right place to ask.

First of all thank you for the amazing work.

I am currently working on document segmentation and would like to use your 'sectLabel' dataset to train our model. The problem is that I am having a hard time finding the images/PDFs and their annotations.

Thanks

cmkumar87 commented 5 years ago

Hi @darkmythos, thanks! The SectLabel dataset and systems are about 10 years old now. We can get back to you. Could you email kanmy@comp.nus.edu.sg and animesh@comp.nus.edu.sg, and CC me at muthu.chandra@comp.nus.edu.sg? Thanks!

knmnyn commented 5 years ago

Hi @darkmythos, you can also use the source training data in CRFPP format, which is in https://github.com/knmnyn/ParsCit/tree/master/doc (look for SectLabel.tagged.txt and SectLabelXML.tagged.txt). Hope that helps!
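
For anyone picking up those files, here is a minimal sketch of reading CRF++-style data in Python. It assumes only the general CRF++ convention: one token per line, whitespace-separated columns with the label in the last column, and a blank line between sequences. The function name `read_crfpp` and the sample rows and labels are made up for illustration, and the actual column layout of the SectLabel files has not been checked here.

```python
from typing import Iterable, List, Tuple

# One token: (feature columns, label). The label is the last column.
Token = Tuple[List[str], str]


def read_crfpp(lines: Iterable[str]) -> List[List[Token]]:
    """Group CRF++-style lines into sequences of (features, label) pairs."""
    sequences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current sequence.
            if current:
                sequences.append(current)
                current = []
            continue
        columns = line.split()
        # By CRF++ convention the last column holds the label.
        current.append((columns[:-1], columns[-1]))
    if current:
        # Flush the final sequence when the input has no trailing blank line.
        sequences.append(current)
    return sequences


# Hypothetical sample rows, not taken from the real SectLabel files.
sample = """\
Abstract cap title
We low bodyText

Introduction cap sectionHeader
"""

parsed = read_crfpp(sample.splitlines())
```

To read a real file, pass an open file handle instead of the sample, for example `read_crfpp(open("SectLabel.tagged.txt", encoding="utf-8"))`. The local path and the encoding are assumptions, so adjust them to wherever and however the file was saved.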