knmnyn / ParsCit

An open-source CRF Reference String Parsing Package
http://wing.comp.nus.edu.sg/parsCit
GNU Lesser General Public License v3.0

Re: SectLabel (emailed issue) #7

Closed by knmnyn 12 years ago

knmnyn commented 12 years ago

My name is XXX. I'm currently doing my Master's in Computer Science at UFRGS here in Brazil. I've just read the very nice enclosed paper and I think it could be really useful for my final thesis. Initially, my plan is to evaluate this tool on my current test set of papers. Eventually, I want to extend this project to go one level up and extract article metadata such as title and authors (affiliation, email), and maybe add some other tests specifically for PDF papers...

I'd like to kindly ask you for details on how to get this tool. I only found the ParsCit website, which includes other features; maybe you also have the SectLabel source code, including the PDF parsing... I saw that you used the OmniPage OCR engine. I don't have a license yet, but it would be no problem to acquire one if needed!

I'm looking forward to hearing from you and hope to be of help on such a nice project.

Kind Regards, XXX

knmnyn commented 12 years ago

Thanks for writing to us. We appreciate your interest in ParsCit and SectLabel. SectLabel is actually a module within ParsCit, so it can be invoked from within ParsCit. A few of ParsCit's modes of operation include SectLabel output.

If you download ParsCit from our webpage or from GitHub, you'll have the SectLabel module inside of ParsCit. If you want to modify how SectLabel works, you'll need to look at the code and the training data (provided in the distribution) for SectLabel.

The documentation for SectLabel isn't that well done, since it's an internal subcomponent of ParsCit, but please ask us if you get stuck.

We're actually developing a component in our lab to better extract names and affiliations and match them together, but it's not public yet. We haven't really worked as much on email extraction and matching, but it's an interesting idea.

You might look at our SectLabel paper, if you haven't already.

http://wing.comp.nus.edu.sg/parsCit/ijdls-SectLabel.pdf

Cheers,

Min