knmnyn / ParsCit

An open-source CRF Reference String Parsing Package
http://wing.comp.nus.edu.sg/parsCit
GNU Lesser General Public License v3.0

Author name order in references (first name + last name) #18

Closed: jgrossi closed this issue 9 years ago

jgrossi commented 9 years ago

Hi there! Here I am again ;-)

When extracting references from PDF papers, my author names follow a fixed format, such as M Delio or DM McDonald-McGinn. Is there any way to set the extraction order, e.g. last name followed by first name, or the opposite? Is this customizable?

Thanks! Regards.

knmnyn commented 9 years ago

Hi there, thanks for your interest. Unfortunately, no, there's no direct way to customize ParsCit to do this. We'd recommend two ways to get what you want: 1) use a post-processing step to enforce the constraints that hold in your dataset, or 2) if you have more time, re-train ParsCit after changing the <author> tag in the training data into <author_last> and <author_first> tags.
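For the second option, here is a rough sketch of the relabeling step in Python. It assumes (not confirmed here) that each training line holds one reference with inline tags like `<author> M. Delio , D. M. McDonald-McGinn </author> <title> ... </title>`, and it uses a hypothetical heuristic: tokens such as `M.` or `DM` are treated as given-name initials (`<author_first>`), everything else as surnames (`<author_last>`). Separators such as `,` or `and` stay attached to the preceding run so no token is dropped. The function names are made up for illustration, and the actual retraining with ParsCit's own tools is not shown.

```python
import re

# Assumption: authors are marked inline as "<author> ... </author>" in the training data.
AUTHOR_SPAN = re.compile(r"<author>(.*?)</author>", re.DOTALL)
# Hypothetical heuristic: 1-3 capital letters, each optionally followed by a period,
# are given-name initials (e.g. "M.", "DM", "D.M.").
INITIAL = re.compile(r"^(?:[A-Z]\.?){1,3}$")
SEPARATORS = {",", "and", "&", ";"}


def split_author_tag(inner: str) -> str:
    """Re-tag the tokens of one <author> span as <author_first>/<author_last> runs."""
    runs: list[tuple[str, list[str]]] = []
    for token in inner.split():
        if token in SEPARATORS and runs:
            # Keep separators with the preceding run so every token keeps a label.
            runs[-1][1].append(token)
            continue
        label = "author_first" if INITIAL.match(token) else "author_last"
        # Extend the current run only if it has the same label and no separator
        # has closed it (a separator marks the start of the next author).
        if runs and runs[-1][0] == label and runs[-1][1][-1] not in SEPARATORS:
            runs[-1][1].append(token)
        else:
            runs.append((label, [token]))
    return " ".join(f"<{label}> {' '.join(toks)} </{label}>" for label, toks in runs)


def retag_reference(line: str) -> str:
    """Replace every <author> span in one training line with the split tags."""
    return AUTHOR_SPAN.sub(lambda m: split_author_tag(m.group(1)), line)


if __name__ == "__main__":
    print(retag_reference(
        "<author> M. Delio , D. M. McDonald-McGinn </author> <title> X </title>"
    ))
```

The initials heuristic will mislabel some names (e.g. an all-caps surname), so the relabeled training data would still need a manual check before retraining.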

Hope that helps!

jgrossi commented 9 years ago

Thanks for the response @knmnyn! I think it's easier to do a post-process and swap the names when the first name comes first.
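A minimal sketch of such a post-process in Python, assuming each author name has already been read out of ParsCit's output as a plain string. It relies on a hypothetical heuristic suited to the format above: a short token of capital letters (optionally with periods), such as `M` or `DM`, is taken to be the given-name initials, and the name is reordered to "Last, First" whichever side the initials appear on. `normalize_author` and `format_last_first` are illustrative names, not part of ParsCit.

```python
import re

# Hypothetical heuristic: 1-3 capital letters, each optionally followed by a period,
# are treated as given-name initials (e.g. "M", "DM", "D.M.").
INITIALS = re.compile(r"^(?:[A-Z]\.?){1,3}$")


def normalize_author(name: str) -> tuple[str, str]:
    """Return (first, last) for names like 'DM McDonald-McGinn' or 'McDonald-McGinn DM'."""
    tokens = name.split()
    if len(tokens) < 2:
        # Nothing to reorder: treat the whole string as a surname.
        return "", name.strip()
    if INITIALS.match(tokens[0]):
        # Initials come first, e.g. 'M Delio'.
        return tokens[0], " ".join(tokens[1:])
    if INITIALS.match(tokens[-1]):
        # Initials come last, e.g. 'Delio M': swap them to the first slot.
        return tokens[-1], " ".join(tokens[:-1])
    # Fallback assumption: the final token is the surname.
    return " ".join(tokens[:-1]), tokens[-1]


def format_last_first(name: str) -> str:
    """Render an author as 'Last, First' regardless of the input order."""
    first, last = normalize_author(name)
    return f"{last}, {first}" if first else last


if __name__ == "__main__":
    for raw in ["M Delio", "DM McDonald-McGinn", "McDonald-McGinn DM"]:
        print(format_last_first(raw))
```

Running it prints `Delio, M`, then `McDonald-McGinn, DM` twice, so both input orders end up in the same form. The fallback branch only guesses for full given names, which is why it is kept separate from the initials checks.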

Thank you. Regards.