jameshowison / softcite

Study of software citation in the biology literature

add two versions of pdftotext output #5

Closed yg4886 closed 9 years ago

yg4886 commented 9 years ago

I have added the pdftotext output. The txt_original folder holds the original output without any processing. For the txt_sentences folder, I roughly split the text into sentences using the NLTK tokenizer.
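
For anyone wanting to reproduce the sentence split, here is a minimal sketch of how it could be done. It is not the exact script used for txt_sentences: the function name `split_sentences`, the whitespace collapsing step, and the regex fallback are all assumptions added for illustration. The only NLTK call used is `nltk.tokenize.sent_tokenize`, which needs the Punkt model to be downloaded.

```python
import re


def split_sentences(text):
    """Roughly split raw pdftotext output into sentences.

    Hypothetical helper for illustration; not the script used in this PR.
    """
    # pdftotext wraps lines mid-sentence, so collapse every run of
    # whitespace (including newlines) into a single space first.
    flat = re.sub(r"\s+", " ", text).strip()
    if not flat:
        return []
    try:
        # sent_tokenize uses NLTK's pretrained Punkt sentence tokenizer.
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(flat)
    except (ImportError, LookupError):
        # NLTK or its Punkt data is missing: fall back to a naive split
        # after '.', '!' or '?' followed by whitespace (assumption, and
        # noticeably rougher on abbreviations such as "et al.").
        return re.split(r"(?<=[.!?])\s+", flat)


if __name__ == "__main__":
    sample = "Software was cited.\nWe used R for\nanalysis."
    for sentence in split_sentences(sample):
        print(sentence)
```

Either path gives only a rough split: hyphenated line breaks, running headers, and reference lists in the pdftotext output will still produce odd sentences.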

jameshowison commented 9 years ago

Awesome. Thanks! I'll get the codes into some of those.

yg4886 commented 9 years ago

Thanks.

On Nov 14, 2014, at 10:48 AM, James Howison notifications@github.com wrote:

Awesome. Thanks! I'll get the codes into some of those.
