knmnyn / ParsCit

An open-source CRF Reference String Parsing Package
http://wing.comp.nus.edu.sg/parsCit
GNU Lesser General Public License v3.0

Language Option, Split Author Names Option, more UTF8, ... #21

Open maboberlin opened 8 years ago

maboberlin commented 8 years ago

All changes are to the program 'parseRefStrings.pl'.

Language option: A language parameter can be passed to 'parseRefStrings.pl'. A new module, 'ConfigLang', holds the language data and initialization. Implemented for German and English.

Split author: If author names are stuck together without whitespace (for example: T.W.Adorno), the name is separated.

Keep files: A parameter was introduced so that temp files are kept.

Output file: The output file bug is fixed. The output file is now kept.

UTF-8: Changed the encoding to UTF-8 in several places.

CRF++ data: 'prepData.pl' produces CRF++-compatible data from reference strings.
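For illustration only, the author-splitting behaviour described above could look roughly like the sketch below. This is not the code from the PR: the helper name `split_author_name` and the regular expression are assumptions, and the sketch expects an already-decoded (character, not byte) string when names contain non-ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from the PR): insert a space after the
# period of a single-letter initial when another letter follows directly.
sub split_author_name {
    my ($name) = @_;
    # \b ensures the capital is a stand-alone initial, not the end of a word.
    # The lookahead leaves names that already contain whitespace unchanged.
    $name =~ s/\b(\p{Lu})\.(?=\p{L})/$1. /g;
    return $name;
}

print split_author_name('T.W.Adorno'), "\n";    # prints "T. W. Adorno"
```

A name that is already spaced, such as 'T. W. Adorno', passes through unchanged because the lookahead requires a letter immediately after the period.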

knmnyn commented 8 years ago

Hi @matzeBo, thanks for the PR. We are reviewing it 🔜. Due to our semester schedule, we have not yet had time to review your PR, but will get to it ASAP.

maboberlin commented 8 years ago

Ok, Min! I have almost finished my thesis work, so the next PR could be in the pipeline soon! I hope the current one is all right ... Thank you and your team for managing this!

kishaloyhalder commented 8 years ago

Hi @matzeBo, thanks for the PR. I have checked it out. There are changes in 6 files in total. However, there seem to be some issues with the changes in PostProcess.pm: I am not able to run basic commands like extract_all.

    $ ./citeExtract.pl -m extract_all ../demodata/sample2.txt
    Type of arg 1 to push must be array (not private variable) at ../lib/ParsCit/PostProcess.pm line 192, near "};"
    Compilation failed in require at ../lib/ParsCit/Controller.pm line 22.
    BEGIN failed--compilation aborted at ../lib/ParsCit/Controller.pm line 22.
    Compilation failed in require at ./citeExtract.pl line 47.
    BEGIN failed--compilation aborted at ./citeExtract.pl line 47.

I have checked against the existing master to make sure my installation is correct, and master works fine. I have tried the hacks from this thread (http://www.perlmonks.org/?node_id=1098524) and have tested with different versions of Perl (5.8, 5.20) without any luck. Could you please double-check and, if possible, confirm whether this PR is broken? If I am missing something trivial, let me know. Thanks again,

knmnyn commented 8 years ago

@matzeBo Have you had a chance to read @kishaloyhalder's review of your PR on ParsCit? Thanks!

maboberlin commented 8 years ago

Hi @knmnyn and @kishaloyhalder! Sorry that I didn't respond. I have read it. I didn't check the integration of 'parseRefStrings.pl' into the 'ParsCit' package. I guess the problem is that I changed the number of parameters of 'parseRefStrings.pl' and/or of methods in 'PostProcess.pm'. I will fix the problem ASAP ... but at the moment I am very busy...

maboberlin commented 7 years ago

Hi! Sorry for the very late reply!

I think the problem occurred because of Perl version issues. The command in PostProcess.pm line 192 calls push with an array reference as its first argument instead of an array. This is not possible in Perl versions prior to 5.14. (See here: https://www.effectiveperlprogramming.com/2010/11/use-array-references-with-the-array-operators/).

I tested the command in different Perl versions using perlbrew. It throws the described error if I use a version prior to 5.14. How did you manage the version switch, @kishaloyhalder?

To solve the problem, I dereferenced the array reference in the command on line 192, so it should now also work on older versions. Try the latest commit. On my system everything (the citeExtract.pl demo command, etc.) works fine!
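For readers who hit the same error, the kind of fix described above can be sketched as follows. This is a minimal illustration, not the actual line 192 of PostProcess.pm: the variable name `$authors` and the pushed values are made up. Note also that calling push directly on an array reference was only an experimental feature from Perl 5.14 and was removed again in Perl 5.24, so the explicit dereference is the portable form.

```perl
use strict;
use warnings;

# Hypothetical data: an array reference, standing in for the one in the PR.
my $authors = [];

# push $authors, 'Adorno';     # fails to compile on Perl < 5.14 and on 5.24+
push @{$authors}, 'Adorno';    # explicit dereference works on every Perl 5
push @$authors, 'Horkheimer';  # shorthand for the same dereference

print join(', ', @{$authors}), "\n";    # prints "Adorno, Horkheimer"
```

Writing `@{$ref}` wherever an array operator expects an array keeps the code independent of the interpreter version, which matters for a tool that users run on whatever Perl their system ships.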