Closed renaud closed 11 years ago
Hi Renaud:
Great question. A few people have actually asked about this, so it's good that you opened an issue for it. Being able to modify this is important, especially if you are using the system to process documents in other languages.
Currently we identify the beginning of the references section by using the regular expression on ParsCit/PreProcess.pm line 68:
if ($ln_content =~ m/\b(References?|REFERENCES?|Bibliography|BIBLIOGRAPHY|References?\s+and\s+Notes?|References?\s+Cited|REFERENCES?\s+CITED|REFERENCES?\s+AND\s+NOTES?|LITERATURE?\s+CITED?):?\s*$/)
We should probably factor this into a constant so that people can modify it.
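For anyone who wants to try that refactoring locally, here is a minimal sketch of what it could look like. It is not how ParsCit is actually organized: the names `@REFERENCE_HEADINGS`, `build_heading_re` and `is_reference_heading` are made up for illustration, and the extra headings `Literatur` and `Bibliographie` are only examples of what someone processing another language might add. The heading patterns themselves are copied unchanged from the regular expression quoted above.

```perl
use strict;
use warnings;

# Heading patterns copied from the regular expression quoted above.
# (The list name is hypothetical; ParsCit does not define it.)
my @REFERENCE_HEADINGS = (
    'References?', 'REFERENCES?',
    'Bibliography', 'BIBLIOGRAPHY',
    'References?\s+and\s+Notes?', 'References?\s+Cited',
    'REFERENCES?\s+CITED', 'REFERENCES?\s+AND\s+NOTES?',
    'LITERATURE?\s+CITED?',
);

# Hypothetical helper: compile a list of heading patterns into one regex
# with the same shape as the original (word boundary, optional colon,
# heading must end the line).
sub build_heading_re {
    my @patterns     = @_;
    my $alternatives = join '|', @patterns;
    return qr/\b($alternatives):?\s*$/;
}

# Hypothetical helper: 1 if the line looks like a reference heading, else 0.
sub is_reference_heading {
    my ($ln_content, $heading_re) = @_;
    return $ln_content =~ $heading_re ? 1 : 0;
}

my $default_re  = build_heading_re(@REFERENCE_HEADINGS);
# Assumed extra headings, shown only to illustrate extending the list.
my $extended_re = build_heading_re(@REFERENCE_HEADINGS, 'Literatur', 'Bibliographie');

print is_reference_heading('7. REFERENCES:', $default_re),  "\n";  # 1
print is_reference_heading('Literatur',      $default_re),  "\n";  # 0
print is_reference_heading('Literatur',      $extended_re), "\n";  # 1
```

Keeping the patterns in one list means a new heading can be added without touching the matching code. Note that the match is still case-sensitive and still requires the heading to end the line, as in the original expression.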
Oops, sorry, I forgot to mention that we do need to limit where references can appear, so ParsCit does look for the marker (detected by the regular expression in the line above). If you want to modify it to look for citation strings anywhere in the document, you're welcome to do that, but that is not the behavior most users want, so we wouldn't incorporate it.
Hope that helps. Closing this issue.
Thank you Min-Yen, that helps a lot!
I ran a quick evaluation of the citation extraction, and it seems to me that ParsCit will not extract citations if the word "References" is not present in the text. Unfortunately, I have several documents that contain citations but not this word. Is there any chance to "relax" this and identify citations even when "References" is not present in the text? Thanks, Renaud