inukshuk / anystyle

Fast citation reference parsing
https://anystyle.io

Remove extraneous characters #73

Open mkbergman opened 7 years ago

mkbergman commented 7 years ago

In the examples below, the parsed author, title, and year fields keep extraneous double quotes ("), parentheses, and trailing commas. These are easy to strip from the output, but I thought I would share this real-world example.

Here are the three examples as submitted to the online service:

Philip Rose, 2013. “Another Guess at the Riddle: More Ado About Nothing,” Analecta Hermeneutica 4 (2013)

Philip Rose, 2016. “CS Peirce’s Cosmogonic Philiosophy of Emergent Evolution: Deriving Something from Nothing,” SCIO Revista de Filosofía Journal of Philosophy: 123-142, November 2016

C.S. Peirce, 1878, “The Order of Nature“, Popular Science Monthly, v. 13, pp. 203–217 (June 1878).

This tool is cool and generally works like a charm! Thanks.

Jmuccigr commented 4 years ago

This seems to still (or newly) be a problem. For example, the trailing quotation mark sticks around on the title of this text:

COARELLI F., 1976, "Cinque frammenti di una tomba dipinta dall'Esquilino (Arieti)", in Affreschi romani dalle raccolte dell'Antiquarium comunale, Roma: 22-28

Here's the output:

[
  {
    "author": [
      {
        "family": "COARELLI",
        "given": "F."
      }
    ],
    "date": [
      "1976"
    ],
    "title": [
      "Cinque frammenti di una tomba dipinta dall'Esquilino (Arieti)\""
    ],
    "container-title": [
      "Affreschi romani dalle raccolte dell'Antiquarium comunale"
    ],
    "location": [
      "Roma"
    ],
    "publisher": [
      "22-28"
    ],
    "type": "chapter"
  }
]
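
Until the normalizer handles this case, a post-processing step on the JSON output is one possible workaround. The sketch below is plain Ruby and does not use any AnyStyle API; `strip_dangling_quote` and `clean_titles` are hypothetical helper names, and the rule it applies (drop a trailing double quotation mark only when the string contains an odd number of them) is an assumption about what counts as "extraneous".

```ruby
require 'json'

# Characters treated as double quotation marks (an assumption; extend as needed).
DOUBLE_QUOTE = /["“”]/

# Hypothetical helper, not part of AnyStyle: removes one trailing double
# quotation mark when the quotes in the string are unbalanced (odd count).
def strip_dangling_quote(value)
  return value unless value.match?(/["“”]\z/)
  return value if value.scan(DOUBLE_QUOTE).length.even?

  value.sub(/\s*["“”]\z/, '')
end

# Applies the helper to every title in a JSON array of parsed references.
def clean_titles(json)
  JSON.parse(json).map do |item|
    next item unless item.key?('title')

    item.merge('title' => Array(item['title']).map { |t| strip_dangling_quote(t) })
  end
end
```

Balanced quotation marks inside a title are left alone, so a title such as `A "quoted" word` passes through unchanged.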

inukshuk commented 4 years ago

This is the current punctuation normalizer; I just noticed that we don't have a set of tests for this normalizer yet, so we should compile a list of examples to use for testing before we make changes.

Jmuccigr commented 4 years ago

Happy to help. Where should they go, and are there other examples to look at?

inukshuk commented 4 years ago

Help is much appreciated of course!

We need to add tests/specs for the punctuation normalizer; you can look at the brackets normalizer for a simple normalizer test. Here we basically need the same thing: a punctuation_spec.rb file in the normalizer specs folder with a number of simple input/output examples of how we want the normalizer to work. The punctuation normalizer currently runs on many different fields; using the title field should be a good option for the specs.

For a more complicated spec file, take a look at the volume normalizer specs.
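
As a starting point for such a list, the input/output pairs could look roughly like the table below, built from the title strings reported in this thread. This is only a sketch in plain Ruby: `normalize_punctuation` is a hypothetical stand-in, not AnyStyle's actual normalizer, and the expected outputs are assumptions about the desired behavior rather than what the library does today. In a real spec file the same pairs would be fed to the punctuation normalizer instead.

```ruby
# Characters treated as double quotation marks (an assumption).
QUOTE = /["“”]/

# Hypothetical stand-in for the behavior under test; NOT AnyStyle's normalizer.
def strip_separators(value)
  value.strip.sub(/[,;:\s]+\z/, '')
end

def normalize_punctuation(value)
  out = strip_separators(value)
  # Drop one dangling quotation mark when the quotes are unbalanced.
  if out.scan(QUOTE).length.odd?
    out = if out.match?(/#{QUOTE}\z/)
            out.sub(/\s*#{QUOTE}\z/, '')
          else
            out.sub(/\A#{QUOTE}\s*/, '')
          end
  end
  strip_separators(out)
end

# Input title => desired output (desired behavior is an assumption).
EXAMPLES = {
  %q{Cinque frammenti di una tomba dipinta dall'Esquilino (Arieti)"} =>
    %q{Cinque frammenti di una tomba dipinta dall'Esquilino (Arieti)},
  %q{The Order of Nature“,} => 'The Order of Nature',
  %q{Another Guess at the Riddle: More Ado About Nothing,”} =>
    'Another Guess at the Riddle: More Ado About Nothing',
  %q{A "quoted" word} => %q{A "quoted" word}
}.freeze

EXAMPLES.each do |input, expected|
  actual = normalize_punctuation(input)
  raise "#{input.inspect}: got #{actual.inspect}" unless actual == expected
end
```

The last pair is a negative case: balanced quotation marks inside a title should survive, which guards against a fix that strips every quote.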

Jmuccigr commented 3 years ago

Any progress with this? (I have been no help, I will admit.) A trailing quotation mark bites me all the time.

Frontoish commented 3 years ago

I too am finding that trailing quotation marks (single or double) are still left at the end of the title field for journal articles and book chapters. I have to edit these out manually once I have got them into EndNote, but it would be really good to have this issue fixed. AnyStyle is a wonderful tool - thank you for developing it!

Jmuccigr commented 2 years ago

Still frequently seeing the trailing quotation mark in article titles. Pesky.