inukshuk / anystyle

Fast citation reference parsing
https://anystyle.io

Inconsistent output between anystyle.io and GitHub code #69

Closed by dominic-sps 7 years ago

dominic-sps commented 7 years ago

My sample script

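require 'anystyle/parser'

# A local variable is enough here; a global ($ref) is not needed.
ref = "Wang CJ, Cheng JH, Kuo YR, Schaden W, Mittermayr R. Extracorporeal shockwave therapy in diabetic foot ulcers. Int J Surg. 2015; 24(Pt B):207-9."

# Parse the reference and print it in CiteProc format.
puts Anystyle.parse(ref, :citeproc)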

When I try the same reference on https://anystyle.io/, I get the following CiteProc/JSON output:

[{
    "author" : [{
            "family" : "Wang",
            "given" : "C.J."
        }, {
            "family" : "Cheng",
            "given" : "J.H."
        }, {
            "family" : "Kuo",
            "given" : "Y.R."
        }, {
            "family" : "Schaden",
            "given" : "W."
        }, {
            "family" : "Mittermayr",
            "given" : "R."
        }
    ],
    "title" : "Extracorporeal shockwave therapy in diabetic foot ulcers",
    "container-title" : "Int J Surg",
    "volume" : "2015",
    "page" : "207–9",
    "issue" : "24",
    "language" : "en",
    "id" : "wang-a",
    "type" : "article-journal"
}]

Am I doing something wrong?

inukshuk commented 7 years ago

I'm not sure what your question is. Are you asking why the output differs when you run the parser locally with the default model compared to anystyle.io? The results will naturally differ, because anystyle.io uses a different model. It is not necessarily a better one, since anyone can upload training data on anystyle.io and there is no quality control there.

For the best (and most stable) results, you ought to use your own model. The default model's data is a good base to which you can add training data, ideally drawn from the data set you want to parse, i.e., the same language, citation style, etc.
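
To see exactly where two models disagree on a reference, a small helper along these lines could compare the two CiteProc results field by field. This is only a sketch using plain Ruby hashes: `citeproc_diff` is a hypothetical helper, not part of anystyle, the `online` hash is a subset of the anystyle.io output quoted above, and the `local` hash is invented purely for illustration (the local output was not posted in this thread).

```ruby
# Hypothetical helper (not part of anystyle): returns a hash of
# { field => [local_value, online_value] } for every field whose
# values differ between two CiteProc-style items.
def citeproc_diff(local_item, online_item)
  keys = (local_item.keys + online_item.keys).uniq
  keys.each_with_object({}) do |key, diff|
    mine = local_item[key]
    theirs = online_item[key]
    diff[key] = [mine, theirs] unless mine == theirs
  end
end

# Subset of the anystyle.io result quoted earlier in this thread.
online = {
  'container-title' => 'Int J Surg',
  'volume' => '2015',
  'issue' => '24'
}

# ASSUMPTION: invented local result, for illustration only.
local = {
  'container-title' => 'Int J Surg',
  'volume' => '24',
  'date' => '2015'
}

citeproc_diff(local, online).each do |field, (mine, theirs)|
  puts "#{field}: local=#{mine.inspect} online=#{theirs.inspect}"
end
```

Fields that show up in such a diff are good candidates for additional training examples in your own model.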

dominic-sps commented 7 years ago

Thank you. You got my question right. Appreciate your swift response.