inukshuk / anystyle-cli

AnyStyle Command Line Interface
BSD 2-Clause "Simplified" License

et-al and Collaborator not identified, how can I add additional tags? #8

Open iKomettech opened 5 years ago

inukshuk commented 5 years ago

You can add any tags you like by adding them to your training set and creating a new model from it.

Note that et-al is currently parsed as part of other fields (e.g., author) and then interpreted by normalizers. That's why we don't need a dedicated tag for it (you get normalized results if you use the JSON output format, for example).

iKomettech commented 5 years ago

I have added a collab tag to my dataset, but it is still being identified as title. See the example dataset below:

<sequence>
    <collab>Action to Control Cardiovascular Risk in Diabetes Study Group</collab>
    <author>Gerstein HC, Miller ME, Byington RP, Goff DC Jr, Bigger JT</author>
    <et-al>et al</et-al>
    <title>Effects of intensive glucose lowering in type 2 diabetes</title>
    <container-title>N Engl J Med</container-title>
    <date>2008</date>
    <volume>358</volume>
    <pages>2545-2559</pages>
</sequence>

inukshuk commented 5 years ago

You'll have to add sufficient samples to your training set.

Do your references really lack all punctuation? That makes for particularly hard-to-parse references.

iKomettech commented 5 years ago

No, they have punctuation. Here is my sample reference:

Action to Control Cardiovascular Risk in Diabetes Study Group, Gerstein HC, Miller ME, Byington RP, Goff DC Jr, Bigger JT, et al. Effects of intensive glucose lowering in type 2 diabetes. N Engl J Med 2008;358:2545-2559.

Okay, thanks. I will try with punctuation and let you know.

I have one more question: does AnyStyle have an option for front-matter styling, i.e. tagging elements like title, body text, keywords, author groups, abstract, and affiliations?

inukshuk commented 5 years ago

Note that the XML format is used mostly for training purposes: it is extremely important that you keep all punctuation, otherwise the model will not work very well with your input. That is to say, you must be able to reconstruct the original input from the XML format.
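To make the reconstruction requirement concrete, here is a minimal sketch of a round-trip check, written in plain Python with the standard library only (it does not call AnyStyle). It concatenates the text of every tag in a `<sequence>` and compares the result with the original reference. Two things are assumptions for illustration: the comparison ignores whitespace, which may not match how AnyStyle itself splits tokens, and the `PUNCTUATED` tagging is a hypothetical re-tagging of the sample from this thread, not a verified training sample. The helper names `squash` and `is_reconstructible` are likewise made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# The sample reference exactly as posted in this thread.
REFERENCE = (
    "Action to Control Cardiovascular Risk in Diabetes Study Group, "
    "Gerstein HC, Miller ME, Byington RP, Goff DC Jr, Bigger JT, et al. "
    "Effects of intensive glucose lowering in type 2 diabetes. "
    "N Engl J Med 2008;358:2545-2559."
)

# The tagging posted earlier in this thread: punctuation has been dropped.
STRIPPED = """
<sequence>
  <collab>Action to Control Cardiovascular Risk in Diabetes Study Group</collab>
  <author>Gerstein HC, Miller ME, Byington RP, Goff DC Jr, Bigger JT</author>
  <et-al>et al</et-al>
  <title>Effects of intensive glucose lowering in type 2 diabetes</title>
  <container-title>N Engl J Med</container-title>
  <date>2008</date>
  <volume>358</volume>
  <pages>2545-2559</pages>
</sequence>
"""

# Hypothetical re-tagging that keeps every character of the reference.
PUNCTUATED = """
<sequence>
  <collab>Action to Control Cardiovascular Risk in Diabetes Study Group,</collab>
  <author>Gerstein HC, Miller ME, Byington RP, Goff DC Jr, Bigger JT,</author>
  <et-al>et al.</et-al>
  <title>Effects of intensive glucose lowering in type 2 diabetes.</title>
  <container-title>N Engl J Med</container-title>
  <date>2008;</date>
  <volume>358:</volume>
  <pages>2545-2559.</pages>
</sequence>
"""


def squash(text):
    """Remove all whitespace so only the visible characters are compared."""
    return re.sub(r"\s+", "", text)


def is_reconstructible(sequence_xml, reference):
    """Return True if the tagged segments add up to the original reference.

    Assumption: whitespace is ignored in the comparison.
    """
    root = ET.fromstring(sequence_xml.strip())
    rebuilt = "".join(child.text or "" for child in root)
    return squash(rebuilt) == squash(reference)


if __name__ == "__main__":
    print("stripped sample reconstructs:", is_reconstructible(STRIPPED, REFERENCE))
    print("punctuated sample reconstructs:", is_reconstructible(PUNCTUATED, REFERENCE))
```

Running a check like this over a training file before creating a model is one way to catch sequences where commas, periods, or separators such as `;` and `:` were lost during tagging.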

I'm not sure what you mean by front-matter styling? You can certainly extract those fields from, e.g. the JSON output.