ropensci / jstor

Import journal data from DfR (JSTOR)
https://docs.ropensci.org/jstor

find_references lumps authors together #26

Closed tklebel closed 6 years ago

tklebel commented 6 years ago

The new format does not fare well with the current implementation.

<ref id="ref6">
            <mixed-citation publication-type="book">
               <person-group person-group-type="author">
                  <string-name>Aulette, J.</string-name>
               </person-group>and<person-group person-group-type="author">
                  <string-name>Michalowski, R.</string-name>
               </person-group>(<year>1993</year>)<article-title>“Fire in Hamlet: A Case Study of a State-Corporate Crime”</article-title>, in<person-group person-group-type="editor">
                  <string-name>K. Tunnell</string-name>
               </person-group>, ed.,<source>
                  <italic>Political Crime in Contemporary America: A Critical Approach</italic>
               </source>.:<publisher-name>Garland Publishing</publisher-name>, pp.<fpage>171</fpage>–<lpage>206</lpage>.</mixed-citation>
</ref>

gets turned into:

Aulette, J.andMichalowski, R.(1993)“Fire in Hamlet: A Case Study of a State-Corporate Crime”, inK. Tunnell, ed.,Political Crime in Contemporary America: A Critical Approach.:Garland Publishing, pp.171–206.

We could either parse the separate fields explicitly, or keep lumping everything together but insert spaces between the fields. The general problem is that the format is most likely not uniform across all articles from DfR.
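
To make the two options concrete, here is a minimal sketch in Python rather than in the package's R code. It is not the `find_references` implementation. The function names `citation_as_text` and `citation_fields` are hypothetical, the sample is an abridged copy of the reference above, and the punctuation clean-up is a naive assumption that will not suit every citation style.

```python
import re
import xml.etree.ElementTree as ET

# Abridged copy of the reference from this issue (editor group dropped, titles shortened).
REF_XML = """<ref id="ref6">
  <mixed-citation publication-type="book">
    <person-group person-group-type="author">
      <string-name>Aulette, J.</string-name>
    </person-group>and<person-group person-group-type="author">
      <string-name>Michalowski, R.</string-name>
    </person-group>(<year>1993</year>)<article-title>“Fire in Hamlet”</article-title>, in<source>
      <italic>Political Crime in Contemporary America</italic>
    </source>.:<publisher-name>Garland Publishing</publisher-name>, pp.<fpage>171</fpage>–<lpage>206</lpage>.</mixed-citation>
</ref>"""


def citation_as_text(ref):
    """Option 2: keep one string, but join the text nodes with spaces."""
    pieces = (text.strip() for text in ref.itertext())
    joined = " ".join(piece for piece in pieces if piece)
    # Naive clean-up: drop the space before closing punctuation and after "(".
    joined = re.sub(r"\s+([,.;:)])", r"\1", joined)
    return re.sub(r"\(\s+", "(", joined)


def citation_fields(ref):
    """Option 1: pull selected fields out of the tagged citation."""
    citation = ref.find("mixed-citation")
    author_path = "person-group[@person-group-type='author']/string-name"
    return {
        "authors": [name.text.strip() for name in citation.findall(author_path)],
        "year": citation.findtext("year"),
        "title": citation.findtext("article-title"),
    }


ref = ET.fromstring(REF_XML)
print(citation_as_text(ref))
print(citation_fields(ref))
```

The first function keeps the current single-string output and only fixes the missing spaces. The second returns separate fields, but it only works for references that are actually tagged this way.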

Maybe we can distinguish the two formats using the following?

<ref-list content-type="parsed-citations">
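
A check on that attribute could look roughly like the following Python sketch. It is an illustration, not the package's code. The helper name `has_parsed_citations` and both sample documents are hypothetical, and it assumes that files in the older format either lack `content-type` on `ref-list` or carry a different value, which has not been verified against real DfR data.

```python
import xml.etree.ElementTree as ET


def has_parsed_citations(article_xml):
    """Return True if any <ref-list> is marked content-type="parsed-citations"."""
    root = ET.fromstring(article_xml)
    return any(
        ref_list.get("content-type") == "parsed-citations"
        for ref_list in root.iter("ref-list")
    )


# Hypothetical minimal documents, only meant to exercise the check.
NEW_FORMAT = """<article><back>
  <ref-list content-type="parsed-citations">
    <ref id="ref6"><mixed-citation publication-type="book"/></ref>
  </ref-list>
</back></article>"""

OLD_FORMAT = """<article><back>
  <ref-list>
    <ref id="ref1"><mixed-citation>Some unparsed citation text.</mixed-citation></ref>
  </ref-list>
</back></article>"""

print(has_parsed_citations(NEW_FORMAT))
print(has_parsed_citations(OLD_FORMAT))
```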

tklebel commented 6 years ago

Currently, parsed citations simply raise a warning. We need to think about what the API for parsed citations should look like.