rmzelle / ref-extractor

Reference Extractor - Extract Zotero/Mendeley references from Microsoft Word files
https://rintze.zelle.me/ref-extractor/
MIT License

Extract EndNote metadata? #9

Closed: rmzelle closed this issue 5 years ago

rmzelle commented 7 years ago

@adam3smith, @zuphilip, I just came across a Word document with active EndNote fields, and lo and behold, it looks like EndNote embeds item metadata as well, in XML format:

{ ADDIN EN.CITE
<EndNote>
  <Cite>
    <Author>Sikorski</Author>
    <Year>1989</Year>
    <RecNum>1</RecNum>
    <DisplayText>(1)</DisplayText>
    <record>
      <rec-number>1</rec-number>
      <foreign-keys>
        <key app="EN" db-id="eex59zpwv2v0zhezs9qvtfp2vf0sarppa5z0">1</key>
      </foreign-keys>
      <ref-type name="Journal Article">17</ref-type>
      <contributors>
        <authors>
          <author>Sikorski, R. S.</author>
          <author>Hieter, P.</author>
        </authors>
      </contributors>
      <auth-address>Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205.</auth-address>
      <titles>
        <title>A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae</title>
        <secondary-title>Genetics</secondary-title>
        <alt-title>Genetics</alt-title>
      </titles>
      <periodical>
        <full-title>Genetics</full-title>
        <abbr-1>Genetics</abbr-1>
      </periodical>
      <alt-periodical>
        <full-title>Genetics</full-title>
        <abbr-1>Genetics</abbr-1>
      </alt-periodical>
      <pages>19-27</pages>
      <volume>122</volume>
      <number>1</number>
      <edition>1989/05/01</edition>
      <keywords>
        <keyword>Centromere</keyword>
        <keyword>Culture Media</keyword>
        <keyword>DNA, Fungal/*genetics</keyword>
        <keyword>*Genetic Vectors</keyword>
        <keyword>Plasmids</keyword>
        <keyword>Restriction Mapping</keyword>
        <keyword>Saccharomyces cerevisiae/*genetics</keyword>
        <keyword>Transformation, Genetic</keyword>
      </keywords>
      <dates>
        <year>1989</year>
        <pub-dates>
          <date>May</date>
        </pub-dates>
      </dates>
      <isbn>0016-6731 (Print)
0016-6731 (Linking)</isbn>
      <accession-num>2659436</accession-num>
      <work-type>Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, P.H.S.</work-type>
      <urls>
        <related-urls>
          <url>http://www.ncbi.nlm.nih.gov/pubmed/2659436</url>
        </related-urls>
      </urls>
      <custom2>1203683</custom2>
      <language>eng</language>
    </record>
  </Cite>
</EndNote>
}

(I indented the XML; it comes from the supplemental Word file at http://www.sciencedirect.com/science/article/pii/S1096717616301914 [https://doi.org/10.1016/j.ymben.2016.10.017].)

Does this look anywhere close to the EndNote XML export format that Zotero can import? Do we have any good documentation of the EndNote XML format?
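
As a rough illustration of how little is needed to read this metadata, here is a minimal Python sketch that pulls a few fields out of the field code shown above. The helper names (`parse_endnote_field`, `_text`) and the choice of output keys are hypothetical, and the element paths are taken only from this one example rather than from the EndNote DTD, so treat it as a sketch under those assumptions.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the field code shown above.
FIELD_CODE = """{ ADDIN EN.CITE
<EndNote><Cite><Author>Sikorski</Author><Year>1989</Year><RecNum>1</RecNum>
<record><rec-number>1</rec-number>
<foreign-keys><key app="EN" db-id="eex59zpwv2v0zhezs9qvtfp2vf0sarppa5z0">1</key></foreign-keys>
<ref-type name="Journal Article">17</ref-type>
<contributors><authors><author>Sikorski, R. S.</author><author>Hieter, P.</author></authors></contributors>
<titles><title>A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae</title><secondary-title>Genetics</secondary-title></titles>
<pages>19-27</pages><volume>122</volume><number>1</number>
<dates><year>1989</year></dates>
</record></Cite></EndNote>
}"""


def _text(parent, path):
    """Text of the first match for path, or None (hypothetical helper)."""
    elem = parent.find(path)
    if elem is None:
        return None
    # Join nested text too, in case a value is wrapped in child elements.
    return "".join(elem.itertext()).strip()


def parse_endnote_field(field_code):
    """Return one dict per <record> in an EN.CITE field code (hypothetical helper)."""
    start = field_code.find("<EndNote>")
    end = field_code.rfind("</EndNote>")
    if start == -1 or end == -1:
        return []  # not an EndNote field with embedded XML
    root = ET.fromstring(field_code[start:end + len("</EndNote>")])
    items = []
    for record in root.iter("record"):
        ref_type = record.find("ref-type")
        authors = record.findall("contributors/authors/author")
        items.append({
            "type": ref_type.get("name") if ref_type is not None else None,
            "authors": ["".join(a.itertext()).strip() for a in authors],
            "title": _text(record, "titles/title"),
            "journal": _text(record, "titles/secondary-title"),
            "year": _text(record, "dates/year"),
            "volume": _text(record, "volume"),
            "issue": _text(record, "number"),
            "pages": _text(record, "pages"),
        })
    return items


for item in parse_endnote_field(FIELD_CODE):
    print(item["authors"], item["year"], item["journal"])
```

Mapping these values onto another format (CSL JSON, RIS, and so on) would still need a proper look at the EndNote XML specification; the sketch only shows that the data is readable.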

adam3smith commented 7 years ago

Yes, this looks exactly like EndNote XML (from <record> to </record>). I hope I can get you the specifications tonight; I'd have to look for them. I'd expect Zotero to be able to just import this (I know that doesn't help you directly; it's just FYI).

rmzelle commented 7 years ago

Cool.

It might also be really nice for some people if somebody wrote a Zotero plugin that extracts Mendeley and EndNote fields, imports them into a Zotero collection, and replaces the original fields in the document with Zotero fields. That seems reasonably straightforward, for .docx files at least, for somebody who knows what they're doing (that would exclude me, though).
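
For the extraction half of that idea, a minimal sketch in Python might look like the following. It relies on two things I'm fairly sure of (a .docx file is a ZIP archive whose main text lives in `word/document.xml`, and complex field instructions are stored in `w:instrText` runs between `w:fldChar` markers) and on one assumption: that the fields of interest are not nested inside other fields. The function name `field_instructions` is made up for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def field_instructions(docx_file):
    """Return the instruction text of each complex field in the main document.

    Hypothetical helper: it ignores nested fields, and it does not look at
    footnotes or endnotes, which are stored in separate parts of the archive.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    fields = []
    current = None  # pieces of the instruction for the field being read
    for elem in root.iter():  # iter() walks the tree in document order
        if elem.tag == W + "fldChar":
            kind = elem.get(W + "fldCharType")
            if kind == "begin":
                current = []
            elif kind in ("separate", "end") and current is not None:
                # Word may split one instruction across several runs.
                fields.append("".join(current).strip())
                current = None
        elif elem.tag == W + "instrText" and current is not None:
            current.append(elem.text or "")
    return fields
```

Something like `[f for f in field_instructions("paper.docx") if f.startswith("ADDIN EN.CITE")]` would then keep only fields that look like the EndNote example above (`paper.docx` being a placeholder file name). Importing into Zotero and rewriting the fields in place are the harder parts and are not covered here.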

zuphilip commented 7 years ago

> [...] somebody who knows what they're doing

There are a lot of examples in coding where I don't know exactly what I am doing, but in the end it works ;-)

rmzelle commented 7 years ago

> There are a lot of examples in coding where I don't know exactly what I am doing, but in the end it works ;-)

Fair enough. I was mostly just trying to distance myself from developing such a plugin :).

rmzelle commented 7 years ago

> this looks exactly like EndNote XML (from <record> to </record>). I hope I can get you the specifications tonight

Tonight's the night?

Also, do we know anybody with a copy of EndNote? Would be good to know whether EndNote allows for merging of items, and to generate some examples of in-text citations that cite multiple items (e.g. "(Doe, 2000; Smith, 1999)"), and see how we can detect duplicate cites (maybe check <key app="EN" db-id="...">...</key>?).
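
To make the `<key>` idea concrete, here is a small Python sketch that treats the combination of `app`, `db-id`, and the key's text as an item's identity and keeps only the first record seen for each. The sample payloads are invented (the `db-id` is a placeholder), and storing a multi-item citation as several `<Cite>` children of one `<EndNote>` element is an assumption that still needs checking against real documents.

```python
import xml.etree.ElementTree as ET

DB_ID = "hypothetical-db-id"  # placeholder, not a real EndNote library id


def _cite(author, year, rec_num):
    """Build a stripped-down <Cite> element for the demo (hypothetical data)."""
    return (
        f"<Cite><Author>{author}</Author><Year>{year}</Year>"
        f"<RecNum>{rec_num}</RecNum><record><rec-number>{rec_num}</rec-number>"
        f'<foreign-keys><key app="EN" db-id="{DB_ID}">{rec_num}</key>'
        f"</foreign-keys></record></Cite>"
    )


# First field: "(Doe, 2000; Smith, 1999)". Second field: Doe cited again.
FIELDS = [
    "<EndNote>" + _cite("Doe", 2000, 1) + _cite("Smith", 1999, 2) + "</EndNote>",
    "<EndNote>" + _cite("Doe", 2000, 1) + "</EndNote>",
]


def record_identity(cite):
    """Return (app, db-id, key text) for a <Cite>, or None if there is no key."""
    key = cite.find("record/foreign-keys/key")
    if key is None:
        return None
    return (key.get("app"), key.get("db-id"), (key.text or "").strip())


def unique_records(field_xml_list):
    """Map each distinct identity to the first <record> seen for it."""
    seen = {}
    for xml_text in field_xml_list:
        for cite in ET.fromstring(xml_text).iter("Cite"):
            ident = record_identity(cite)
            if ident is not None and ident not in seen:
                seen[ident] = cite.find("record")
    return seen


print(len(unique_records(FIELDS)))  # three cites in total, two distinct items
```

This would only catch repeat cites of the same library record. The same work stored as two records, or cited from two different libraries, would get different keys, which is why examples of merged items would be useful.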

bwiernik commented 7 years ago

I have access to EndNote if you need me to check something.

rmzelle commented 6 years ago

@bwiernik, it would be handy if you could provide a few example Word .docx documents: one with a single EndNote cite ("(Johnson et al. 2000)"); one with a multi-cite citation ("(Brown et al. 1995; Smith 1990)"), a cite to an earlier cited item ("(Brown et al. 1995)"), and a bibliography; and, if you have one on hand, a larger document with lots of citations and a bibliography. It would be good to know which version of EndNote you're using, too. (Does EndNote support LibreOffice? If so, I could use the same type of documents for that.)

And if anybody could point me to the EndNote XML specifications, that would help too.

adam3smith commented 6 years ago

EndNote XML specs: https://support.clarivate.com/Endnote/s/article/EndNote-XML-Document-Type-Definition?language=en_US

YuanyuanCSS commented 6 years ago

I am searching all over the internet trying to find answers to this question. Did you guys solve this?

rmzelle commented 6 years ago

> I am searching all over the internet trying to find answers to this question. Did you guys solve this?

The extraction of EndNote item metadata from Word documents seems trivial. I'm a little hesitant to support it, though, since EndNote has a history of aggression towards developers of software products that read EndNote files. The previous owners, Thomson Reuters, initiated a lawsuit about a decade ago against George Mason University over Zotero's ability to read EndNote citation styles (the lawsuit made some erroneous claims and was later dismissed). If you could get the current owners, Clarivate Analytics, to confirm in writing that they'd be okay with me developing the feature, I'd be happy to implement it, but otherwise I'm not going to touch this.

rmzelle commented 5 years ago

I reached out to Clarivate Analytics and spoke with a product manager. She consulted their legal department and told me the company couldn't rule out taking legal action if I implemented the feature, so I won't risk doing so.

bwiernik commented 5 years ago

Wow.

rmzelle commented 5 years ago

(see also https://www.zotero.org/trac/ticket/686#comment:4 for Dan's take on things :P )

GrauLab commented 2 years ago

I use Zotero and searched today for a way to collaborate with EndNote users on a manuscript. I found that EndNote technical support described a solution for the Zotero=>EndNote direction in April 2020 (see https://community.endnote.com/t/convert-document-from-zotero-to-endnote/311438/4). Then I found the threat here ("couldn't rule out taking legal action") for the reverse direction. I am not a lawyer, but two comments. First, what they do for their incoming direction has certainly been approved by the same legal department. Second, they do not touch any foreign field codes at all; they just use the visible text (citation and bibliography) to match papers and convert citations into their own field codes. If the bibliography is contained in the .docx (it usually is), it is self-contained reference information at the text level, so might this be a way forward for the EndNote=>plain Word text=>Zotero direction, too?

adam3smith commented 2 years ago

You're either misunderstanding what Ref Extractor does or what the EndNote post describes. Ref Extractor does not help you convert documents at all. It "just" extracts the references they contain (which the EndNote post doesn't cover, because they already have the RIS). There are some threads over at Zotero on converting EndNote documents, but Zotero does not have an equivalent of EndNote's citation-matching functionality, so nothing comparable exists. I don't think Zotero is working on anything along those lines, but they would be the ones to ask.

JLC-Sc commented 4 months ago

As someone who is running into the same issue (EndNote -> Zotero in-text citations), this conversation is insightful, albeit disappointing. Has anyone found a workaround, or have there been any changes in recent years?

Trying to sort this out for a PhD thesis and I'd love to avoid manually changing each citation...

Thanks :)