it looks like the fetcher will only work for items in PubMedCentral, not all of PubMed

Sogard78 commented 2 years ago

Hi. We are copying a library of PDF articles from SharePoint to Zotero. The “retrieve metadata” function appears to populate the “Info” tab. However, “Tags” (keywords) are missing. Using PMCID Fetcher (per the Zotero Forum suggestion below), the CSV export includes very few PMID numbers. It looks like the fetcher will only work for items in PubMedCentral, not all of PubMed. Is there any way to address this issue?

Zotero Forum suggestion:

Install the PM(C)ID Fetcher add-on for Zotero
Use that to ensure that all items have a PMID (note that the add-on requires a DOI, so items would need that, but most metadata from a PDF would)
Export all items as a CSV and clean-up the extra column so that PMIDs are listed in a single column
Use the add-by-identifier (magic wand) icon to import from those PMIDs

retorquere commented 2 years ago

Do you have a DOI for an article that is in PM but not in PMC?

Sogard78 commented 2 years ago

Hi. No. But here is the DOI of an article that was downloaded from PubMed with keywords: 10.1136/bmjresp-2019-000467. It has both a PMID and a PMCID (PMID: 31673367 PMCID: PMC6797341). And yet the fetcher does not collect either the PMID or the PMCID, and thus no keywords.

retorquere commented 2 years ago

For me it does write the PMID and PMCID in the extra field. Have you turned on auto-collect, or have you manually clicked "Fetch PMCID keys"? Or do you mean something different with "keywords"?

Sogard78 commented 2 years ago

Hi. Summary: Articles downloaded directly from PubMed to Zotero automatically populate keywords in the “Tags” tab. Articles saved from desktop into Zotero (with metadata retrieval) do not populate keywords in the “Tags” tab. Following these instructions using Fetcher add-on, CSV seldom (in this case not at all) includes any PMID/PMCID numbers:

Install the PM(C)ID Fetcher add-on for Zotero
Use that to ensure that all items have a PMID (note that the add-on requires a DOI, so items would need that, but most metadata from a PDF would)
Export all items as a CSV and clean-up the extra column so that PMIDs are listed in a single column
Use the add-by-identifier (magic wand) icon to import from those PMIDs, about 100-200 at a time (more can get messy)
Then merge all duplicates (you can do this one by one or find a script to do so automatically on the forums here): That will get you the PDF from the original item and the tags from the PubMed item.
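The "clean up the extra column" and "100-200 at a time" steps above can be sketched in a few lines. This is only an illustration, assuming each Extra value holds one identifier per line in the form `PMID: 31673367`; the helper names `extractPmid` and `batchPmids` are hypothetical and not part of Zotero or the add-on.

```javascript
// Hypothetical helper: return the PMID found in one "Extra" value, or null.
// Assumes one identifier per line, e.g. "PMID: 31673367\nPMCID: PMC6797341".
function extractPmid(extra) {
  // ^ with the m flag anchors at a line start, so "PMCID: ..." lines do not match.
  const match = /^\s*PMID:\s*(\d+)/im.exec(extra || '')
  return match ? match[1] : null
}

// Hypothetical helper: collect unique PMIDs and split them into batches
// small enough to paste into the add-by-identifier dialog.
function batchPmids(extras, size = 100) {
  const pmids = [...new Set(extras.map(extractPmid).filter(Boolean))]
  const batches = []
  for (let i = 0; i < pmids.length; i += size) {
    batches.push(pmids.slice(i, i + size))
  }
  return batches
}
```

Rows whose Extra value has no PMID are skipped, which also makes it easy to see how many items the fetcher did not resolve.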

I attached an example; please take a look, and maybe you can give us an idea of how to resolve this issue. Exported Items10-31.csv Zotero Test 31Oct2022.docx

retorquere commented 2 years ago

There's too much going on here. If I understand correctly, all you want is that the fetcher fetches tags as well as PM(C)IDs. Once that is done, they may or may not show up in CSV export, but that's out of scope for now.

retorquere commented 2 years ago

Upgrade to 0.0.15 and try again please.

Sogard78 commented 2 years ago

We have 6.0.15 installed; I presume that is what you meant to say.

retorquere commented 2 years ago

No, I mean upgrade the fetcher plugin. When you do, you'll see it's at version 0.0.15.

Sogard78 commented 1 year ago

Hi. We just checked: the Zotero client is at version 6.0.18 and the PMCID fetcher extension is at version 0.0.15. We still get only a few PMCID results. We are using the manual fetch option; I can't find the auto-collect feature that you mentioned in previous messages. I would really appreciate it if you could run a test on your side with the attached .pdf files and let me know if you get any results, because for us the Extra field in the .csv file is empty.

Thanks. Aggregate Safety Assessment Planning for the Drug Development.pdf Clinically useful serum biomarkers for diagnosis and prognosis of sarcoidosis.pdf Grunewald_et_al-2019-Nature_Reviews_Disease_Primers.pdf

mazzopalazzo commented 1 year ago

At the risk of hijacking this thread, I have a potentially related issue where the plugin doesn't seem to fetch very many PMID/PMCID entries for imported PDFs that all show DOIs. An example: 10.1097/CCM.0000000000005168. Happy to open another ticket if need be and to give more details. Using the latest 0.0.15 and Zotero 6.0.18.

retorquere commented 1 year ago

@Sogard78 for the "Aggregate Safety" PDF I get DOI 10.1007/s43441-021-00271-2, and if I plug that into the PubMed API I get

{
 "status": "ok",
 "responseDate": "2022-11-10 03:42:52",
 "request": "tool=zotero-pmcid-fetcher;email=email%3Demiliano.heyns%40iris-advies.com;ids=10.1007%2Fs43441-021-00271-2;format=json;idtype=doi;versions=no",
 "records": [
   {
    "doi": "10.1007/s43441-021-00271-2",
    "live": "false",
    "status": "error",
    "errmsg": "invalid article id"
   }
 ]
}

@mazzopalazzo I get the same problem for 10.1097/CCM.0000000000005168.
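A response like the one above can be split into resolved and unresolved DOIs, which makes it obvious when the converter simply does not know an article. This is a minimal sketch: the `status` and `errmsg` fields are taken from the response shown here, while the `pmid` and `pmcid` fields on successful records are an assumption, and `summarizeIdconv` is a hypothetical name.

```javascript
// Hypothetical helper: separate ID-converter records into hits and misses.
// Assumes successful records carry `pmid` and/or `pmcid` next to `doi`.
function summarizeIdconv(data) {
  const found = []
  const failed = []
  for (const record of data.records || []) {
    if (record.status === 'error') {
      // e.g. errmsg "invalid article id" for a DOI that could not be resolved
      failed.push({ doi: record.doi, reason: record.errmsg })
    } else {
      found.push({
        doi: record.doi,
        pmid: record.pmid || null,
        pmcid: record.pmcid || null,
      })
    }
  }
  return { found, failed }
}
```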

retorquere commented 1 year ago

Can we please park for the moment what ends up in the CSV and keep to what would be expected to appear in Zotero? The CSV export is a secondary phenomenon -- if the data is not in Zotero, it will not show up in CSV either.

Martin-Laclaustra commented 1 year ago

Dear Mr. Heyns. Thank you for your extensions. The problem here seems to be that you are querying the PMC database:

https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=zotero-pmcid-fetcher&email=email%3Demiliano.heyns%40iris-advies.com&ids=10.1007%2Fs43441-021-00271-2&format=json&idtype=doi&versions=no

which contains only articles that are actually in PMC. You can instead query PubMed itself:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=10.1007%2Fs43441-021-00271-2&field=doi&retmode=json

which returns:

{"header":{"type":"esearch","version":"0.3"},"esearchresult":{"count":"1","retmax":"1","retstart":"0","idlist":["33755928"],"translationset":[],"translationstack":[{"term":"10.1007/s43441-021-00271-2[Publisher ID]","field":"Publisher ID","count":"1","explode":"N"},"GROUP"],"querytranslation":"10.1007/s43441-021-00271-2[Publisher ID]"}}

where idlist contains the PMID of this article. Subsequently, the complete record of the article could be retrieved to check for a PMCID (which this article does not have):

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=33755928

(although I think that this can probably be obtained more easily through your initial query).

I hope that you can fix the plugin with this information. (Further documentation describing the API is here: https://www.ncbi.nlm.nih.gov/books/NBK25499/)
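The lookup described above amounts to building an esearch URL for the DOI and reading `esearchresult.idlist` from the JSON reply. A minimal sketch, with hypothetical function names `esearchUrl` and `pmidFromEsearch`, and with the choice to return null unless exactly one PMID comes back being an assumption of this example:

```javascript
const ESEARCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

// Build the PubMed esearch URL for a single DOI, using the same
// parameters as the query shown above.
function esearchUrl(doi) {
  const params = new URLSearchParams({
    db: 'pubmed',
    term: doi,
    field: 'doi',
    retmode: 'json',
  })
  return `${ESEARCH}?${params}`
}

// Return the PMID from a parsed esearch reply, or null when the DOI
// matched no record or more than one record.
function pmidFromEsearch(data) {
  const result = data && data.esearchresult
  if (!result || !Array.isArray(result.idlist)) return null
  return result.idlist.length === 1 ? result.idlist[0] : null
}
```

For the reply quoted above, `pmidFromEsearch` would return the single entry of `idlist`, 33755928.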

Martin-Laclaustra commented 1 year ago

A quick and dirty modification of your code retrieves (only) the PMID, but it does so successfully. At: https://github.com/retorquere/zotero-pmcid-fetcher/blob/51ff9172042deae8ab999eac1ede851f5160dfcb/bootstrap.js#L177 it works when replacing lines 177-217:

  // resolve PMID/PMCID based on DOI
  const incomplete = items.filter(item => item.doi && (!item.pmid || !item.pmcid))
  const max = 1
  for (const chunk of Array(Math.ceil(incomplete.length/max)).fill().map((_, i) => incomplete.slice(i*max, (i+1)*max))) {
    const url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?' + Object.entries({
      db: 'pubmed',
      term: chunk.map(item => item.doi).join(','),
      retmode: 'json',
      field: 'doi',
    }).map(([key, value]) => `${key}=${encodeURIComponent(value)}`).join('&')

    try {
      const response = await fetch(url)
      if (!response.ok) throw new Error('Unexpected response from API')
      const data = await response.json()
      if (!data.esearchresult) throw new Error(`no esearchresult: ${JSON.stringify(data)}`)
      if (!data.esearchresult.count) throw new Error(`no esearchresult.count: ${JSON.stringify(data)}`)
      if (data.esearchresult.count !== "1") throw new Error(`esearchresult.count not 1: ${JSON.stringify(data)}`)
      if (!data.esearchresult.idlist) throw new Error(`no esearchresult.idlist: ${JSON.stringify(data)}`)

      for (const item of chunk) {
        item.extra.push(`PMID: ${data.esearchresult.idlist[0]}`)
        item.save = true
      }
    } catch (err) {
      // this request looks up the PMID, so report it as such
      flash('Could not fetch PMID', `Could not fetch PMID for ${url}: ${err.message}`)
    }
  }

  // fetch tags
  const parser = Components.classes['@mozilla.org/xmlextras/domparser;1'].createInstance(Components.interfaces.nsIDOMParser)
  for (const item of items) {
    //if (!item.pmid && !item.pmcid) continue

Some more coding is needed to provide a more general solution that also gets the PMCID in case it exists. But that should be easy to do by extending the fetch tags part.
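One possible way to get the PMCID as well is to read the article identifiers out of the efetch XML record. The sketch below assumes the record lists them as `<ArticleId IdType="pubmed">` and `<ArticleId IdType="pmc">` elements, with the article's own identifiers appearing before those of any cited references. It uses a regular expression only to stay self-contained, where the plugin itself uses a DOM parser; `articleIds` is a hypothetical name.

```javascript
// Hypothetical helper: collect identifiers from a PubMed efetch XML string.
// Keeps the first value seen per IdType (assumed to be the article's own).
function articleIds(xml) {
  const ids = {}
  const pattern = /<ArticleId IdType="([^"]+)">([^<]+)<\/ArticleId>/g
  for (const [, type, value] of xml.matchAll(pattern)) {
    if (!(type in ids)) ids[type] = value.trim()
  }
  return ids
}
```

An article outside PMC would simply have no `pmc` key in the result, so the caller can add a PMCID line only when one exists.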

retorquere commented 1 year ago

Can you upgrade and try again?

Martin-Laclaustra commented 1 year ago

It retrieves both correctly. I observed that on some occasions, repeating the command duplicates the PMID in the "Extra" field. This occurs:

In non-PMC articles. Each time, a new duplicated PMID line appears.
In PMC articles. Only if PMID existed and PMCID did not... PMID is duplicated. Thanks.
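The duplication described above is the usual symptom of appending a line without first checking whether it is already there. A minimal sketch of an idempotent append, assuming the Extra field is handled as an array of `KEY: value` lines; `addIdentifierOnce` is a hypothetical name and this is not the plugin's actual fix:

```javascript
// Hypothetical helper: append "KEY: value" only if no line for KEY exists yet.
// Returns a new array and leaves the input untouched.
function addIdentifierOnce(extraLines, key, value) {
  const prefix = `${key}:`.toUpperCase()
  const exists = extraLines.some(line => line.trim().toUpperCase().startsWith(prefix))
  return exists ? [...extraLines] : [...extraLines, `${key}: ${value}`]
}
```

Running it repeatedly on the same item leaves a single PMID line, and an existing `PMID:` line does not block adding `PMCID:` because the two prefixes differ.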

retorquere commented 1 year ago

Can you upgrade and try again?

Martin-Laclaustra commented 1 year ago

I repeated the tests and they worked as they should. Thank you for your attention, and again for creating the plugin. I believe that you can close the issue. (Or wait for @Sogard78 to confirm everything is ok for him too)

Sogard78 commented 1 year ago

Hi. Sorry for the late reply here. I will try to test today, and I will let you know the results. Just to confirm, I need to test with the latest version of the Zotero fetcher, which is 0.0.17. Is there anything else that I need to change before I try?

Thanks for help.

retorquere commented 1 year ago

Just upgrading will do everything that's needed

Sogard78 commented 1 year ago

Hi. Unfortunately, I am not able to test until next Wednesday. The user is out of the office until next week and I don't have access to those PDF files. I will come back with updates ASAP. Thanks.

retorquere / zotero-pmcid-fetcher

it looks like the fetcher will only work for items in PubMedCentral, not all of PubMed #5