scottjpearson / pubmedSearch

This module takes the last_name and first_name fields and downloads all PubMed entries for that name. It puts the PubMed entries into another REDCap field and also requires one helper text field. It checks PubMed once a day for new entries. Because the process is not fully accurate, it is recommended that a person manually oversee the downloads.
MIT License
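The description does not say how a name becomes a list of PubMed IDs. One common approach, shown here only as a sketch, is an E-utilities esearch query using PubMed's `[au]` author tag. The function name `buildAuthorSearchUrl`, the `Last First[au]` term format, and the default `retmax` value are assumptions for illustration, not the module's code.

```php
<?php
// Hypothetical helper (not from pubmedSearch): builds an E-utilities esearch URL
// that asks PubMed for the IDs of articles by one author.
function buildAuthorSearchUrl($lastName, $firstName, $retmax = 1000)
{
    $params = array(
        'db'     => 'pubmed',
        // Assumed term format: "Last First[au]" restricts the search to the author field.
        'term'   => trim($lastName) . ' ' . trim($firstName) . '[au]',
        'retmax' => (int) $retmax,
    );
    // http_build_query URL-encodes the term: spaces become "+", brackets become %5B/%5D.
    return 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?'
        . http_build_query($params, '', '&');
}

// prints https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Doe+Jane%5Bau%5D&retmax=1000
echo buildAuthorSearchUrl('Doe', 'Jane'), "\n";
```

The IDs returned by such a search would then be passed to efetch.fcgi, as in the URLs posted in the thread.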

Cannot create object error #2

Open iznaut opened 5 years ago

iznaut commented 5 years ago

I'm using test.php to check whether the cron job is working (it would be nice if this were documented in the README). It works fine for a person with 14 matches, but I get a "Cannot create object" error when it tries to parse the XML file for a user with 58 matches.

scottjpearson commented 5 years ago

Eric,

Can you provide the URL that should be displayed next to "Cannot create object"? My initial guess is special characters in the XML, but I’d like to confirm that hunch.

Thanks, Scott

From: Eric Neuhaus. Date: Wednesday, January 2, 2019 at 1:25 PM. Subject: [scottjpearson/pubmedSearch] Cannot create object error (#2)

iznaut commented 5 years ago

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=30503783,30487653,30415424,30261413,29792050,28833953,28815738,28759400,28666200,28645075,27329760,27237705,27150464,27035528,26930520,26580857,26580150,26476155,25956751,25845522,25742213,25660732,25640677,25424057,25176622,24746452,24723424,24500179,24338726,24007295,23377128,23062746,22816725,21984801,21923607,21195396,21078709,20680188,19969372,19797434,19561163,19538687,19512981,18378097,17916329,17414236,17360043,17012529,16816777,16631257,16527360,16135631,15984894,15816781,15465584,12555229,11939976,11940800
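An efetch.fcgi response with retmode=xml is a `PubmedArticleSet` document. As a sketch of the parsing step, assuming SimpleXML as in the test script later in the thread, the following reads the PMID and title of each article. `extractArticles` is a hypothetical name, and the two-record XML string is a hand-written placeholder, not real PubMed data.

```php
<?php
// Hypothetical helper: returns an array of pmid => title from an efetch XML string,
// or null if the string cannot be parsed as XML.
function extractArticles($xmlString)
{
    libxml_use_internal_errors(true);   // store parse errors instead of emitting warnings
    $xml = simplexml_load_string($xmlString);
    libxml_clear_errors();
    if (false === $xml) {
        return null;
    }
    $articles = array();
    foreach ($xml->PubmedArticle as $article) {
        $pmid = (string) $article->MedlineCitation->PMID;
        $articles[$pmid] = (string) $article->MedlineCitation->Article->ArticleTitle;
    }
    return $articles;
}

// Placeholder sample shaped like an efetch response (IDs and titles are invented).
$sample = '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>111</PMID>
    <Article><ArticleTitle>First placeholder title</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>222</PMID>
    <Article><ArticleTitle>Second placeholder title</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>';

print_r(extractArticles($sample));
```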

scottjpearson commented 5 years ago

Thanks. Try now.

It seems that special characters are a common problem for the SimpleXML parser, so I encoded them in UTF-8. I also reduced the pull size (200 -> 10) just in case. Please let me know whether either change works for you.
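Reducing the pull size means splitting the ID list across several efetch requests. A minimal sketch of that batching, assuming the PubMed IDs are already known; `buildEfetchUrls` is a hypothetical name, and its default batch size of 10 mirrors the value mentioned above. The sample IDs are the first twelve from the URL posted earlier in the thread.

```php
<?php
// Hypothetical helper: splits PubMed IDs into batches and builds one efetch URL per batch.
function buildEfetchUrls(array $pmids, $batchSize = 10)
{
    $urls = array();
    foreach (array_chunk($pmids, $batchSize) as $batch) {
        $urls[] = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
            . '?db=pubmed&retmode=xml&id=' . implode(',', $batch);
    }
    return $urls;
}

// 12 IDs with a batch size of 10 yield two URLs: one with 10 IDs, then one with 2.
$ids = array('30503783', '30487653', '30415424', '30261413', '29792050', '28833953',
             '28815738', '28759400', '28666200', '28645075', '27329760', '27237705');
foreach (buildEfetchUrls($ids) as $url) {
    echo $url, "\n";
}
```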

Thanks, Scott

iznaut commented 5 years ago

Nope, still not working: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=30503783,30487653,30415424,30261413,29792050,28833953,28815738,28759400,28666200,28645075

scottjpearson commented 5 years ago

Darn. Please give me a little while to set up a test download-and-parse with it. Thanks for your patience.

Scott

scottjpearson commented 5 years ago

Eric,

Hmm… It runs fine on my local PHP (7.1.19). The script below should print the XML document. Can you create this file and run it on your machine (php test2.php)?

Thanks, Scott

scottjpearson:pubmedSearch pearsosj$ php -v
PHP 7.1.19 (cli) (built: Aug 17 2018 18:03:17) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.1.0, Copyright (c) 1998-2018 Zend Technologies
scottjpearson:pubmedSearch pearsosj$ cat test2.php

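<?php
  // Fetch ten PubMed records as XML from the NCBI E-utilities efetch endpoint.
  $url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=30503783,30487653,30415424,30261413,29792050,28833953,28815738,28759400,28666200,28645075";
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);  // skips TLS certificate checks; acceptable only for a local test
  curl_setopt($ch, CURLOPT_VERBOSE, 0);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
  curl_setopt($ch, CURLOPT_AUTOREFERER, true);
  curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
  curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
  $output = curl_exec($ch);
  // Read the status code and any transfer error before closing the handle.
  $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
  $curlError = curl_error($ch);
  curl_close($ch);
  // Stop early if the transfer failed or the server did not answer 200 OK.
  if (false === $output || 200 != $httpCode) {
        throw new \Exception("Request failed (HTTP $httpCode): $curlError");
  }
  // Collect libxml parse errors instead of emitting PHP warnings.
  libxml_use_internal_errors(true);
  $xml = simplexml_load_string($output);
  if (false === $xml) {
        $errors = libxml_get_errors();
        echo 'Errors are '.var_export($errors, true);
        throw new \Exception('invalid XML');
  } else {
        echo $output;
  }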
iznaut commented 5 years ago

Yeah, that returns the XML without issue. I'm on PHP version 5.5.38.

scottjpearson commented 5 years ago

I’m running out of tricks. My current best guess is that libxml runs out of memory under the cron job but not on the command line. In the GitHub repo, I changed the pull size to 1 and made it print the number of bytes received for each failure. Would you mind testing this newer version? If it doesn’t shed any light, I might be out of ideas.
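A sketch of the diagnostic described above: attempt the parse and, on failure, report the response size and the first libxml error alongside the URL. `parseOrDescribe` is a hypothetical name, and its message format imitates the error text quoted in the next reply rather than reproducing the module's actual code.

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement on success, or a diagnostic
// string with the size of the response body and the first libxml error.
function parseOrDescribe($url, $body)
{
    libxml_use_internal_errors(true);   // store parse errors instead of emitting warnings
    libxml_clear_errors();              // start from an empty error buffer
    $xml = simplexml_load_string((string) $body);
    if (false !== $xml) {
        return $xml;
    }
    $errors = libxml_get_errors();
    libxml_clear_errors();
    $detail = count($errors) > 0 ? trim($errors[0]->message) : 'no libxml error recorded';
    return sprintf('Error: Cannot create object (%d bytes) from %s: %s',
        strlen((string) $body), $url, $detail);
}

// A one-byte body cannot be XML, so this prints a diagnostic line.
$url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=30503783';
$result = parseOrDescribe($url, 'x');
if (is_string($result)) {
    echo $result, "\n";
}
```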

Thanks, Scott

iznaut commented 5 years ago

Still no dice:

Error: Cannot create object (1 bytes) from https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=30503783

I tested on PHP 7.1 and got the same error.

scottjpearson commented 5 years ago

I think I figured this out. The Entrez E-utilities enforce a rate limit of 3 queries per second. I slowed the queries down to 1 per second, and it seems to work. I have put the new code on GitHub and welcome anyone who is still interested to test it.
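The fix above amounts to spacing requests out in time. A minimal sketch of a throttle that keeps a fixed minimum interval between E-utilities calls; `RequestThrottle` and `fetchAllThrottled` are hypothetical names, and the fetch function is passed in as a callable so the sketch runs without network access. For context, NCBI's documented limit without an API key is three requests per second.

```php
<?php
// Hypothetical throttle (not the module's code): guarantees at least
// $minInterval seconds between successive calls to wait().
class RequestThrottle
{
    private $minInterval;
    private $lastCall = null;

    public function __construct($minInterval = 1.0)
    {
        $this->minInterval = (float) $minInterval;
    }

    public function wait()
    {
        if (null !== $this->lastCall) {
            $elapsed = microtime(true) - $this->lastCall;
            if ($elapsed < $this->minInterval) {
                // Sleep only for the remainder of the interval, in microseconds.
                usleep((int) ceil(($this->minInterval - $elapsed) * 1000000));
            }
        }
        $this->lastCall = microtime(true);
    }
}

// Calls $fetch (any callable that takes a URL) for each URL, waiting on the throttle first.
function fetchAllThrottled(array $urls, $fetch, RequestThrottle $throttle)
{
    $bodies = array();
    foreach ($urls as $url) {
        $throttle->wait();
        $bodies[$url] = call_user_func($fetch, $url);
    }
    return $bodies;
}
```

In a cron run, `$fetch` could wrap the cURL calls from test2.php, and a single `new RequestThrottle(1.0)` shared by every efetch request would give the one-per-second pacing described above.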

Sorry for the slowness.

Thanks, Scott