ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez
Other
195 stars 38 forks source link

fix bug where retmode is not passed down to make_entrez_query in entr… #121

Closed gmbecker closed 6 years ago

gmbecker commented 6 years ago

The retmode parameter is ignored in entrez_fetch, which causes xml to be returned when a genbank file was asked for. This breaks integration with genbankr. If you could push a narrow bugfix to cran for this soon I would appreciate it, as genbankr is going to continue to fail in the Bioc build system until either this is fixed or I disable the integration (which Id on't want to do, since I think it's a very powerful feature).

I went ahead and bumped the version in the DESCRIPTION file because I need to be able to put a versioned dependency in genbankr, but if you'd prefer those happen separately I can recreate the PR without that.

dwinter commented 6 years ago

Thanks for this @gmbecker, and apologies for the bug. Let me check a few things and I will try to get this up on CRAN today

dwinter commented 6 years ago

Hey @gmbecker , I will improve the way retmode and rettype are documented and used. But for now I think you will just need to specify both arguments (the NCBI is now returning xml for genbank by default). This works for me on the CRAN version

cat(entrez_fetch(db='nuccore', rettype='gb',retmode='text', id=104))
LOCUS       CAA29094                 357 aa            linear   MAM 14-JUL-2016
DEFINITION  beta-subunit, partial [Bos taurus].
ACCESSION   CAA29094
VERSION     CAA29094.1
DBSOURCE    embl accession X05605.1
KEYWORDS    .
SOURCE      Bos taurus (cattle)
...
gmbecker commented 6 years ago

@dwinter. Thanks for looking at this so quickly!

It still fails for me with rettype "gbwithparts" (which is what genbankr uses) because of the missing retmode. In the first try below I just inspect args and let it run, which fails, in the second, I manually insert retmode="text" into the args, which works:

debug(entrez_fetch)

cat(entrez_fetch(db='nuccore', rettype='gbwithparts',retmode='text', id=104))

debugging in: entrez_fetch(db = "nuccore", rettype = "gbwithparts", retmode = "text",

id = 104)

debug: {

identifiers <- id_or_webenv()

if (parsed) {

    if (!is_xml_record(rettype, retmode)) {

        msg <- paste("At present, entrez_fetch can only parse XML

records, got",

            rettype)

        stop(msg)

    }

}

args <- c(list("efetch", db = db, rettype = rettype, config = config,

    ...), identifiers)

records <- do.call(make_entrez_query, args)

if (parsed) {

    return(parse_response(records, rettype))

}

records

}

Browse[2]> n

debug: identifiers <- id_or_webenv()

Browse[2]>

debug: if (parsed) {

if (!is_xml_record(rettype, retmode)) {

    msg <- paste("At present, entrez_fetch can only parse XML records,

got",

        rettype)

    stop(msg)

}

}

Browse[2]>

debug: args <- c(list("efetch", db = db, rettype = rettype, config = config,

...), identifiers)

Browse[2]>

debug: records <- do.call(make_entrez_query, args)

Browse[2]> args

[[1]]

[1] "efetch"

$db

[1] "nuccore"

$rettype

[1] "gbwithparts"

$config

NULL

$id

[1] 104

Browse[2]> n

Error: HTTP failure: 400

<?xml version="1.0" encoding="UTF-8" ?>

<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">

debug(entrez_fetch)

cat(entrez_fetch(db='nuccore', rettype='gbwithparts',retmode='text', id=104))

debugging in: entrez_fetch(db = "nuccore", rettype = "gbwithparts", retmode = "text",

id = 104)

debug: {

identifiers <- id_or_webenv()

if (parsed) {

    if (!is_xml_record(rettype, retmode)) {

        msg <- paste("At present, entrez_fetch can only parse XML

records, got",

            rettype)

        stop(msg)

    }

}

args <- c(list("efetch", db = db, rettype = rettype, config = config,

    ...), identifiers)

records <- do.call(make_entrez_query, args)

if (parsed) {

    return(parse_response(records, rettype))

}

records

}

Browse[2]> n

debug: identifiers <- id_or_webenv()

Browse[2]> n

debug: if (parsed) {

if (!is_xml_record(rettype, retmode)) {

    msg <- paste("At present, entrez_fetch can only parse XML records,

got",

        rettype)

    stop(msg)

}

}

Browse[2]> n

debug: args <- c(list("efetch", db = db, rettype = rettype, config = config,

...), identifiers)

Browse[2]> n

debug: records <- do.call(make_entrez_query, args)

Browse[2]> args

[[1]]

[1] "efetch"

$db

[1] "nuccore"

$rettype

[1] "gbwithparts"

$config

NULL

$id

[1] 104

Browse[2]> args$retmode = "text"

Browse[2]> n

debug: if (parsed) {

return(parse_response(records, rettype))

}

Browse[2]> n

debug: records

Browse[2]> n

exiting from: entrez_fetch(db = "nuccore", rettype = "gbwithparts", retmode = "text",

id = 104)

LOCUS CAA29094 357 aa linear MAM 14-JUL-2016

DEFINITION beta-subunit, partial [Bos taurus].

ACCESSION CAA29094

VERSION CAA29094.1

DBSOURCE embl accession X05605.1

KEYWORDS .

SOURCE Bos taurus (cattle)

> sessionInfo()

~G

On Tue, Feb 20, 2018 at 3:11 PM, David Winter notifications@github.com wrote:

Hey @gmbecker https://github.com/gmbecker , I will improve the way retmode and rettype are documented and used. But for now I think you will just need to specify both arguments (the NCBI is now returning xml for genbank by default). This works for me on the CRAN version

cat(entrez_fetch(db='nuccore', rettype='gb',retmode='text', id=104))

LOCUS CAA29094 357 aa linear MAM 14-JUL-2016 DEFINITION beta-subunit, partial [Bos taurus]. ACCESSION CAA29094 VERSION CAA29094.1 DBSOURCE embl accession X05605.1 KEYWORDS . SOURCE Bos taurus (cattle) ...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/rentrez/pull/121#issuecomment-367153874, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3dscPXQR45GPCDIrB6KT6yUUNYevbPks5tW1EZgaJpZM4SMsAK .

-- Gabriel Becker, Ph.D Scientist Bioinformatics and Computational Biology Genentech Research

dwinter commented 6 years ago

Hi @gmbecker,

Seems like a few behaviours have changed recently at NCBI (so even with this fix some tests fail). Making a complete fix for this will need a few tweaks, and I would like to update the docs to make the retmode/rettypes a bit clearer. That probably means a CRAN release is a couple of days away.

In the meantime, I have merged the relevant bits of this PR into the develop branch. If you have users being struck by this you can get them to install the dev branch:

devtools::install_github("ropensci/rentrez", ref = "develop")

Thanks again for pointing this our and finding the solution. Will let you know when a release goes to CRAN.

dwinter commented 6 years ago

Hi @gmbecker , sorry for the delay in getting a fix to this on CRAN.

The develop branch now was a complete fix for this with updates to tests and documentation. Can you confirm the version on the development branch works with genbankr? If so I can get it on CRAN and you'll stop being bugged by the Bioconductor testing platform :)

gmbecker commented 6 years ago

@dwinter Looks looks good to go, vignette and tests now passing for release version of genbankr (they weren't before).

Can you confirm that 1.2.1 is the version number that will be used when submitting to CRAN? if so I'll push a versioned dep into the DESCRIPTION file for release and devel versions of genbankr and we should be good.

dwinter commented 6 years ago

Hi @gmbecker. Yup, 1.2.1 uploaded to CRAN now, will hopefully be available in the next little while.