momeara closed this issue 9 years ago.
Hi @momeara, thanks for the tip about this mode, which is new to me.
Looks like this should be doable in rentrez. I've made a start on the branch referenced in the commit above. As you can see, the XML returned is different when `by_id` is set:
```r
rec_old <- entrez_link(db="protein", dbfrom="gene", id=c("93100", "223646"))
rec_old$file
```

```
<eLinkResult>
  <LinkSet>
    <DbFrom>gene</DbFrom>
    <IdList>
      <Id>93100</Id>
      <Id>223646</Id>
    </IdList>
    <LinkSetDb>
      <DbTo>protein</DbTo>
      <LinkName>gene_protein</LinkName>
      <Link>
        <Id>768043930</Id>
      </Link>
      <Link>
        <Id>767953815</Id>
      </Link>
      ...
```
```r
rec_new <- entrez_link(db="protein", dbfrom="gene", id=c("93100", "223646"), by_id=TRUE)
rec_new$file
```

```
<eLinkResult>
  <LinkSet>
    <DbFrom>gene</DbFrom>
    <IdList>
      <Id>93100</Id>
    </IdList>
    <LinkSetDb>
      <DbTo>protein</DbTo>
      <LinkName>gene_protein</LinkName>
      <Link>
        <Id>768043930</Id>
      </Link>
      <Link>
        <Id>767953815</Id>
      </Link>
      ...
```
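As a rough sketch of what parsing the "by Id" form could look like, the snippet below runs on a trimmed-down copy of that XML with one `LinkSet` per input ID. It assumes the xml2 package is installed, and rentrez's own parser may well be written differently. The function `parse_by_id` is hypothetical. It reads the source ID from each `LinkSet`'s `IdList` and collects the linked IDs for a single `LinkName`, keyed by that source ID. The link IDs are taken from the outputs in this thread, with most of them omitted.

```r
library(xml2)  # assumption: xml2 is installed; not necessarily what rentrez uses

# Trimmed-down "by Id" eLinkResult: one LinkSet per input ID (most links omitted).
by_id_xml <- '<eLinkResult>
  <LinkSet>
    <DbFrom>gene</DbFrom>
    <IdList><Id>93100</Id></IdList>
    <LinkSetDb>
      <DbTo>protein</DbTo>
      <LinkName>gene_protein</LinkName>
      <Link><Id>768043930</Id></Link>
      <Link><Id>767953815</Id></Link>
    </LinkSetDb>
  </LinkSet>
  <LinkSet>
    <DbFrom>gene</DbFrom>
    <IdList><Id>223646</Id></IdList>
    <LinkSetDb>
      <DbTo>protein</DbTo>
      <LinkName>gene_protein</LinkName>
      <Link><Id>148697547</Id></Link>
      <Link><Id>148697546</Id></Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>'

# Hypothetical parser: returns a named list mapping each input ID to the
# IDs linked to it under one LinkName (e.g. "gene_protein").
parse_by_id <- function(xml_string, link_name) {
  doc <- read_xml(xml_string)
  linksets <- xml_find_all(doc, "//LinkSet")
  # In "by Id" mode each LinkSet's IdList holds exactly one source ID
  source_ids <- xml_text(xml_find_first(linksets, "./IdList/Id"))
  # Only keep links from the LinkSetDb with the requested LinkName
  xpath <- sprintf("./LinkSetDb[LinkName='%s']/Link/Id", link_name)
  links <- lapply(linksets, function(ls) xml_text(xml_find_all(ls, xpath)))
  setNames(links, source_ids)
}

parse_by_id(by_id_xml, "gene_protein")
```

Keying the result by the ID found inside each `LinkSet`, rather than by position, avoids assuming that NCBI returns the link sets in input order.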
It will take a little while to write a new parser for this form of the XML (or modify the old one). But this should be easy enough to include in the next release :smile:
Thanks for taking a look :+1:
Hi @momeara, I've just checked in support for this. Here's the example I'm using in the vignette:
```r
all_links_sep <- entrez_link(db="protein", dbfrom="gene", id=c("93100", "223646"), by_id=TRUE)
all_links_sep
```

```
List of 2 elink objects,each containing
$links: IDs for linked records from NCBI
```

```r
lapply(all_links_sep, function(x) x$links$gene_protein)
```

```
[[1]]
[1] "768043930" "767953815" "558472750" "194394158" "166221824" "154936864"
[7] "119602646" "119602645" "119602644" "119602643" "119602642" "37787309"
[13] "37787307" "37787305" "33991172" "21619615" "10834676"

[[2]]
[1] "148697547" "148697546" "81899807" "74215266" "74186774" "37787317"
[7] "37589273" "31982089" "26339824" "26329351"
```
So, basically as you suggested, the result is a list of elink objects, with a special print function so you don't get a screenful of them if you send a lot of IDs.
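To get back to the original question (which protein ID belongs to which gene ID), one option is to name that list by the input IDs and flatten it into a two-column table. The sketch below uses a hand-built stand-in for the `entrez_link(..., by_id=TRUE)` result, trimmed from the vignette output, so it runs without contacting NCBI. It assumes that the list elements come back in the same order as the input IDs. That is worth checking against the returned XML before relying on it.

```r
# Stand-in for entrez_link(..., by_id=TRUE): one element per input gene ID,
# each mimicking an elink object's $links$gene_protein slot (IDs trimmed).
all_links_sep <- list(
  list(links = list(gene_protein = c("768043930", "767953815", "558472750"))),
  list(links = list(gene_protein = c("148697547", "148697546", "81899807")))
)
gene_ids <- c("93100", "223646")

# Assumption: element i corresponds to gene_ids[i]
protein_by_gene <- setNames(
  lapply(all_links_sep, function(x) x$links$gene_protein),
  gene_ids
)

# Long format: one row per (gene_id, protein_id) pair
link_table <- data.frame(
  gene_id = rep(names(protein_by_gene), lengths(protein_by_gene)),
  protein_id = unlist(protein_by_gene, use.names = FALSE),
  stringsAsFactors = FALSE
)
link_table
```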
Thanks again for pointing out this behaviour to me, and I hope this helps.
This looks fantastic. Thanks for the rapid response!
On Sun, Jul 19, 2015 at 9:11 PM, David Winter notifications@github.com wrote:
— Reply to this email directly or view it on GitHub https://github.com/ropensci/rentrez/issues/51#issuecomment-122724089.
Thanks for the nice package!
I would like to look up all protein_ids for each gene_id in a set. However, when I pass several gene IDs to `entrez_link`, the protein IDs for all of them come back together, and I'm not sure which protein_id corresponds with which input gene_id.
Looking at the E-utils documentation, it appears that supplying multiple identifiers in a single `id` field in the URL groups all the returned links together into a single batch. To get separate links, in what they call "by Id" mode, separate `id` fields can be supplied in the URL (http://www.ncbi.nlm.nih.gov/books/NBK25500/#_chapter1_Finding_Related_Data_Through_En_). As far as I can tell, the WebDev interface has similar restrictions/capabilities.

Would it be possible to support this "by Id" mode in the `entrez_link` function? The interface could perhaps be an additional input argument, `by_id`, and the resulting output would be a list of `elink` lists, one for each input identifier.
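To make the difference between the two request forms concrete, here is a small base-R sketch that only builds the two ELink request URLs and sends nothing. The helper name `elink_url` is hypothetical. The base URL and the `dbfrom`/`db`/`id` parameters are the standard E-utilities ones. What matters is the `id` part. A single comma-separated `id` gives one combined LinkSet. Repeating `id` once per identifier requests "by Id" mode.

```r
# Hypothetical helper: build an ELink URL in batch or "by Id" mode.
# It only constructs the URL string; no request is made.
elink_url <- function(dbfrom, db, ids, by_id = FALSE) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
  id_part <- if (by_id) {
    # one id= parameter per identifier -> one LinkSet per input ID
    paste0("id=", ids, collapse = "&")
  } else {
    # a single comma-separated id= parameter -> one combined LinkSet
    paste0("id=", paste(ids, collapse = ","))
  }
  paste0(base, "?dbfrom=", dbfrom, "&db=", db, "&", id_part)
}

gene_ids <- c("93100", "223646")
elink_url("gene", "protein", gene_ids)
elink_url("gene", "protein", gene_ids, by_id = TRUE)
```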