Closed mbarkdull closed 3 years ago
Hi @mbarkdull ,
I'm not sure if there is an upper-limit to the number of sequences you can download in one "fetch" (the round number makes it seem like maybe so). But the best approach for very large requests like this is to do them in batches (also a good idea because very large downloads are often interrupted mid-stream).
In this case, I can get 50 sequences from beyond the first 10,000 using retstart and retmax:
past_10k <- entrez_fetch(db = "nuccore",
                         web_history = mphaIDLinks$web_histories$gene_nuccore,
                         rettype = "fasta", retstart = 1e4 + 1, retmax = 50)
tf <- tempfile()
cat(past_10k, file=tf)
ape::read.dna(tf, format="fasta")
50 DNA sequences in binary format stored in a list.
Mean sequence length: 5855.74
Shortest sequence: 380
Longest sequence: 25416
Labels:
XM_036286492.1 PREDICTED: Monomorium pharaonis eukaryotic tr...
XM_036286491.1 PREDICTED: Monomorium pharaonis eukaryotic tr...
XM_036286489.1 PREDICTED: Monomorium pharaonis eukaryotic tr...
XM_036286488.1 PREDICTED: Monomorium pharaonis eukaryotic tr...
XM_012683651.3 PREDICTED: Monomorium pharaonis astakine (LOC...
XM_028190656.2 PREDICTED: Monomorium pharaonis astakine (LOC...
...
Base composition:
a c g t
0.303 0.194 0.212 0.291
(Total: 292.79 kb)
The vignette has an example of using these in a for loop, in its web history section.
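A minimal sketch of that kind of batched loop, for reference. The helper names batch_offsets and fetch_in_batches, the batch size, the pause length and the output file name are all assumptions made up for illustration; none of them are part of rentrez. It also assumes retstart is a zero-based offset, as in the NCBI E-utilities documentation, so the first batch starts at 0.

```r
# Hypothetical helpers for illustration only; not part of rentrez.

# Zero-based retstart offsets needed to cover `total` records.
batch_offsets <- function(total, batch_size) {
  stopifnot(total >= 0, batch_size > 0)
  if (total == 0) return(numeric(0))
  seq(0, total - 1, by = batch_size)
}

# Fetch `total` records in chunks of `batch_size`, appending each chunk to
# `outfile`. `fetch` defaults to rentrez::entrez_fetch and is only looked up
# when the function is called; `pause` (seconds) is an assumed courtesy delay.
fetch_in_batches <- function(web_history, total, batch_size = 500,
                             outfile = "sequences.fasta",
                             fetch = rentrez::entrez_fetch,
                             pause = 0.5) {
  file.create(outfile)  # start from an empty file
  for (start in batch_offsets(total, batch_size)) {
    chunk <- fetch(db = "nuccore", web_history = web_history,
                   rettype = "fasta", retstart = start, retmax = batch_size)
    cat(chunk, file = outfile, append = TRUE)
    Sys.sleep(pause)
  }
  invisible(outfile)
}
```

With the objects from this thread, a call might look like fetch_in_batches(mphaIDLinks$web_histories$gene_nuccore, total = 17121). Writing each batch to disk as it arrives means an interrupted download only loses the batch in progress.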
Looks like there is just a limit on the number of records you can get in one call to fetch. Closing the issue now, @mbarkdull, let me know if you have any difficulties getting the sequences you are after.
Hi folks,
I'm having an issue with using rentrez to download gene sequences from NCBI. I know that my species has 17,121 sequences of interest, and indeed when I use entrez_search, it returns 17,121 sequence IDs. However, when I proceed through the workflow to download the sequences, I wind up with only 10,000 sequences in the output of entrez_fetch. My code is as follows:
Any help would be hugely appreciated! -Megan