Closed bhavaygg closed 3 years ago
Hi @Chokerino thanks for the heads up, I will take a look at this today.
Hi, can you please provide an example? When I run certain types of queries, I get more than 100 results:
found_pdbs = Query('6239', 'TreeEntityQuery').search() #TaxID for C elegans
print(len(found_pdbs)) # Returns 447
Then it probably depends on the type of query
found_pdbs = Query('Protease bound with agonist').search() #length is 100 where online search returns about 160k results
Thank you! It looks like the issue is the structure of the API; I tried out different parameter combinations here, and it looks like the issue is that "Protease bound with agonist" is being incorrectly treated as a sequence of keywords, rather than a general search query.
This is probably fixable, but it requires determining if RCSB has exposed a method for doing a standard text search, like the default on their website. I'll see what I can find.
So this is still open, huh?
When I just search for the word "kinase" on https://www.rcsb.org/ I get 30991 hits.
But when I do
import pypdb

found_pdbs = pypdb.Query('kinase').search()
print(len(found_pdbs))
I get 11622.
Here's my workaround with their newer API. The Legacy API you're using is not maintained and is going to be taken down in December.
import json
import urllib.parse
from urllib.request import urlopen

url = 'https://search.rcsb.org/rcsbsearch/v1/query'
json_query_string = '''
{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "value": "kinase"
    }
  },
  "return_type": "entry"
}
'''

def basic_search(req_url, json_str, print_query=True, read_and_load=True):
    # URL-encode the JSON query and send it as the `json` GET parameter
    query = urllib.parse.quote(json_str)
    url_query = req_url + '?json=' + query
    if print_query:
        print(url_query)
    response = urlopen(url_query)
    if read_and_load:
        # Parse the JSON body into a dict
        return json.loads(response.read())
    else:
        # Return the raw HTTP response object
        return response

basic_search_results = basic_search(url, json_query_string)
print(basic_search_results['total_count'])
https://search.rcsb.org/rcsbsearch/v1/query?json=%0A%7B%0A%20%20%22query%22%3A%20%7B%0A%20%20%20%20%22type%22%3A%20%22terminal%22%2C%0A%20%20%20%20%22service%22%3A%20%22text%22%2C%0A%20%20%20%20%22parameters%22%3A%20%7B%0A%20%20%20%20%20%20%22value%22%3A%20%22kinase%22%0A%20%20%20%20%7D%0A%20%20%7D%2C%0A%20%20%22return_type%22%3A%20%22entry%22%0A%7D%0A
30991
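As a variation on the workaround above, the query can be built from a Python dict and serialized with json.dumps instead of being written out as a string by hand. This is only a sketch: the helper name build_text_query_url is made up for illustration, and the optional "request_options" / "pager" block (with "start" and "rows") is my assumption about how the v1 search API pages through results, so check it against the RCSB search API documentation before relying on it. The function only builds the URL and does not contact the server.

```python
import json
from urllib.parse import quote

SEARCH_URL = 'https://search.rcsb.org/rcsbsearch/v1/query'

def build_text_query_url(term, start=None, rows=None):
    """Return a GET URL for a full-text search on `term` (hypothetical helper)."""
    query = {
        "query": {
            "type": "terminal",
            "service": "text",
            "parameters": {"value": term},
        },
        "return_type": "entry",
    }
    # ASSUMPTION: the v1 API accepts a "pager" block for paging;
    # these field names are not taken from this thread.
    if start is not None and rows is not None:
        query["request_options"] = {"pager": {"start": start, "rows": rows}}
    # URL-encode the serialized JSON and pass it as the `json` parameter
    return SEARCH_URL + '?json=' + quote(json.dumps(query))

print(build_text_query_url('kinase'))
```

The resulting URL can be opened with urlopen exactly as in basic_search above. Building the dict first makes it easier to change the search term or add options without editing a JSON string.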
Thanks very much @bonetwo2 for the comment and code. I was unable to find a fix with the old API, but I agree that I will need to do a major refactor soon anyway.
@Chokerino the latest GitHub version should resolve this problem: the results should exactly match those returned by the online interface. It may be some time before we update the PyPI and conda versions to match the development version.
Please feel free to re-open this issue if you run into problems. Thank you.
Hello,
Searching only returns 100 results. Is it possible to get all the results and download them?