Closed lyttonhao closed 9 years ago
hi @lyttonhao I'll take a look later today...
Thanks. @sckott
@lyttonhao I used the fix from your fork for parsing more than one result, and fixed it so that it works with > 1 name passed in.
Can you share the example that was failing for you?
Okay. I will test the new code soon. Thanks, @sckott.
Hi @sckott, I think there is still the problem I ran into before. When I test 300 names it works well, but it fails when querying 500 or more names. It seems that the parameters can't be too long.
See the documentation for the API http://resolver.globalnames.org/api, they allow both GET and POST requests. I don't think they say so in those docs, but I found (https://github.com/ropensci/taxize/blob/master/R/gnr_resolve.R#L23-L25) that 300 names works okay with GET, but beyond that POST is better. Should be a simple thing to add POST, if you are interested.
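A minimal sketch of the dispatch rule described above: GET is known to work for up to ~300 names, and POST is better beyond that. The function name and the exact threshold here are assumptions for illustration, not part of the GNR API itself.

```python
def choose_http_method(names, get_limit=300):
    """Return 'get' for small queries, 'post' for larger ones.

    `get_limit` is an assumed cutoff based on the taxize source
    linked above; it is not documented by the GNR API.
    """
    return "get" if len(names) <= get_limit else "post"
```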
Hi @sckott, I've added some code to work with POST, following https://github.com/ropensci/taxize/blob/master/R/gnr_resolve.R#L86-L97. Below is my corresponding code:

```python
elif http == 'post':
    with open('__gnr_names.txt', 'wb') as f:
        for name in names:
            f.write("1|%s\n" % name)
    payload = {'data_source_ids': source, 'format': format,
               'resolve_once': resolve_once, 'with_context': with_context,
               'best_match_only': best_match_only, 'header_only': header_only,
               'preferred_data_sources': preferred_data_sources}
    out = requests.post(url, params=payload,
                        files={'file': open('__gnr_names.txt', 'rb')})
    out.raise_for_status()
    result_json = out.json()
    newurl = result_json['url']
    while result_json['status'] == 'working':
        # print result_json['message']
        out = requests.get(url=newurl)
        result_json = out.json()
```
However, it seems that `while result_json['status'] == 'working':` turns into an infinite loop. Can you give some advice? Thank you very much.
@lyttonhao I'll have a look soon. I'm trying to get testing and CI set up first, so we can have checks on all changes/PRs, etc.
@lyttonhao That `while` loop is used because when you use a POST request you get back a URL for a job that is processing, and you need to send a new GET request to that URL to retrieve the data. So the `while` loop keeps pinging the server until it retrieves the data itself, not just a message saying that it is still working. Does that make sense? Send a PR when you think you've got it solved, or even if you don't, and I can take a look and see if I can help.
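Hedged sketch of the POST-then-poll pattern described above. The `status` field and its `'working'` value come from this thread; the `fetch` parameter stands in for a real HTTP GET (e.g. `lambda u: requests.get(u).json()`), and the interval and retry cap are assumptions, not part of the API:

```python
import time

def poll_job(job_url, fetch, interval=10, max_tries=30):
    """Ping `job_url` until the job leaves the 'working' state.

    `fetch` takes a URL and returns the parsed JSON response.
    Unlike a bare `while`, this sleeps between requests and gives
    up after `max_tries` attempts instead of looping forever.
    """
    for _ in range(max_tries):
        result = fetch(job_url)           # GET the job URL, parse JSON
        if result.get("status") != "working":
            return result                 # finished (success or failure)
        time.sleep(interval)              # wait before pinging again
    raise RuntimeError("job still 'working' after %d tries" % max_tries)
```

The two differences from the loop quoted earlier are the pause between requests (so the server isn't hammered) and the bounded number of attempts (so a stuck job raises instead of spinning forever).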
Hi @sckott, I'm very sorry that I missed your message these past few days. Do you mean changing the `while` condition? I've changed it to `while not 'data' in result_json:`, but it seems that it still doesn't work.
Hi @panks, I tried adding `time.sleep(10)` as in your code. I'm afraid it's still an infinite loop on my machine. Does it work well on your machine when the number of queried names is larger than 1000?
@lyttonhao I'm not sure. I think when the GNR API starts operating in a queue it's not working as it's supposed to. Here is the URL response from a job I submitted more than 6 hours ago, for query size = 1,010: http://resolver.globalnames.org/name_resolvers/5jyg8wkhbvoa.json
It still shows status as 'working'. Maybe they need to fix things on their end. But at least we got it working for query sizes > 300 but < 1000 by adding POST.
@sckott Any ideas?
@panks I also suspect that the back-end code may have some bugs. Since I haven't used R before, I haven't tested taxize. @sckott Does it work well in taxize?
@lyttonhao @panks I'll take a look at this
If there isn't any hope of it working, then one thing we can do is split lists of size > 1000 into smaller chunks and concatenate their results.
@panks @lyttonhao I just played with this in R, and it seems that when the number of names is > 1000 the job never finishes. I am asking about this now. We should probably not pass more than 1000 names, so just break them up into chunks of < 1000 and pass those.
see GlobalNamesArchitecture/gni#37
Yeah I guess splitting the list is the best way to go as of now. I will do that and send a PR. Thanks!
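The splitting approach agreed on above can be sketched as follows. `resolve_chunk` stands in for whatever function actually sends one query to GNR, and the chunk size of 900 is an assumption chosen to stay under the ~1000-name limit observed in this thread:

```python
def resolve_in_chunks(names, resolve_chunk, chunk_size=900):
    """Resolve `names` in pieces of at most `chunk_size` and
    concatenate the per-chunk results into one flat list."""
    results = []
    for i in range(0, len(names), chunk_size):
        results.extend(resolve_chunk(names[i:i + chunk_size]))
    return results
```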
Since the return line in gnr.py only returns the first result, the current gnr_resolve doesn't support returning results for multiple names. I changed this line to return all results. It works well when the query contains about 100 names, but it gets a "No JSON object could be decoded" error when the number is larger. I haven't fixed it yet. Can anyone help?