pyinat / pyinaturalist

Python client for iNaturalist
https://pyinaturalist.readthedocs.io
MIT License

possible issue with some endpoints #534

Closed: svshepherd closed this issue 9 months ago

svshepherd commented 9 months ago

The problem

I'm running into trouble doing simple tasks with the get_identifications endpoint. At first the query just returned no response, but for most of today

local_ids = inat.v1.identifications.get_identifications(page=0)
local_ids

results in:

MaxRetryError: HTTPSConnectionPool(host='api.inaturalist.org', port=443): Max retries exceeded with url: /v1/identifications?page=0 (Caused by ResponseError('too many 500 error responses'))

Curiously, other contexts ( inat.get_observations(user_id=[uname], d1=start_date, page='all') ) seem to still work.

Environment

willkuhn commented 9 months ago

I noticed the web-based platform was acting wonky yesterday, specifically not always returning lists of observations when there were too many filter parameters. That makes me think the issue you experienced is more of an iNat server-side issue than anything to do with pyinaturalist.

Hope this helps!

svshepherd commented 9 months ago

That's very helpful. I started to file a bug report on the main site, but realized I don't know the best practice for converting pyinaturalist requests into standard API requests. Is there a straightforward command for troubleshooting? (I realize I can dig a bit and probably figure it out, but if there's an existing tool... well, I'm lazy.) Thanks, everybody, for your work on this and your help.

JWCook commented 9 months ago

I think @willkuhn is right. Your traceback shows a 500 error, which is on the iNat server's side, and probably a temporary problem. Are you still seeing these errors?

There are instructions for enabling logging here, and that will show the complete API request details.
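For reference, a minimal way to enable it looks like this (the logger name 'pyinaturalist' matches the one shown in the traceback later in this thread):

```python
import logging

# Enable INFO-level logging; pyinaturalist will then log the full
# request method, URL, and headers for each API call
logging.basicConfig(level=logging.INFO)
logging.getLogger('pyinaturalist').setLevel(logging.INFO)
```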

svshepherd commented 9 months ago

Thank you! Shall I report it in the iNat forum? Here's the traceback with logging:

INFO:pyinaturalist:Request: GET https://api.inaturalist.org/v1/identifications?page=0
User-Agent: python-requests/2.31.0 pyinaturalist/0.19.0
Accept-Encoding: gzip, deflate, br
Accept: application/json
Connection: keep-alive


MaxRetryError                             Traceback (most recent call last)
File ~\anaconda3\envs\base2023geonat\Lib\site-packages\requests\adapters.py:486, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    485 try:
--> 486     resp = conn.urlopen(
    487         method=request.method,
    488         url=url,
    489         body=request.body,
    490         headers=request.headers,
    491         redirect=False,
    492         assert_same_host=False,
    493         preload_content=False,
    494         decode_content=False,
    495         retries=self.max_retries,
    496         timeout=timeout,
    497         chunked=chunked,
    498     )
    500 except (ProtocolError, OSError) as err:

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\urllib3\connectionpool.py:894, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    893 log.debug("Retry: %s", url)
--> 894 return self.urlopen(
    895     method,
    896     url,
    897     body,
    898     headers,
    899     retries=retries,
    900     redirect=redirect,
    901     assert_same_host=assert_same_host,
    902     timeout=timeout,
    903     pool_timeout=pool_timeout,
    904     release_conn=release_conn,
    905     chunked=chunked,
    906     body_pos=body_pos,
    907     **response_kw
    908 )
    910 return response

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\urllib3\connectionpool.py:894, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    893 log.debug("Retry: %s", url)
--> 894 return self.urlopen(
    895     method,
    896     url,
    897     body,
    898     headers,
    899     retries=retries,
    900     redirect=redirect,
    901     assert_same_host=assert_same_host,
    902     timeout=timeout,
    903     pool_timeout=pool_timeout,
    904     release_conn=release_conn,
    905     chunked=chunked,
    906     body_pos=body_pos,
    907     **response_kw
    908 )
    910 return response

    [... skipping similar frames: HTTPConnectionPool.urlopen at line 894 (2 times)]

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\urllib3\connectionpool.py:894, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    893 log.debug("Retry: %s", url)
--> 894 return self.urlopen(
    895     method,
    896     url,
    897     body,
    898     headers,
    899     retries=retries,
    900     redirect=redirect,
    901     assert_same_host=assert_same_host,
    902     timeout=timeout,
    903     pool_timeout=pool_timeout,
    904     release_conn=release_conn,
    905     chunked=chunked,
    906     body_pos=body_pos,
    907     **response_kw
    908 )
    910 return response

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\urllib3\connectionpool.py:884, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    883 try:
--> 884     retries = retries.increment(method, url, response=response, _pool=self)
    885 except MaxRetryError:

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\urllib3\util\retry.py:592, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
    591 if new_retry.is_exhausted():
--> 592     raise MaxRetryError(_pool, url, error or ResponseError(cause))
    594 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPSConnectionPool(host='api.inaturalist.org', port=443): Max retries exceeded with url: /v1/identifications?page=0 (Caused by ResponseError('too many 500 error responses'))

During handling of the above exception, another exception occurred:

RetryError                                Traceback (most recent call last)
Cell In[4], line 6
      3 logging.getLogger('pyinaturalist').setLevel('INFO')
      5 # local_ids = inat.v1.identifications.get_identifications(page=0)
----> 6 local_ids = inat.get_identifications(page=0)
      7 local_ids

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\forge\_revision.py:328, in Revision.__call__.<locals>.inner(*args, **kwargs)
    324 @functools.wraps(callable)  # type: ignore
    325 def inner(*args, **kwargs):
    326     # pylint: disable=E1102, not-callable
    327     mapped = inner.__mapper__(*args, **kwargs)
--> 328     return callable(*mapped.args, **mapped.kwargs)

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\pyinaturalist\v1\identifications.py:69, in get_identifications(**params)
    67     identifications = paginate_all(get, f'{API_V1}/identifications', **params)
    68 else:
--> 69     identifications = get(f'{API_V1}/identifications', **params).json()
    71 identifications['results'] = convert_all_timestamps(identifications['results'])
    72 return identifications

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\pyinaturalist\session.py:370, in get(url, session, **kwargs)
    368 """Wrapper around :py:func:`requests.get` with additional options specific to iNat API requests"""
    369 session = session or get_local_session()
--> 370 return session.request('GET', url, **kwargs)

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\pyinaturalist\session.py:256, in ClientSession.request(self, method, url, headers, json, access_token, allow_redirects, allow_str_ids, dry_run, expire_after, files, ids, only_if_cached, raise_for_status, refresh, stream, timeout, verify, **params)
    220 """Wrapper around :py:func:`requests.request` with additional options specific to iNat API requests
    221
    222 Args:
   (...)
    242     API response
    243 """
    244 request = self.prepare_inat_request(
    245     method=method,
    246     url=url,
   (...)
    253     params=params,
    254 )
--> 256 response = self.send(
    257     request,
    258     dry_run=dry_run,
    259     expire_after=expire_after,
    260     only_if_cached=only_if_cached,
    261     refresh=refresh,
    262     timeout=timeout,
    263     allow_redirects=allow_redirects,
    264     stream=stream,
    265     verify=verify,
    266 )
    268 # Raise an exception if the request failed (after retries are exceeded)
    269 if raise_for_status:

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\pyinaturalist\session.py:305, in ClientSession.send(self, request, dry_run, expire_after, refresh, retries, timeout, **kwargs)
    303 # Otherwise, send the request
    304 read_timeout = timeout or self.timeout
--> 305 response = super().send(
    306     request,
    307     expire_after=expire_after,
    308     refresh=refresh,
    309     timeout=(CONNECT_TIMEOUT, read_timeout),
    310     **kwargs,
    311 )
    312 response = self._validate_json(
    313     request,
    314     response,
   (...)
    318     **kwargs,
    319 )
    321 logger.debug(format_response(response))

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\requests_cache\session.py:205, in CacheMixin.send(self, request, expire_after, only_if_cached, refresh, force_refresh, **kwargs)
    203     response = self._resend(request, actions, cached_response, **kwargs)  # type: ignore
    204 elif actions.send_request:
--> 205     response = self._send_and_cache(request, actions, cached_response, **kwargs)
    206 else:
    207     response = cached_response  # type: ignore  # Guaranteed to be non-None by this point

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\requests_cache\session.py:229, in CacheMixin._send_and_cache(self, request, actions, cached_response, **kwargs)
    225 """Send a request and cache the response, unless disabled by settings or headers.
    226 If applicable, also handle conditional requests.
    227 """
    228 request = actions.update_request(request)
--> 229 response = super().send(request, **kwargs)
    230 actions.update_from_response(response)
    232 if not actions.skip_write:

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\requests_ratelimiter\requests_ratelimiter.py:87, in LimiterMixin.send(self, request, **kwargs)
    77 """Send a request with rate-limiting.
    78
    79 Raises:
    80     :py:exc:`.BucketFullException` if this request would result in a delay longer than max_delay
    81 """
    82 with self.limiter.ratelimit(
    83     self._bucket_name(request),
    84     delay=True,
    85     max_delay=self.max_delay,
    86 ):
--> 87     response = super().send(request, **kwargs)
    88 if response.status_code in self.limit_statuses:
    89     self._fill_bucket(request)

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\requests\sessions.py:703, in Session.send(self, request, **kwargs)
    700 start = preferred_clock()
    702 # Send the request
--> 703 r = adapter.send(request, **kwargs)
    705 # Total elapsed time of the request (approximately)
    706 elapsed = preferred_clock() - start

File ~\anaconda3\envs\base2023geonat\Lib\site-packages\requests\adapters.py:510, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    507     raise ConnectTimeout(e, request=request)
    509 if isinstance(e.reason, ResponseError):
--> 510     raise RetryError(e, request=request)
    512 if isinstance(e.reason, _ProxyError):
    513     raise ProxyError(e, request=request)

RetryError: HTTPSConnectionPool(host='api.inaturalist.org', port=443): Max retries exceeded with url: /v1/identifications?page=0 (Caused by ResponseError('too many 500 error responses'))

JWCook commented 9 months ago

Oh, I just noticed that you're passing page=0. Page indexes start at 1, so if you remove the page parameter or use page=1, it will work as expected.

Most endpoints will handle page=0 and return the first page, but this one appears to pass its request parameters to a different storage backend (for full text search) that doesn't handle this case:

curl 'https://api.inaturalist.org/v1/identifications?page=0'
{"error":"Elasticsearch error, if this persists please contact the iNaturalist development team.","status":500}

This likely won't be a high priority for them to fix, but it might be worth making a bug report for reference.

On this end, I can add a check to catch page=0 and return a friendlier error message. I also noticed that error retries are making that traceback a lot more verbose than necessary, so I'll try to clean that up a bit.
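As a rough sketch, that check could look something like the following (validate_pagination is a hypothetical name, not an existing pyinaturalist function):

```python
def validate_pagination(params):
    """Raise a friendlier error for page values the iNat API rejects.

    Hypothetical helper: iNat page indexes start at 1, and pyinaturalist
    also accepts the special value 'all' for automatic pagination.
    """
    page = params.get('page')
    if page is not None and page != 'all' and int(page) < 1:
        raise ValueError(
            f"page={page} is invalid: page indexes start at 1. "
            "Omit 'page' or pass page=1 for the first page."
        )
    return params
```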

svshepherd commented 9 months ago

Thanks! It turned out I had two issues: in my original code, I was also passing places as dictionary keys, with the place names as the values. That worked when I first wrote my notebook, but the keys now need to be explicitly converted to a list. Appreciate the help!

JWCook commented 9 months ago

Do you know which version of Python your notebook was using? The behavior of dict.keys() and other view types may differ slightly between Python versions. Lists or tuples are the best options for params that take multiple values.
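To illustrate the conversion (the place IDs and names here are made up for the example):

```python
# Hypothetical mapping of place IDs to place names
places = {9853: 'Maryland', 10: 'Oregon'}

# dict.keys() returns a view object, not a list; convert it explicitly
# before passing it as a multi-value parameter such as place_id
place_ids = list(places.keys())
print(place_ids)  # [9853, 10]
```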