treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0
1.3k stars 221 forks

Error with http call #1805

Closed makoslokos closed 9 months ago

makoslokos commented 1 year ago

Hi,

When I make a call to an external API, the retry loses the bearer token and authorization is no longer possible.

+get_token:
   http>: ${some_url}/rest/default/V1/integration/admin/token
   method: POST
   content:
     username: ${username}
     password: ${password}
   content_format: json
   headers:
     - Content-Type: "application/json"
     - Cookie: "PHPSESSID=32bfe8c8e79a7d445abaa9f04e32dd44"
   store_content: true

and the error:

2023-04-14 13:16:11.972 +0000 [INFO] (0210@[53:pr_33578612:230461823]+wf_src_l0_+wf_^sub+load_conf_table^sub+td-for-each-1+if^sub+loop^sub+loop-0+magento_import+magento_import_2+api_call) io.digdag.core.agent.OperatorManager: http>: ${url}/rest/all/V1/orders?searchCriteria[filter_groups][0][filters][0][field]=updated_at&searchCriteria[filter_groups][0][filters][0][value]=2023-04-13%2000:30:00&searchCriteria[filter_groups][0][filters][0][condition_type]=gteq&searchCriteria[filter_groups][1][filters][0][field]=updated_at&searchCriteria[filter_groups][1][filters][0][value]=2023-04-13%2001:00:00&searchCriteria[filter_groups][1][filters][0][condition_type]=lt
2023-04-14 13:16:11.995 +0000 [INFO] (0210@[53:pr_src_l0_it_magento_ploom:33578612:230461823]+wf_src_l0_it+wf_src_l0^sub+load_conf_table^sub+td-for-each-1+if^sub+loop^sub+loop-0+mport+mport_2+api_call) io.digdag.standards.operator.HttpOperatorFactory$HttpOperator: Sending HTTP request: GET https://***
2023-04-14 13:16:12.272 +0000 [ERROR] (0210@[53:pr_src_l0:33578612:230461823]+wf_src_l0+wf_src_l0_^sub+load_conf_table^sub+td-for-each-1+if^sub+loop^sub+loop-0+import+import_2+api_call) io.digdag.core.agent.OperatorManager: Task failed, retrying
io.digdag.spi.TaskExecutionException: io.digdag.spi.TaskExecutionException: HTTP 4XX Client Error: GET https://*** - 401 Unauthorized: {"message":"The consumer isn't authorized to access %resources.","parameters":{"resources":"XYZ::actions_view"}}
    at io.digdag.spi.TaskExecutionException.ofNextPollingWithCause(TaskExecutionException.java:85)

Why is the call losing authorization? The token is still valid afterwards, but somehow it is not used anymore.

hiroyuki-sato commented 1 year ago

Hello, @makoslokos

Have you tried an HTTP tool such as curl? You can also debug with the -l debug option, for example: digdag run -l debug config.yml.

It shows the HTTP request, like this:

+disp_current_date:
  http>: http://localhost:8080/
  headers:
    - hoge: hogehoge

2023-04-16 00:16:05 +0900 [DEBUG] (HttpClient@548755293-19) org.eclipse.jetty.client.HttpSender: Request headers HttpRequest[GET / HTTP/1.1]@41ef043f
Accept-Encoding: gzip
User-Agent: Digdag/0.10.4 Jetty/9.3.z-SNAPSHOT
hoge: hogehoge
Host: localhost:8080

makoslokos commented 1 year ago

Yes, I have tried. In general the URL is accessible, so that is not the problem. I suspect that if some warning appears during workflow execution (e.g. too many subtasks), the http> operator somehow loses the credentials for calling the URL, and the 4xx error appears.

hiroyuki-sato commented 1 year ago

If Digdag sends the same request in both the normal and the error case (you can check with -l debug), could this be a server-side issue? (e.g. the session being closed or expired due to too many requests or retries.)
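One way to make that comparison concrete is to copy the request header lines from the -l debug output of a successful attempt and of a failed retry, then diff them. The sketch below is only an illustration: the helper names parse_header_lines and diff_headers are hypothetical, and the Authorization value is a made-up placeholder, not taken from the logs in this issue.

```python
def parse_header_lines(lines):
    """Turn 'Name: value' lines (as printed in the debug log) into a dict."""
    headers = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if sep:  # skip lines that are not 'Name: value'
            headers[name.strip()] = value.strip()
    return headers


def diff_headers(ok_headers, failed_headers):
    """Compare two header dicts, ignoring the case of header names."""
    ok = {k.lower(): v for k, v in ok_headers.items()}
    failed = {k.lower(): v for k, v in failed_headers.items()}
    return {
        "missing": sorted(set(ok) - set(failed)),  # sent before, absent on retry
        "added": sorted(set(failed) - set(ok)),    # only present on retry
        "changed": sorted(k for k in set(ok) & set(failed) if ok[k] != failed[k]),
    }


# Hypothetical header lines copied from two debug runs.
first_attempt = parse_header_lines([
    "Accept-Encoding: gzip",
    "Authorization: Bearer PLACEHOLDER_TOKEN",
    "Host: localhost:8080",
])
retry_attempt = parse_header_lines([
    "Accept-Encoding: gzip",
    "Host: localhost:8080",
])
print(diff_headers(first_attempt, retry_attempt))
```

If "missing" lists authorization, the client dropped the header on the retry; if the diff is empty, the server side is the more likely suspect.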

sakama commented 1 year ago

Hi,

I noticed you configured method: POST in the YAML file, but the HTTP error occurs with the GET method.

Sending HTTP request: GET https://***

I'm not sure whether the URL is the same as the one you configured in the YAML file, but I guess your request is being redirected to another URL. If your application needs complex access control, it might be better to use the py> or rb> operators instead of the http> operator. https://docs.digdag.io/operators/scripting.html

This is an example from my workflow.

import backoff
import requests

MAX_RETRIES = 10
RETRY_INTERVAL_FACTOR = 15  # seconds
REQUEST_TIMEOUT = 30  # seconds


# Retry with exponential backoff whenever the function raises an exception.
@backoff.on_exception(backoff.expo, Exception, max_tries=MAX_RETRIES, factor=RETRY_INTERVAL_FACTOR)
def send_http_request(api_token):
    url = 'https://example.com/path/to/somewhere'
    query = 'SELECT * FROM example_table'
    headers = {
        'Authorization': f'Bearer {api_token}'
    }
    try:
        r = requests.get(url, timeout=REQUEST_TIMEOUT, headers=headers, params={'query': query})
        r.raise_for_status()
        return r.json()
    except requests.exceptions.HTTPError as e:
        print('HTTP Error: ', e)
        raise  # re-raise so that backoff can retry
    except requests.exceptions.RequestException as e:
        print('HTTP connection failure: ', e)
        raise
    except Exception as e:
        print('Unknown error', e)
        raise
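To check the redirect guess, it can help to look at the status code and Location header of the first response before following it. The sketch below uses only the Python standard library; the function names are hypothetical. It assumes the common client behavior in which a 301/302 response to a POST, or a 303 response to anything other than GET/HEAD, is followed with GET. Whether Digdag's http> operator behaves this way, or drops credentials when the redirect changes host, is not confirmed in this thread.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = (301, 302, 303, 307, 308)
DEFAULT_PORTS = {"http": 80, "https": 443}


def redirect_target(request_url, status, location):
    """Return the absolute redirect URL, or None if this is not a redirect."""
    if status not in REDIRECT_STATUSES or not location:
        return None
    return urljoin(request_url, location)  # Location may be relative


def crosses_origin(request_url, target_url):
    """True if scheme, host, or port differ between the two URLs."""
    def origin(url):
        parts = urlsplit(url)
        return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))
    return origin(request_url) != origin(target_url)


def method_after_redirect(method, status):
    """Method that many clients use for the follow-up request (assumption)."""
    if status in (301, 302) and method == "POST":
        return "GET"
    if status == 303 and method not in ("GET", "HEAD"):
        return "GET"
    return method  # 307 and 308 keep the original method


# Hypothetical first response: 302 with a relative Location header.
target = redirect_target("https://example.com/rest/V1/orders", 302, "/login")
print(target, crosses_origin("https://example.com/rest/V1/orders", target),
      method_after_redirect("POST", 302))
```

If the target is on a different origin, a client may not resend the Authorization header there (the requests library, for example, drops it when the redirect leads to a different host), which would produce a 401 like the one reported.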

makoslokos commented 1 year ago

Ok, will try. Thanks