turbot / steampipe-mod-oci-insights

View dashboards and reports across all of your Oracle Cloud Infrastructure accounts using Powerpipe and Steampipe.
https://hub.powerpipe.io/mods/turbot/oci_insights
Apache License 2.0

Dashboard fails with error "ERROR: rpc error: code = Unknown desc = Get <URL> net/http: TLS handshake timeout (SQLSTATE HV000)" #82

Closed hslange closed 8 months ago

hslange commented 11 months ago

Describe the bug When running a dashboard, e.g. [OCI Compute Instance Dashboard], it fails with this error:

ERROR: rpc error: code = Unknown desc = Get "https://iaas..oraclecloud.com/20160918/instances?compartmentId=&limit=1000": net/http: TLS handshake timeout (SQLSTATE HV000)

This happens with several of the dashboards. My OCI tenant has a large number of instances (100+), file systems, etc.

Steampipe version (steampipe -v) v0.21.1

Plugin version (steampipe plugin list) 0.32.1

To reproduce I used a simple query from the examples, and it returns the same error:

select shape, count(shape) as count from oci_core_instance group by shape;

Error: Get "https://iaas.eu-amsterdam-1.oraclecloud.com/20160918/instances?compartmentId=ocid1.compartment.oc1..aaaaaaaaintvnm2fhegql73n44ecxayb5iw23lm2vxndwfpylgnoln2sgysa&limit=1000": net/http: TLS handshake timeout (SQLSTATE HV000)

Expected behavior Proper results for the query are expected, with dashboards showing results rather than error messages.

Additional context See the screenshot from one of the failing dashboards: [image]

rajlearner17 commented 10 months ago

@hslange Thanks for trying out Steampipe!

Sorry to see you are facing this issue. Can you please confirm whether the TLS handshake issue has persisted until now? We have not been able to reproduce it, though we have only a very small number of instances in our dev platform.

Are you seeing this for all the dashboards? Or just for the Compute instance dashboard?

This issue is reported here, and it may be related to a proxy or a proxy configuration issue in the environment. Could you please recheck and confirm?

hslange commented 10 months ago

@rajlearner17, thank you for your reply. Unfortunately, I have the same issue on various dashboards (but not all). And not just in the dashboards: even in query mode I get the same errors when I query some large tables. It also appears on both a Windows and a Mac client. I have access to some other OCI instances which are smaller. In that case, more dashboards show successfully, but not all. I have enabled logging in TRACE mode, but this does not show much more information. If needed, I can share the log file. I hope this helps to figure out the issue, because I really like the approach and the dashboards.

rajlearner17 commented 10 months ago

@hslange Completely understand your concerns! I am sorry to see you facing this issue.

We are trying to reproduce the issue but have not been able to. Could you send a log file, either in our Slack channel (#steampipe) or here, after removing any sensitive info from it?

A few clarifications from your ^ response

Unfortunately, I have the same issue on various dashboard (but not all). >> Can you tell us which ones have NO issues and render as expected, and which ones are breaking like the compute one?

It also appears both on a Windows as well as a on a Mac client. >> Can you please elaborate? By Mac client, do you mean macOS? Please provide the configuration details for Windows and Mac to be more specific, e.g. you can send the uname -a output for your Mac.

I have access to some other OCI instances which are smaller. In that case, more dashboard show successfully but not all. >>> This confused me a bit. Could you tell me if you are running the dashboard from an OCI instance or your local laptop? I don't know if the size of the instance would matter. Looking forward to more info.

I have enabled logging in TRACE mode, but this does not show many more information. If needed I can share the log file. >> This will be helpful as mentioned ^

Thank you

hslange commented 10 months ago

Hi @rajlearner17, A few clarifications from your ^ response

Unfortunately, I have the same issue on various dashboard (but not all). >> Can you provide which ones have NO issues and render as expected? and which ones are breaking like compute one.

These dashboards work fine:

- OCI Compartment Report
- OCI Tenancy Report
- OCI Identity Group Dashboard, for 90% (the rest fails with "Too Many Requests", but not the handshake error)

It also appears both on a Windows as well as a on a Mac client. >> Can you please elaborate more? When you mean Mac client, it means Mac OS. Please provide the configuration details for Windows and Mac to be more specific. E.g. you can send uname -a output for your Mac

Windows: Windows 10
MacOS : MacOS Catalina, version 10.15.7. Darwin Kernel Version 19.6.0

I have access to some other OCI instances which are smaller. In that case, more dashboard show successfully but not all. >>> This confused me a bit. Could you tell me if you are running the dashboard from an OCI instance or your local laptop? I don't know if the size of the instance would matter. Looking forward to more info.

From my laptop/desktop I have access to multiple different OCI tenants.
By changing the Steampipe config file (oci.spc), I can toggle between these tenants.
I run the Steampipe dashboards all from the same laptop/desktop.
The difference is that the smaller OCI tenant works fine, with no TLS handshake timeouts, whereas the large environment shows this problem.

I have enabled logging in TRACE mode, but this does not show many more information. If needed I can share the log file. >> This will be helpful as mentioned ^

It will take some time to clean out any sensitive data, but I will follow the instructions and post it to the Slack channel.

Hope this helps to fix and improve. Herman

ParthaI commented 10 months ago

Hello @hslange, I appreciate your continued engagement, and I'm sorry to hear that you're still encountering issues. During my testing of OCI compliance/insights, I encountered the TooManyRequests error. Your cooperation and feedback are valuable as we work to address this issue.

Error returned by Identity Service. Http Status Code: 429. Error Code: TooManyRequests. Opc request id: d7ee90c120b0f2f633f8eb2a765fc7e4/B3C0CD212D5324E7A5A44A8B6F81F18B/90CD9ED94834E5E7BAABFEFA7DC6FE6D. Message: GET request failed{"schemas":["urn:ietf:params:scim:api:messages:2.0:Error","urn:ietf:params:scim:api:oracle:idcs:extension:messages:Error"],"detail":"Too many SEARCH requests received from Stripe idcs-9dfd8138447d4ee581b5b8308e2aeb17 initiated by client IP '140.204.50.26' on endpoint 'admin/v1/ApiKeys'","status":"429","urn:ietf:params:scim:api:oracle:idcs:extension:messages:Error":{"messageId":"error.common.ratelimiting.stripe.toomanyrequests"}} Operation Name: ListApiKeys Timestamp: 2023-12-18 11:04:56 +0000 GMT Client Version: Oracle-GoSDK/65.28.0 Request Endpoint: GET https://identity.ap-hyderabad-1.oci.oraclecloud.com/20160918/users/ocid1.user.oc1..aaaaaaaa536mivetd3pnhdn24axarkutf7i75pky4f5yir2a762rttowno6q/apiKeys Troubleshooting Tips: See https://docs.oracle.com/iaas/Content/API/References/apierrors.htm#apierrors_429__429_toomanyrequests for more information about resolving this error. Also see https://docs.oracle.com/iaas/api/#/en/identity/20160918/ApiKey/ListApiKeys for details on this operation's requirements. To get more info on the failing request, you can set OCI_GO_SDK_DEBUG env var to info or higher level to log the request/response details. If you are unable to resolve this Identity issue, please contact Oracle support and provide them this full error message. (SQLSTATE HV000)

The error indicates that the APIs used in querying the tables are reaching the rate limit. To address this, I recommend using our custom retry configuration block (https://github.com/turbot/steampipe-plugin-oci/blob/main/docs/index.md#configuration) in the ~/.steampipe/config/oci.spc file for connections.

In my connection configuration file (~/.steampipe/config/oci.spc), I set max_error_retry_attempts and min_error_retry_delay to higher values for the connections. With these, the dashboards have been running smoothly without encountering any errors.

Here is a snippet of my connection configuration file (~/.steampipe/config/oci.spc):

connection "oci" {
  plugin                   = "oci"
  regions                  = ["ap-hyderabad-1", "us-ashburn-1", "ap-mumbai-1", "sa-vinhedo-1"]
  max_error_retry_attempts = 12
  min_error_retry_delay    = 35
}

connection "oci_aaa" {
  plugin                   = "oci"
  config_file_profile      = "OCIAAA"
  config_path              = "~/.oci/config"
  regions                  = ["ap-hyderabad-1"]
  max_error_retry_attempts = 12
  min_error_retry_delay    = 35
}

connection "oci_aggregator" {
  plugin      = "oci"
  type        = "aggregator"
  connections = ["oci", "oci_aaa"]
}

Note: By using the custom retry configuration, the errors with codes 429, 500 or 503 will be retried.
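As a rough illustration of what the two settings control, the sketch below retries a request while it returns 429, 500 or 503, waiting longer between attempts. It is not the plugin's implementation: `retry_call`, `RETRYABLE_STATUS` and the doubling of the delay are assumptions made for the example, and the delay is in seconds here, whereas the thread does not state the unit of min_error_retry_delay.

```python
import time
from typing import Callable, Tuple

# Status codes the note above says are retried.
RETRYABLE_STATUS = {429, 500, 503}


def retry_call(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int,
    min_delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call request() until it returns a non-retryable status
    or max_attempts calls have been made (hypothetical helper)."""
    delay = min_delay_s
    for attempt in range(1, max_attempts + 1):
        status, body = request()
        # Stop on success, on a non-retryable error, or on the last attempt.
        if status not in RETRYABLE_STATUS or attempt == max_attempts:
            return status, body
        sleep(delay)
        delay *= 2  # assumption: exponential backoff from the minimum delay
    raise ValueError("max_attempts must be at least 1")
```

A TLS handshake timeout happens before any HTTP status is returned, so a status-based retry like this one would never see it; the exception from `request()` would simply propagate. Whether the plugin retries such transport errors is not stated in the thread.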

Could you please try it by providing a higher value for max_error_retry_attempts and min_error_retry_delay for your connections?

Feel free to customize these settings based on your requirements. Let us know if you have any questions or need further assistance.

Thank You!

hslange commented 10 months ago

Thank you @rajlearner17. I have played with these settings. They help a little, but in some dashboards the errors remain. I have uploaded the logs to the Slack channel; I hope they give some insight into the "TLS handshake timeout" issues in this thread. If I run into some other issue, I will open it as a separate issue.

hslange commented 9 months ago

Hi ParthaI, I've played around with many values for the options in the config file (max_error_retry_attempts and min_error_retry_delay), but I still get the "net/http: TLS handshake timeout (SQLSTATE HV000)" error. If I add a "limit" to a SQL statement, it works better (no error, but also only a subset of the result), until I have a group by or order by (or join) in the SQL (understandable, because this first needs to read the entire table). Are there any additional options I can try (or debug/trace options)? Thanks in advance. Herman
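The observation about limit versus group by can be pictured with a small simulation. This is a sketch under assumptions, not how Steampipe or the OCI plugin is implemented: `list_pages`, `scan` and `count_by_shape` are hypothetical helpers, the shape names are sample data, and the page size of 1000 only mirrors the limit=1000 parameter in the failing request URL. A scan with a limit can stop after the first page, while an aggregate such as a count per shape has to fetch every page, so it makes more requests and has more chances to hit a timeout.

```python
from collections import Counter
from typing import Iterator, List, Optional, Tuple

PAGE_SIZE = 1000  # mirrors limit=1000 in the failing request URL


def list_pages(rows: List[str], page_size: int = PAGE_SIZE) -> Iterator[List[str]]:
    """Yield the rows one page at a time, like a paginated list API."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


def scan(rows: List[str], limit: Optional[int] = None) -> Tuple[List[str], int]:
    """Return (rows fetched, number of page requests made)."""
    fetched: List[str] = []
    requests = 0
    for page in list_pages(rows):
        requests += 1
        fetched.extend(page)
        # With a limit, stop as soon as enough rows have arrived.
        if limit is not None and len(fetched) >= limit:
            return fetched[:limit], requests
    return fetched, requests


def count_by_shape(rows: List[str]) -> Tuple[Counter, int]:
    """A group-by style aggregate: every page must be read first."""
    all_rows, requests = scan(rows)
    return Counter(all_rows), requests
```

With 3000 sample rows, `scan(rows, limit=10)` makes one request, while `count_by_shape(rows)` makes three. How the plugin actually fans requests out across compartments and regions is not shown in the thread.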


bigdatasourav commented 8 months ago

Turbot Pipes offers a "datatank" feature that allows for background querying of data, enabling instant access upon demand. This might be worth exploring. More information is available at: https://turbot.com/pipes/blog/2023/10/datatank-launch.

ParthaI commented 8 months ago

We've submitted a support request in the OCI SDK repository to address this issue. The progress of the issue can be tracked here: https://github.com/oracle/oci-go-sdk/issues/476.

bigdatasourav commented 8 months ago

Hey @hslange, closing the issue for now; we will revisit it once we get an update from the above SDK issue.