provenant-dev / origin-community

Creative Commons Attribution 4.0 International

Intermittent error in the org-data reset script #8

Closed · nkongsuwan closed this 3 days ago

nkongsuwan commented 3 weeks ago

Describe the bug
Sometimes the reset script in the Developer Settings, which resets the org data associated with a particular domain name, fails with the error message "Error in reset script: undefined".

To Reproduce
I don't know how to reproduce this error. It seems to occur with organizations that have gone through the vLEI issuance workflow.

Expected behavior
The reset script should work reliably.

Screenshots

Screenshot 2567-09-15 at 14 43 46 Screenshot 2567-09-15 at 14 53 43 Screenshot 2567-09-15 at 14 54 46
dhh1128 commented 2 weeks ago

@nkongsuwan: I believe this error is caused by the script running longer than the load balancer at the edge of the k8s cluster allows. In other words, a timeout is occurring. Sometimes the script finishes quickly enough that no timeout is triggered, which would explain why the error is inconsistent.

If this is the case, then the error is spurious: a timeout really occurred, but it did not prevent the script from succeeding. You can check whether the script succeeded by going to the Identifiers tab and looking for identifiers whose aliases have the correct properties.

I'm tagging @Arsh-Sandhu, who knows more.

nkongsuwan commented 2 weeks ago

@dhh1128 I don't think it is due to a timeout. "Intermittent" was probably a poor word choice. The reset error occurs consistently for some org accounts, and clicking the reset button repeatedly does not resolve it. My guess is that these org accounts meet some edge condition that causes the error.

dhh1128 commented 2 weeks ago

[from twin in jira] This issue is being tracked in jira at https://eipi.atlassian.net/browse/DF-2416. To send new comments there, start them with 'Tell jira:'. The issue is assigned in jira to Cal Warshaw. As of 2024-09-16T08:23Z, the status of the issue in jira is 'to do'.

AVKurbatov commented 2 weeks ago

@nkongsuwan, I'd say there could be two reasons for this issue. It is either a timeout, or there is some data left in the database that prevents the remaining data from being deleted. If it's just a timeout, the data will still be deleted, and you can confirm this in the database after you get an error. In any case, can you provide the dev-tools service logs?

nkongsuwan commented 2 weeks ago

@AVKurbatov I don’t think it is the timeout issue, but I could be wrong. Unfortunately, I am traveling and won’t have a laptop with me until next week.

This is one of the accounts that has the reset issue in https://dev.asia.origincloud.net/:

lar1@enauthn.id 8Mvd5LBOYprGsp9cExpDX

dhh1128 commented 1 week ago

[from twin in jira] As of 2024-09-23T06:20Z, the status of the issue in jira is 'doing'.

dhh1128 commented 1 week ago

[from twin in jira] As of 2024-09-24T07:09Z, the status of the issue in jira is 'in dev'. At 2024-09-24T06:39Z, Daniel Hardman said: we have reproduced the error, diagnosed it, and fixed it in the developer's environment. It is now waiting to be merged to the dev environment. At 2024-09-24T07:09Z, Aleksandr Kurbatov said: we have deployed the fix to the asia environment. I tested it with a new user and did not remove the lar1@enauthn.id user.

dhh1128 commented 1 week ago

[from twin in jira] The issue is assigned in jira to Aleksandr Kurbatov.

dhh1128 commented 3 days ago

[from twin in jira] As of 2024-10-03T05:05Z, the status of the issue in jira is 'done'. Code with the fix has been deployed in production.