openwallet-foundation / acapy

ACA-Py is a foundation for building decentralized identity applications and services running in non-mobile environments.
https://aca-py.org
Apache License 2.0
419 stars 512 forks source link

:sparkles: Add TRACE log level and improve logging in core components #3332

Open ff137 opened 2 weeks ago

ff137 commented 2 weeks ago

:construction: WIP

List of changes:

Comparison of start up logs before and after to be included.

Partially resolves #3202

ff137 commented 2 weeks ago

Not sure why the following Scenario Integration Tests are failing:

FAILED examples/anoncreds_issuance_and_revocation::anoncreds_issuance_and_revocation
FAILED examples/presenting_revoked_credential::presenting_revoked_credential
FAILED examples/self_attested::self_attested
ff137 commented 2 weeks ago

Looks like BDD Interop tests also silently fail: ("silently" because the Github action shows a green tick)

Failing scenarios:
  features/0036-issue-credential.feature:9  Issue a credential with the Holder beginning with a proposal
  features/0036-issue-credential.feature:23  Issue a credential with the Holder beginning with a proposal with DID Exchange Connection
  features/0036-issue-credential.feature:53  Issue a credential with the Issuer beginning with an offer
  features/0037-present-proof.feature:19  Present Proof where the prover does not propose a presentation of the proof and is acknowledged -- @1.1 
  features/0037-present-proof.feature:20  Present Proof where the prover does not propose a presentation of the proof and is acknowledged -- @1.2 
  features/0037-present-proof.feature:37  Present Proof where the prover does not propose a presentation of the proof and is acknowledged with a DID Exchange Connection -- @1.1 
  features/0037-present-proof.feature:38  Present Proof where the prover does not propose a presentation of the proof and is acknowledged with a DID Exchange Connection -- @1.2 
  features/0037-present-proof.feature:55  Present Proof of specific types and proof is acknowledged with a Drivers License credential type -- @1.1 
  features/0037-present-proof.feature:56  Present Proof of specific types and proof is acknowledged with a Drivers License credential type -- @1.2 
  features/0037-present-proof.feature:74  Present Proof of specific types and proof is acknowledged with a Biological Indicators credential type -- @1.1 
  features/0037-present-proof.feature:91  Present Proof of specific types and proof is acknowledged with multiple credential types -- @1.1 
  features/0037-present-proof.feature:109  Present Proof where the prover does not propose a presentation of the proof and is acknowledged -- @1.1 
  features/0037-present-proof.feature:110  Present Proof where the prover does not propose a presentation of the proof and is acknowledged -- @1.2 
  features/0037-present-proof.feature:129  Present Proof where the prover has proposed the presentation of proof in response to a presentation request and is acknowledged -- @1.1 
  features/0037-present-proof.feature:130  Present Proof where the prover has proposed the presentation of proof in response to a presentation request and is acknowledged -- @1.2 
  features/0037-present-proof.feature:150  Present Proof where the prover has proposed the presentation of proof from a different credential in response to a presentation request and is acknowledged -- @1.1 
  features/0037-present-proof.feature:151  Present Proof where the prover has proposed the presentation of proof from a different credential in response to a presentation request and is acknowledged -- @1.2 
  features/0183-revocation.feature:79  Credential revoked and replaced with a new updated credential, holder proves claims with the updated credential with timesstamp -- @1.1 
  features/0183-revocation.feature:380  Issuer revokes a credential and sends a v1 revocation notification -- @1.1 
  features/0434-out-of-band.feature:9  Issue a v1 indy credential using connectionless out of band invitation
  features/0434-out-of-band.feature:50  Present a v1 indy proof using connectionless out of band invitation

7 features passed, 4 failed, 4 skipped
36 scenarios passed, 21 failed, 103 skipped
348 steps passed, 21 failed, 1014 skipped, 0 undefined
Took 13m2.976s
dbluhm commented 1 week ago

@jamshale heads up on the test failures

Self-attested

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/example.py", line 180, in <module>
    asyncio.run(main())
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/src/app/example.py", line 95, in main
    alice_pres_ex = await alice.post(
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 475, in post
    return await self.request(
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 338, in request
    body = await self._handle_response(resp, data=data, json=json_)
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 303, in _handle_response
    raise ControllerError(f"Request failed: {resp.url} {body}")
acapy_controller.controller.ControllerError: Request failed: http://alice:3001/present-proof/send-request 404: Not Found

Presenting revoked cred

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/example.py", line 198, in <module>
    asyncio.run(main())
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/src/app/example.py", line 113, in main
    await bob.record(topic="revocation-notification")
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 577, in record
    return await self.event(topic, select=select, event_type=record_type)
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 652, in event
    raise ControllerError(
acapy_controller.controller.ControllerError: Event from Bob with topic revocation-notification not received before timeout

AnonCreds issuance and revocation

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/example.py", line 187, in <module>
    asyncio.run(main())
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/src/app/example.py", line 152, in main
    await bob.record(topic="revocation-notification")
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 577, in record
    return await self.event(topic, select=select, event_type=record_type)
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 652, in event
    raise ControllerError(
acapy_controller.controller.ControllerError: Event from Bob with topic revocation-notification not received before timeout
ff137 commented 1 week ago

Yeah, those test failures are a real head-scratcher for me... Doesn't seem to occur in other PRs. And I can't see what in this contribution will cause the problem.

Maybe there's a way to print the ACA-Py agent logs as part of those Test workflows, see if that reveals anything.

Only solution I can think of is to rebase and drop / re-add commits until I can identify what causes the problem 😅

dbluhm commented 1 week ago

Just remembered Jamie is out until Wednesday (I was hoping to just throw this his way lol); I'll take a deeper look at this. It should be possible to get some more logs on failures in the scenario tests, I think.

jamshale commented 1 week ago

I'm not back until Friday but quickly looked at the logs. 2 of them are failing when trying to receive the revocation webhook notification. Perhaps the log level affected the library checking for this event.

One of the issues with using an external library for this is that we may need to get the library changed for changes like this.

For the self attest test, I can't really think of a reason off the top of my head. Running an individual test with higher logging could help find the issue.

I'd say because these tests are passing with main, this must either be an issue with the code or the acapy-controller library.

dbluhm commented 1 week ago

An unfortunate side effect of this change is that pyright/pylance does not detect logger.trace as a valid method. This results in a pretty significant amount of noise in editor, especially in modules with particularly heavy use of trace logs.

dbluhm commented 1 week ago

While testing, I couldn't get the trace logs to print from the plugin registry. I tried setting the log level to "trace" (in the docker-compose for the anoncreds scenario test) and only saw debug and higher. Did I miss something?

ff137 commented 1 week ago

An unfortunate side effect of this change is that pyright/pylance does not detect logger.trace as a valid method. This results in a pretty significant amount of noise in editor, especially in modules with particularly heavy use of trace logs.

Hmm, that sucks...

So, my initial idea was to try swap out the default python logger for loguru, which is much better and offers plenty of enhancements over the python logger. Loguru includes trace levels, and built in solutions for serialised logging and timed-file-logging config, etc, which was manually implemented here.

What that would look like is defining a centralised loguru "get_logger" method somewhere, with all the acapy-config, and importing that, instead of using logging.getLogger.

Some other nice features are colorized logging, and being able to bind context with logs. (e.g. bound_logger = logger.bind(thread_id=thread_id) and then you can implicitly include metadata in all logs when using bound_logger instead.)

I think that's still on the cards, because it really is more elegant to work with. But it does represent a big change to a core component (logging). And it'll take some effort to ensure everything remains identical. The potential bulk refactoring can cause some headaches for other maintainers with merge conflicts. And then there's effort in getting everyone on board with the change.

So, at first I wanted to scrap the loguru idea, and just go for this trace addition... Buut, what do you think? Any potential appetite for switching to loguru? It'll take some more time, but I can take a swing at it.

ff137 commented 1 week ago

While testing, I couldn't get the trace logs to print from the plugin registry. I tried setting the log level to "trace" (in the docker-compose for the anoncreds scenario test) and only saw debug and higher. Did I miss something?

Hmm, it worked on my local with ACAPY_LOG_LEVEL: trace...

Example for loading protocols.endorse_transaction

2024-11-07 19:55:39,251 acapy_agent.core.plugin_registry TRACE Processing events for plugin: acapy_agent.protocols.endorse_transaction
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Attempting to load module: definition with package: acapy_agent.protocols.endorse_transaction
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Preloading parent package: acapy_agent.protocols.endorse_transaction
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Attempting to load module: acapy_agent.protocols.endorse_transaction with package: None
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Resolved full module path: acapy_agent.protocols.endorse_transaction
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Module acapy_agent.protocols.endorse_transaction is already loaded
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Adjusted mod_path for relative import: .definition
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Resolved full module path: acapy_agent.protocols.endorse_transaction.definition
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Module acapy_agent.protocols.endorse_transaction.definition is already loaded
2024-11-07 19:55:39,251 acapy_agent.core.plugin_registry TRACE Loading versioned events for: acapy_agent.protocols.endorse_transaction
2024-11-07 19:55:39,251 acapy_agent.core.plugin_registry TRACE Loading events from: acapy_agent.protocols.endorse_transaction.v1_0.routes
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Attempting to load module: acapy_agent.protocols.endorse_transaction.v1_0.routes with package: None
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Resolved full module path: acapy_agent.protocols.endorse_transaction.v1_0.routes
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Parent module path: acapy_agent.protocols.endorse_transaction.v1_0, Module name: routes
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Attempting to load module: acapy_agent.protocols.endorse_transaction.v1_0 with package: None
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Resolved full module path: acapy_agent.protocols.endorse_transaction.v1_0
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Module acapy_agent.protocols.endorse_transaction.v1_0 is already loaded
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Adjusted mod_path after loading parent: .routes
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Finding spec for module: .routes with package: acapy_agent.protocols.endorse_transaction.v1_0
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Importing module: .routes with package: acapy_agent.protocols.endorse_transaction.v1_0
2024-11-07 19:55:39,255 acapy_agent.core.plugin_registry TRACE Registering events from: acapy_agent.protocols.endorse_transaction.v1_0.routes

You can see that's why I didn't want it at debug level, because it'll clutter the existing, expected logs for debug.

dbluhm commented 1 week ago

So, at first I wanted to scrap the loguru idea, and just go for this trace addition... Buut, what do you think? Any potential appetite for switching to loguru? It'll take some more time, but I can take a swing at it.

I'm not opposed to the idea in principle but definitely agree that it could be a bit of a messy change to integrate with ongoing work streams.

How would this affect plugins? Would plugins also have to use loguru or is there a way to get loguru to integrate with Python's builtin logging?

Are there any performance implications to loguru over builtin logging module?

I think I'm also still a bit on the fence about the idea of a trace log level to begin with. Obviously it's not unheard of to have a level below debug if other loggers like loguru support it. Several of our rust based dependencies like Askar and Indy-VDR also support a trace log level. In the case of those dependencies, though, the inspect-ability of those dependencies is really low because they're compiled libraries. Python, on the other hand, being interpreted, it's pretty trivial to step into a debugger and see exactly what is going on when necessary. I'm a bit skeptical of the value of having the intense kind of detail being emitted to the trace level in these changes.

I think we can definitely up our logging game in ACA-Py but the kind of change I think I'm more interested in is probably more effective/judicious use of the INFO level. 90% of the time when I'm scanning through logs, I'm looking for what messages were sent and received or what Admin API requests came in and their responses. In other words, I usually want to filter through less information but I have to use debug level right now or I don't get what I'm after.

ff137 commented 1 week ago

@dbluhm That makes sense. I think the trace level can be omitted... and instead, just add some sensible debug logs, and start the initiative to move more of the useful logs to info level.

The primary motivator here was to debug an issue with an agent not starting up at all. It was from testing out a fix for the RuntimeWarning from the IndyVdrLedgerPool (#3328), and there were no debug logs or errors to indicate what was causing the agent to hang.

Then I realised there is actually quite a lot of steps in the startup logic with no logging whatsoever. So, I just went at it, adding some sensible logging steps throughout. But, that made debug logging too verbose ... and figured, that is what trace is for.

Fortunately, I've already achieved the goal of isolating what was causing the issue. So, I'd say this particular way of implementing trace logging is optional. And, Loguru should be the preferred way to implement trace functionality.

I do think Loguru is worth it. The performance improvement is probably negligible (until they implement it in C), and for plugins it shouldn't be any issue.

I'm imagining we can define a central, configured logging instance, similar to how we've done here - and then all cases of

import logging
LOGGER = logging.getLogger(__name__)

can just be replace-all'd with

from acapy_agent.config.logging import get_logger
LOGGER = get_logger(__name__)

Then everything else remains the same.

Using standard python logging would still be possible - it just won't benefit from the loguru features. So it would just require letting all the devs know, hey, use get_logger instead.

I think it's one of those "nice-to-haves" that once it's in place, it's a godsend, and makes developing so much easier -- lazy logging, context binding, colorized logging, and much easier configuration. There's just the effort in proving that it's a seamless shift, and that all existing logging configuration can be ported to the new way of doing things. Difficult to forecast how much effort that'll be. But I can try chip away at it, sometime over the next several weeks.

So, over the next while I'll try clean up this PR to just contribute some reasonable addition to the startup logs, and if it proves seamless enough, loguru can come later :-)

sonarcloud[bot] commented 17 hours ago

Quality Gate Passed Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
95.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud