openwallet-foundation / acapy

Hyperledger Aries Cloud Agent Python (ACA-Py) is a foundation for building decentralized identity applications and services running in non-mobile environments.
https://wiki.hyperledger.org/display/aries
Apache License 2.0
412 stars 512 forks source link

Regression: plugin defined protocols not working in 0.8.1 #2224

Closed dbluhm closed 1 year ago

dbluhm commented 1 year ago

ca8308fde4 broke the ability to load a protocol in a plugin. Below is the resulting error:

agent_1  | 2023-05-04 20:57:40,368 aries_cloudagent.transport.pack_format DEBUG Expanded message: {'@type': 'https://didcomm.org/data-transfer/0.1/provide-data', '@id': 'a4ce6262-233d-43bb-9c60-ae6b98986218', 'goal_code': 'test_goal', 'data~attach': [{'description': 'test data attachment', 'data': {'json': {'test': 'data'}}}]}
agent_1  | 2023-05-04 20:57:40,368 aiohttp.access INFO 10.89.0.5 [04/May/2023:20:57:40 +0000] "POST / HTTP/1.1" 200 149 "-" "Python/3.7 aiohttp/3.8.1"
agent_1  | 2023-05-04 20:57:40,372 aries_cloudagent.core.conductor ERROR Exception in message handler:
agent_1  | Traceback (most recent call last):
agent_1  |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
agent_1  |     result = coro.send(None)
agent_1  |   File "/usr/src/app/.venv/lib/python3.9/site-packages/aries_cloudagent/core/dispatcher.py", line 155, in handle_message
agent_1  |     (message, warning) = await self.make_message(
agent_1  |   File "/usr/src/app/.venv/lib/python3.9/site-packages/aries_cloudagent/core/dispatcher.py", line 318, in make_message
agent_1  |     _, warning = await validate_get_response_version(
agent_1  |   File "/usr/src/app/.venv/lib/python3.9/site-packages/aries_cloudagent/core/util.py", line 49, in validate_get_response_version
agent_1  |     version_definition = await get_version_def_from_msg_class(
agent_1  |   File "/usr/src/app/.venv/lib/python3.9/site-packages/aries_cloudagent/core/util.py", line 151, in get_version_def_from_msg_class
agent_1  |     definition_path = _get_path_from_msg_class(msg_class)
agent_1  |   File "/usr/src/app/.venv/lib/python3.9/site-packages/aries_cloudagent/core/util.py", line 117, in _get_path_from_msg_class
agent_1  |     path = split_str + path.rsplit(split_str, 1)[1]
agent_1  | IndexError: list index out of range

The lines added in aries_cloudagent.core.util make it impossible to load a protocol because the path to the protocol is not guaranteed to include aries_cloudagent as part of it, resulting in the IndexError seen above.

This is quite unfortunate and effectively renders several plugins in the wild incompatible with 0.8.1.

A somewhat minimal reproducible example can be found here: https://github.com/Indicio-tech/aries-acapy-plugin-data-transfer/pull/5 Failing integration test here: https://github.com/Indicio-tech/aries-acapy-plugin-data-transfer/actions/runs/4887340386/jobs/8723798002?pr=5 (see "Print logs on failure" step in job to see logs pasted above)

cc @shaangill025 @swcurran

dbluhm commented 1 year ago

Heads up, @anwalker293 @mepeltier @reflectivedevelopment , impacts several of our plugins.

swcurran commented 1 year ago

Arrggh….

What is the right answer here?

Do we need an 0.8.2 that addresses this issue to continue to work with existing plugins, or is it appropriate/better than we have this as a breaking change that needs to be addressed?

Sorry this was not caught as a breaking change.

dbluhm commented 1 year ago

Every release, I think to myself that we should really have an automated system for running our plugins against the release candidates but just haven't gotten around to it yet :sweat_smile: We did do some testing during the rc phase but I made an absent-minded mistake and tests that I thought were running against 0.8.1-rcX actually were still running against 0.7.X.

If we didn't do a patch release, what would publishing the change look like? A 0.8.2 release is what my mind naturally goes to but I'm not sure what the alternatives would look like

swcurran commented 1 year ago

Happy to do a 0.8.2. We just need to know what the change needs to be. What is the right thing to do for ACA-Py and ACA-Py users?

swcurran commented 1 year ago

FYI — we’re looking at adding redis into AATH, which might help us with some ongoing plugin testing...

usingtechnology commented 1 year ago

Added a comment and "fix" to the example PR that worked for me (running the example docker compose locally).

Bigger issue, and been mentioned previously is how to load plugins. That will definitely affect the search path that ACA-Py is using to locate the plugin code.

In the provided example (and thanks for having something easy for me to replicate), ACA-Py is loaded as a Python library, and the plugin code is the "root", so "acapy" doesn't know where the plugin code is... in this case we set ACAPY_HOME environment variable.

Other plugins will build their images differently, and how/where their plugin code relates to where the ACA-Py code is... the recent mediator service + redis plugin is done a different way than this, and differently than traction...

Do we want to close this (if the example PR builds and tests and merges), or leave this open and investigate some more? I know that Traction has not bumped up to 0.8.1 and they could have issues.

dbluhm commented 1 year ago

Heads up @frostyfrog

dbluhm commented 1 year ago

@usingtechnology re: the fix by adding ACAPY_HOME env var: wouldn't this cause builtin protocols to then fail in a similar way to how plugged in protocols were failing? The plugin package root like acapy_plugin_data_transfer wouldn't be present in the path of builtin protocols the same way aries_cloudagent wasn't present in the path of plugged in protocols.

usingtechnology commented 1 year ago

I honestly do not know. One problem is that there is no "standard" on bundling plugins with ACA-Py; I can point at 3 different ways it is happening. In this case, the plugin is the "parent" project, ACA-Py is one of its requirements. Is that the same for all plugin developers and consumers? For another project, where redis queue plugin is used with the mediator, the plugin is installed to the aries-cloudagent-python:py3.9-0.8.1 and that works without setting ACAPY_HOME (although that also uses a weird path in the --plugin. So many ways to skin a cat. This "fix" only works for this and similarly built images.

This discussion and your issue would leave me to believe that we need to set a standard ASAP - or at least known methods that will work and are supported. And how to publish/bundle plugins for use by 3rd parties

WadeBarnes commented 1 year ago

How does the work here, https://github.com/hyperledger/aries-cloudagent-python/pull/2138, play into this?

usingtechnology commented 1 year ago

I think that’s what we need to carry on with, take @ianco work and start hardening it and putting it into practice. @dbluhm how are you putting images together for PROD? Are you installing them on top of ACA-Py, so they are found in the package path? Or like you are doing here? Maybe for allowing devs to test their plugin, we should have an explicit search path like EXT_PLUGIN_HOME. But production images plugins are libs installed on approved images?

TheTechmage commented 1 year ago

Apologies for taking until today to get this committed and pushed up. I was working on a potential solution yesterday (which can be seen in the commit that got added to the thread) and as far as I can tell, falling back to the module name when the file path doesn't work seems to resolve the issue that @dbluhm was describing. However, I'm not entirely convinced if it's the right fix or just a workaround. At this point in time, I have only tested it with the mentioned data-transfer plugin and the integration tests are working now.

swcurran commented 1 year ago

This seems to be an urgent issue to resolve — correct? @usingtechnology — this might adjust your priorities (although it fits with the “dev experience and docs” item, I guess). @dbluhm — I think Jason (@usingtechnology ) and the Indicio team have the most experience with plugins, have read @ianco’s work on this, and should drive the direction.

Perhaps a meeting early next week to (try to) make decisions? And of course, we’d love @ianco to weigh in, if he is interested. He’s already provided a lot of ideas on this.

usingtechnology commented 1 year ago

Yes, I don’t think we can put this off. I agree that we should meet early next week. Let’s do something more robust and less hacky. Definitely a few different angles to consider, but I think we can come up with something. At quick thought is: we build out plug-in config to specify the path to the plug-in. Nice to have simple defaults but obviously cannot cover all cases. So it’s either get more restrictive on a standard or allow independence.

dbluhm commented 1 year ago

To load a plugin, we provide the python module path to the --plugin argument. To load the plugin, the plugin must be available on the python path, which can be pretty easily modified. I think this system is already pretty robust when it comes to managing where your plugins are located on your system/image. The problem is that the utility method _get_path_from_msg_class is working from a message class within the plugin and trying to work backwards from there so the caller can ultimately locate the definition.py for the protocol.

I think our problem isn't necessarily in locating and loading of plugins but rather more in how we're attempting to handle multiple versions of protocols by working backwards from that message class. I do agree that we need to further refine the plugin interface and maybe this would help with this problem but I think we can address the problem well in advance of refactoring the plugin system.

usingtechnology commented 1 year ago

Sorry, sitting at a ferry terminal so not at my machine to review code. Is this method new? Like the plug-in is loaded ok, but there is a new runtime resolution method? Are message classes always in the same place relative to the plug-in root, or is that up to the developer?

dbluhm commented 1 year ago

Yes, the method is relatively new, having been included in a tag for the first time in 0.8.0-rc0. Relevant commit is https://github.com/hyperledger/aries-cloudagent-python/commit/ca8308fde42964ec379af1bbecfde9220d82ebe8, which was added as part of this pr: https://github.com/hyperledger/aries-cloudagent-python/pull/1940.

Yep, the plugin loads fine but on handling a message in the plugged in protocol, the code added in #1940 triggers the errors in the original description.

As of right now, the message classes are not guaranteed to be in the same place relative to the plugin root. Neither is it required for the plugin to even have a definitiion.py, depending on how it's loaded. This would be helped by Ian's recommendations for refining the plugin system.

usingtechnology commented 1 year ago

Thanks for that. Ok, when I’m back Monday I’ll do a deep dive and get familiar with old and new.

swcurran commented 1 year ago

If we can do a quick fix for that for now, that would be great. If it is that recent, I assume we can come up with something better and get a 0.8.2 out ASAP.

usingtechnology commented 1 year ago

@dbluhm - see PR #2255. This will help with your setup but is still not going to solve it for everyone (Traction). I would like to know your process for testing this out with your plugin. I must be missing something simple because I could not figure out an easy way to get a local version of ACA-Py code into your image for your plugin. I did it, but not ideal...

Didn't have to change anything in your code, didn't have to add ACAPY_HOME, just use ACA-Py as fixed in #2255 and your test passed.

dbluhm commented 1 year ago

@usingtechnology what we typically do to test unreleased versions of ACA-Py: https://github.com/Indicio-tech/aries-acapy-plugin-data-transfer/pull/5/commits/7c869efe1d5c281acb65d5bcee2d773132ad711e

The dockerfile used in the integration tests installs its dependencies from the poetry.lock.

usingtechnology commented 1 year ago

thanks for that. was definitely doing something more difficult.