hyperledger / aries-cloudagent-python

Hyperledger Aries Cloud Agent Python (ACA-Py) is a foundation for building decentralized identity applications and services running in non-mobile environments.
https://wiki.hyperledger.org/display/aries
Apache License 2.0
403 stars 510 forks source link

Migrate from `PyLD` to `RDFLib` #2944

Open ff137 opened 3 months ago

ff137 commented 3 months ago

I'd like to propose that our usage of the PyLD dependency be swapped out for the more versatile and better maintained RDFLib.

Let me bullet point some reasons for this suggestion:

Long story short, the impression I get is that PyLD is not maintained or maintainable.

RDFLib, on the other hand, seems to be the gold standard for Resource Description Framework (RDF) processing in python. They offer:

This repo has much more adoption and activity, and so I think it would be better to migrate to this instead. This is just an idea and open for discussion.

PyLD is currently used in quite a few places in ACA-Py, and probably represent quite a refactoring job to replace it. But I think this may be a worthwhile operation, to reduce technical debt and improve the functionality of a core feature.

ff137 commented 3 months ago

Relates to #2941

PatStLouis commented 3 months ago

PyLd is a library from the authors of the json-ld specification, Digital Bazaar. It aims to be compliant with the specification by conformity with the json-ld processor test-suite. We updated v2.0.4 since we uncovered a bug in the library and submitted a PR. Going forward there are other bugs in the library which creates interoperability issues with aca-py at the moment for some specific cases.

I'm not opposed to changing the library but I think more tests are needed to see if its a suitable replacement.

Pyld can use an asynchronous document loader as well

ff137 commented 3 months ago

@PatStLouis thanks! The synchronous requests document loader was the main reason I thought a replacement is necessary. Because, in ACA-Py, there's a JsonLdDocumentDownloader, which uses the requests framework for synchronous downloads, which is then sort of hacked into an async method in the DocumentLoader class. This leads to nest_asyncio been necessary, to support nested event loops, which all feels like a very smelly way of doing things. That's what got me looking for alternatives.

If an asynchronous document loader can be used instead, that may be a sufficient improvement. So I'll keep this in mind. But yes, it may be that RDFLib is ultimately the way to go, given it has more activity and wider support

PatStLouis commented 2 months ago

@ff137 is the async pyld document loader sufficient for your use case?

ff137 commented 2 months ago

Hi @PatStLouis, I'll need some time to investigate. Will have to test and confirm if the existing synchronous logic can be replaced. I still think migrating to RDFLib is worthwhile overall, although it's probably out of scope for the 1.0.0 release. Either way, over the next couple weeks I'll see if I can put something together to test the async doc loader 👌

PatStLouis commented 2 months ago

pyld is core to a lot of the issuance/verification processes, something to keep in mind. Can you point me into the code where pyld is causing issues?

PatStLouis commented 2 months ago

I think trying to change from request to aiohttp for the document loader is a good place to start and aligns with the rest of the code base