rudof-project / rudof

RDF data shapes implementation in Rust
https://rudof-project.github.io
Apache License 2.0

Refactoring rudof #210

Open angelip2303 opened 16 hours ago

angelip2303 commented 16 hours ago

We have been discussing the possibility of rethinking the internals of rudof to make it a future-proof Rust-based RDF library. The idea would be not only to reimplement the model but also to formalize the steps followed. Thus, we could design a methodology for implementing RDF in Rust.

Until now, we have focused on delivering; that is, obtaining a usable tool. However, the scope has changed from fast prototyping to a more stable implementation, so a review of the model is required. This idea started with the technical debt that was detected when implementing the rudof_lib module (#200 and #201). As an example, in the case of SHACL validation, some benchmarks (#206) showed that, of the total time spent validating a data graph against a shapes graph, the system spent 16.13% cloning and a total of 30.45% compiling the shapes. To put it into perspective, only 44.18% of the time was spent on the validation itself. Refer to Figure 1 for more details.

Figure 1. Flamegraph corresponding to the SHACL validation based on rudof_lib.

The components to be changed

The SRDF model

We have detected that the clones come from this part of the codebase. Moreover, even though the trait-based design has proved to be a good idea, both the naming conventions and the functionality provided by those traits are a bit confusing. We should stick to the single-responsibility principle. What's more, some of the methods defined depend directly on Oxigraph, which loses the inherent genericity of the traits. The API of some of the methods also needs to be simplified (see the helper functions defined in the SHACL validation). Refer to Figures 2 and 3 for examples of the proposed design.

Figure 2. Proposed architecture, which is a simplification of the current one. The idea is that SRDF should be a module that is as generic as possible, and possibly reusable across several other libraries.

 

Figure 3. Proposed design of the SRDF model. The idea would be to have an inner representation of RDF and a set of traits for implementing the top-level features, e.g., SHACL validation, ShEx validation, etc.
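To make the idea of small, single-purpose traits over an inner RDF representation more concrete, here is a minimal sketch. All names (`Term`, `Triple`, `Triples`, `Neighbours`, `MemoryGraph`, `count_objects`) are hypothetical and are not taken from the current rudof codebase. The point it tries to illustrate is that lookups can hand out references into the graph instead of cloning terms, and that nothing in the traits mentions Oxigraph.

```rust
use std::collections::HashSet;

// Hypothetical inner representation of RDF terms (an assumption, not rudof's model).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Term {
    Iri(String),
    Literal(String),
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Triple {
    pub subject: Term,
    pub predicate: Term,
    pub object: Term,
}

// One responsibility: read-only access to the triples of a graph.
pub trait Triples {
    fn triples<'a>(&'a self) -> Box<dyn Iterator<Item = &'a Triple> + 'a>;
}

// Another responsibility: neighbourhood lookups, built only on top of `Triples`.
// The result borrows from the graph, so no term is cloned.
pub trait Neighbours: Triples {
    fn objects_for<'a>(&'a self, subject: &'a Term, predicate: &'a Term) -> Vec<&'a Term> {
        self.triples()
            .filter(|t| &t.subject == subject && &t.predicate == predicate)
            .map(|t| &t.object)
            .collect()
    }
}

// Every type that exposes its triples gets the lookups for free.
impl<T: Triples> Neighbours for T {}

// A backend with no external dependency; an Oxigraph-backed type could
// implement `Triples` in the same way without leaking into the traits.
#[derive(Default)]
pub struct MemoryGraph {
    triples: HashSet<Triple>,
}

impl MemoryGraph {
    pub fn insert(&mut self, triple: Triple) -> bool {
        self.triples.insert(triple)
    }
}

impl Triples for MemoryGraph {
    fn triples<'a>(&'a self) -> Box<dyn Iterator<Item = &'a Triple> + 'a> {
        Box::new(self.triples.iter())
    }
}

// Top-level features (e.g. validators) would be written against the traits only.
pub fn count_objects<G: Neighbours>(graph: &G, subject: &Term, predicate: &Term) -> usize {
    graph.objects_for(subject, predicate).len()
}
```

A validator written like `count_objects` only sees the traits, so swapping the backend should not require touching it.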

The ShEx and SHACL implementation

The idea of introducing functional parser combinators is well-suited to Rust and aligns closely with the modular nature of both ShEx and SHACL, where many components exhibit similar behavior. However, validation requires more than just syntactic analysis (parsing); it also involves creating a native representation of shapes, performing type validation, resolving imports, and more. This additional layer corresponds to semantic analysis in compiler design.

Figure 4. Proposed design of the shapes compiler.
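As a rough illustration of the two phases, the sketch below uses a few hand-written parser combinators for the syntactic step and a separate `compile` step that resolves shape references for the semantic step. The toy syntax (`Name = Ref Ref ;`) and every name in the code are assumptions made for this example; it is not ShEx, SHACL, or the existing rudof parser, and a real implementation might well use a combinator crate instead.

```rust
use std::collections::HashMap;

/// A parser consumes a prefix of the input and returns a value plus the rest.
type PResult<'a, T> = Option<(T, &'a str)>;

/// Parse an identifier (ASCII alphanumerics), skipping leading whitespace.
fn ident(input: &str) -> PResult<'_, &str> {
    let input = input.trim_start();
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[..end], &input[end..]))
    }
}

/// Combinator: build a parser that matches a fixed token.
fn token<'a>(tok: &'static str) -> impl Fn(&'a str) -> PResult<'a, ()> {
    move |input| input.trim_start().strip_prefix(tok).map(|rest| ((), rest))
}

/// Combinator: apply `p` zero or more times and collect the results.
fn many<'a, T>(p: impl Fn(&'a str) -> PResult<'a, T>) -> impl Fn(&'a str) -> PResult<'a, Vec<T>> {
    move |mut input| {
        let mut items = Vec::new();
        while let Some((item, rest)) = p(input) {
            items.push(item);
            input = rest;
        }
        Some((items, input))
    }
}

/// Syntactic output (AST): references are still plain names.
#[derive(Debug, PartialEq)]
struct ShapeDecl<'a> {
    name: &'a str,
    refs: Vec<&'a str>,
}

/// decl ::= ident "=" ident* ";"
fn decl(input: &str) -> PResult<'_, ShapeDecl<'_>> {
    let (name, rest) = ident(input)?;
    let (_, rest) = token("=")(rest)?;
    let (refs, rest) = many(ident)(rest)?;
    let (_, rest) = token(";")(rest)?;
    Some((ShapeDecl { name, refs }, rest))
}

/// Semantic output: references resolved to indices into the shape list.
#[derive(Debug, PartialEq)]
struct CompiledShape {
    name: String,
    refs: Vec<usize>,
}

/// Semantic analysis: check that every referenced shape is declared.
fn compile(decls: &[ShapeDecl<'_>]) -> Result<Vec<CompiledShape>, String> {
    let index: HashMap<&str, usize> =
        decls.iter().enumerate().map(|(i, d)| (d.name, i)).collect();
    let mut compiled = Vec::with_capacity(decls.len());
    for d in decls {
        let mut refs = Vec::new();
        for r in &d.refs {
            match index.get(r) {
                Some(&i) => refs.push(i),
                None => return Err(format!("undefined shape: {}", r)),
            }
        }
        compiled.push(CompiledShape { name: d.name.to_string(), refs });
    }
    Ok(compiled)
}

/// Front end: parsing first, then semantic analysis.
fn parse_and_compile(src: &str) -> Result<Vec<CompiledShape>, String> {
    let (decls, rest) = many(decl)(src).ok_or_else(|| "parse error".to_string())?;
    if !rest.trim().is_empty() {
        return Err(format!("syntax error near: {}", rest.trim()));
    }
    compile(&decls)
}
```

Keeping `compile` separate from the combinators means that an undefined reference is reported as a semantic error even though the input parses correctly, which is the distinction the paragraph above draws.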

Strong external dependencies

Right now, sparql_service depends on Oxigraph. Not only that, but sparql_service seems to duplicate functionality (SRDFGraph and SRDFQuery). In shacl_validation, I implemented the store package to simplify the low-level interface of stores, Graph and Endpoint. Additionally, I think Oxigraph introduces a very strong external dependency, and I’m wondering if it would be better to implement this at a lower level (perhaps using DuckDB).
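One way to picture a single low-level store interface that covers both Graph and Endpoint is sketched below. The trait `Store`, the types `GraphStore` and `EndpointStore`, and the helper `has_value` are hypothetical names chosen for this example; they do not reproduce the store package in shacl_validation, and no Oxigraph or HTTP API is called. The endpoint transport is injected as a closure precisely so that the concrete dependency stays outside the interface.

```rust
/// Hypothetical low-level interface shared by every backend.
pub trait Store {
    type Error: std::fmt::Debug;

    /// Objects of all triples matching (subject, predicate).
    fn objects(&self, subject: &str, predicate: &str) -> Result<Vec<String>, Self::Error>;
}

/// In-memory graph backend: a plain list of (subject, predicate, object).
pub struct GraphStore {
    pub triples: Vec<(String, String, String)>,
}

impl GraphStore {
    pub fn new(triples: &[(&str, &str, &str)]) -> Self {
        let triples = triples
            .iter()
            .map(|(s, p, o)| (s.to_string(), p.to_string(), o.to_string()))
            .collect();
        GraphStore { triples }
    }
}

impl Store for GraphStore {
    type Error = std::convert::Infallible;

    fn objects(&self, subject: &str, predicate: &str) -> Result<Vec<String>, Self::Error> {
        Ok(self
            .triples
            .iter()
            .filter(|(s, p, _)| s.as_str() == subject && p.as_str() == predicate)
            .map(|(_, _, o)| o.clone())
            .collect())
    }
}

/// Endpoint backend: builds a SPARQL query and delegates execution to an
/// injected transport, so the HTTP client or engine is not part of the trait.
pub struct EndpointStore<F> {
    pub url: String,
    pub transport: F,
}

impl<F> EndpointStore<F>
where
    F: Fn(&str, &str) -> Result<Vec<String>, String>,
{
    pub fn new(url: &str, transport: F) -> Self {
        EndpointStore { url: url.to_string(), transport }
    }
}

impl<F> Store for EndpointStore<F>
where
    F: Fn(&str, &str) -> Result<Vec<String>, String>,
{
    type Error = String;

    fn objects(&self, subject: &str, predicate: &str) -> Result<Vec<String>, Self::Error> {
        // Simplification: plain string formatting, with no escaping of the IRIs.
        let query = format!("SELECT ?o WHERE {{ <{}> <{}> ?o }}", subject, predicate);
        (self.transport)(&self.url, &query)
    }
}

/// Validation-side code depends on the trait only, never on a backend.
pub fn has_value<S: Store>(
    store: &S,
    node: &str,
    predicate: &str,
    expected: &str,
) -> Result<bool, S::Error> {
    Ok(store
        .objects(node, predicate)?
        .iter()
        .any(|o| o.as_str() == expected))
}
```

With this shape, replacing Oxigraph by another engine would mean writing one more `Store` implementation, while callers such as `has_value` stay unchanged.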

Conclusions

As we have said, we believe that rudof has clearly surpassed its initial scope and has proved to be useful. We have also confirmed that Rust is a fantastic language for building fast tools on top of RDF. It may therefore be worth considering a refactoring of the tool's architecture.

angelip2303 commented 16 hours ago

As the initial steps before committing to the new architecture, I think we should focus on: