We have been discussing the possibility of rethinking the internals of rudof to make it a future-proof Rust-based RDF library. The idea would be not only to reimplement the model but also to formalize the steps followed, so that we could design a methodology for implementing RDF in Rust.
So far we have focused on delivering; that is, on obtaining a usable tool. However, the scope has changed from fast prototyping to a more stable implementation, so a review of the model is required. This idea started with the technical debt detected when implementing the rudof_lib module (#200 and #201). As an example, for SHACL validation, some benchmarks (#206) showed that, of the total time spent validating a data graph against a shapes graph, the system spent 16.13% cloning and 30.45% compiling the shapes. To put it into perspective, only 44.18% of the time was spent on the validation itself. Refer to Figure 1 for more details.
The components to be changed
The SRDF model
We have detected that the clones come from this part of the codebase. Moreover, even though the trait-based design has proved to be a good idea, both the naming conventions and the functionality provided by those traits are a bit confusing. We should stick to the single-responsibility principle. What's more, some of the methods defined depend directly on Oxigraph, which loses the inherent genericity of the traits. The API of some of the methods also needs to be simplified (see the helper functions defined in the SHACL validation). Refer to Figures 2 and 3 for examples of the design proposed.
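To make the point about single-responsibility traits and clones concrete, here is a minimal sketch. It is not the design in Figures 2 and 3, and every name in it (`Term`, `Triple`, `TripleSource`, `Neighbourhood`, `MemoryGraph`) is hypothetical. It assumes a simplified term model where a plain string stands in for IRIs, literals and blank nodes. Each trait does one thing, lookups return references instead of cloned terms, and nothing in the traits mentions a concrete backend such as Oxigraph.

```rust
use std::collections::HashSet;

// All names below are hypothetical and only illustrate the idea.

/// Simplified RDF term: a plain string stands in for IRIs, literals and blank nodes.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Term(pub String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Triple {
    pub subject: Term,
    pub predicate: Term,
    pub object: Term,
}

/// Single responsibility: read-only iteration over triples, by reference.
pub trait TripleSource {
    fn triples(&self) -> Box<dyn Iterator<Item = &Triple> + '_>;
}

/// Separate responsibility, expressed only in terms of `TripleSource`,
/// so it does not depend on any concrete backend.
pub trait Neighbourhood: TripleSource {
    /// Objects of `subject` for `predicate`, borrowed from the graph (no clones).
    fn objects_for<'a>(&'a self, subject: &Term, predicate: &Term) -> Vec<&'a Term> {
        self.triples()
            .filter(|t| &t.subject == subject && &t.predicate == predicate)
            .map(|t| &t.object)
            .collect()
    }
}

/// Every triple source gets the neighbourhood lookups for free.
impl<T: TripleSource> Neighbourhood for T {}

/// In-memory backend used only for this sketch.
pub struct MemoryGraph {
    triples: HashSet<Triple>,
}

impl MemoryGraph {
    pub fn from_triples(items: &[(&str, &str, &str)]) -> Self {
        let triples = items
            .iter()
            .map(|(s, p, o)| Triple {
                subject: Term(s.to_string()),
                predicate: Term(p.to_string()),
                object: Term(o.to_string()),
            })
            .collect();
        MemoryGraph { triples }
    }
}

impl TripleSource for MemoryGraph {
    fn triples(&self) -> Box<dyn Iterator<Item = &Triple> + '_> {
        Box::new(self.triples.iter())
    }
}
```

In this sketch a backend only has to implement `TripleSource`; whether the boxed iterator is acceptable on hot paths would need to be measured against the benchmarks in #206.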
The ShEx and SHACL implementation
The idea of introducing functional parser combinators is well-suited to Rust and aligns closely with the modular nature of both ShEx and SHACL, where many components exhibit similar behavior. However, validation requires more than just syntactic analysis (parsing); it also involves creating a native representation of shapes, performing type validation, resolving imports, and more. This additional layer corresponds to semantic analysis in compiler design.
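The split between parsing and semantic analysis can be sketched as follows. This is an illustration under stated assumptions, not rudof's parser: the combinators are hand-written rather than taken from a library, the `@Name` syntax is a simplified stand-in for a shape reference and not the real ShExC grammar, and all names (`literal`, `identifier`, `preceded`, `ShapeRef`, `resolve`) are hypothetical. The parser only builds a syntactic value; a separate step checks that the referenced shape is declared.

```rust
use std::collections::HashMap;

/// A parser returns the parsed value plus the remaining input, or `None` on failure.
type ParseResult<'a, T> = Option<(T, &'a str)>;

/// Parser for a fixed literal such as "@".
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> ParseResult<'a, ()> {
    move |input| input.strip_prefix(expected).map(|rest| ((), rest))
}

/// Parser for one or more ASCII alphanumeric characters.
fn identifier(input: &str) -> ParseResult<'_, String> {
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((input[..end].to_string(), &input[end..]))
    }
}

/// Combinator: run `first`, discard its value, then run `second` on the rest.
fn preceded<'a, A, B>(
    first: impl Fn(&'a str) -> ParseResult<'a, A>,
    second: impl Fn(&'a str) -> ParseResult<'a, B>,
) -> impl Fn(&'a str) -> ParseResult<'a, B> {
    move |input| {
        let (_, rest) = first(input)?;
        second(rest)
    }
}

/// Syntactic result: a reference to a shape by name (simplified syntax).
#[derive(Debug, PartialEq)]
struct ShapeRef(String);

/// Syntactic analysis: built by composing the smaller parsers above.
fn shape_ref(input: &str) -> ParseResult<'_, ShapeRef> {
    let (name, rest) = preceded(literal("@"), identifier)(input)?;
    Some((ShapeRef(name), rest))
}

/// Semantic analysis: resolve the name against the declared shapes.
/// The `usize` is a hypothetical index into a table of compiled shapes.
fn resolve(reference: &ShapeRef, declared: &HashMap<String, usize>) -> Result<usize, String> {
    declared
        .get(&reference.0)
        .copied()
        .ok_or_else(|| format!("undeclared shape: {}", reference.0))
}
```

Keeping `resolve` outside the combinators means a well-formed but undeclared reference fails in the semantic step with its own error, not as a parse failure.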
Strong external dependencies
Right now, sparql_service depends on Oxigraph. Not only that, but sparql_service seems to duplicate functionality (SRDFGraph and SRDFQuery). In shacl_validation, I implemented the store package to simplify the low-level interface of stores, Graph and Endpoint. Additionally, I think Oxigraph introduces a very strong external dependency, and I’m wondering if it would be better to implement this at a lower level (perhaps using DuckDB).
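As a rough illustration of how a store abstraction can keep the backend out of validation code, consider the sketch below. It is not the API of the store package in shacl_validation; the `Store` trait, `StoreError`, `has_value` and the string-based triples are all assumptions made for the example. The `Endpoint` here performs no network I/O: it only builds the SPARQL text it would send, so the example stays self-contained.

```rust
// Hypothetical names; this is not the API of rudof's store package.

#[derive(Debug, PartialEq)]
pub enum StoreError {
    /// The sketch performs no network I/O, so a remote store reports the
    /// query it would have sent instead of a result.
    NotExecuted { url: String, query: String },
}

/// The only interface validation code would see.
/// Results are owned because a remote endpoint cannot lend references.
pub trait Store {
    fn objects(&self, subject: &str, predicate: &str) -> Result<Vec<String>, StoreError>;
}

/// In-memory store: triples as (subject, predicate, object) strings.
pub struct Graph {
    pub triples: Vec<(String, String, String)>,
}

impl Store for Graph {
    fn objects(&self, subject: &str, predicate: &str) -> Result<Vec<String>, StoreError> {
        Ok(self
            .triples
            .iter()
            .filter(|(s, p, _)| s.as_str() == subject && p.as_str() == predicate)
            .map(|(_, _, o)| o.clone())
            .collect())
    }
}

/// Stand-in for a SPARQL endpoint.
pub struct Endpoint {
    pub url: String,
}

impl Store for Endpoint {
    fn objects(&self, subject: &str, predicate: &str) -> Result<Vec<String>, StoreError> {
        let query = format!("SELECT ?o WHERE {{ <{subject}> <{predicate}> ?o }}");
        Err(StoreError::NotExecuted { url: self.url.clone(), query })
    }
}

/// Validation-side helper, generic over any store and unaware of the backend.
pub fn has_value<S: Store>(
    store: &S,
    subject: &str,
    predicate: &str,
    expected: &str,
) -> Result<bool, StoreError> {
    Ok(store
        .objects(subject, predicate)?
        .iter()
        .any(|o| o.as_str() == expected))
}
```

With a boundary like this, replacing Oxigraph (or trying a lower-level backend) would mean writing another `impl Store`, while helpers such as `has_value` stay unchanged.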
Conclusions
As we have said, we believe that rudof has clearly surpassed its initial scope and has proved to be useful. We have also confirmed that Rust is a fantastic language for building fast tools on top of RDF. We therefore think it is worth considering a refactoring of the tool's architecture.