wwelling / fcrepo-camel-toolbox

A collection of ready-to-use messaging applications with fcrepo-camel
Apache License 2.0

Solr indexing default transform not working for ldp:contains #4

Open wwelling opened 6 months ago

wwelling commented 6 months ago

What - description of what you want me to do

Add a custom Camel route processor that replaces all instances of the string literal ldp:contains with ldp:ccontains. Update the "direct:update.solr" Camel route in org.fcrepo.camel.indexing.solr.SolrRouter to run the new custom processor before sending to the direct:send.to.solr route.

Why - explain why this is important

This is needed to work around the XSL being unable to select rdf:RDF/rdf:Description/ldp:contains/@rdf:resource when transforming the RDF+XML.
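To check whether that path matches a given RDF/XML payload outside of Camel, here is a sketch using the JDK's built-in `javax.xml.xpath` API. The sample payload, its URIs, and the class name `ContainsPathCheck` are hypothetical; the rdf and ldp namespace URIs are the standard W3C ones. Prefixes in an XPath expression resolve through the supplied `NamespaceContext`, so they must be bound to the same namespace URIs that the payload declares.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ContainsPathCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String LDP_NS = "http://www.w3.org/ns/ldp#";

    // Hypothetical RDF/XML payload for a container with two children.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:ldp=\"" + LDP_NS + "\">"
        + "<rdf:Description rdf:about=\"http://localhost:8080/rest/parent\">"
        + "<ldp:contains rdf:resource=\"http://localhost:8080/rest/parent/a\"/>"
        + "<ldp:contains rdf:resource=\"http://localhost:8080/rest/parent/b\"/>"
        + "</rdf:Description></rdf:RDF>";

    // Returns the rdf:resource values matched by the path from the issue, in document order.
    static String[] selectContains(String rdfXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // prefixed XPath steps need a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Expression prefixes resolve through this context, not the document's declarations.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    if ("rdf".equals(prefix)) return RDF_NS;
                    if ("ldp".equals(prefix)) return LDP_NS;
                    return XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) { return null; }
                @Override
                public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                "/rdf:RDF/rdf:Description/ldp:contains/@rdf:resource", doc, XPathConstants.NODESET);
            String[] values = new String[nodes.getLength()];
            for (int i = 0; i < nodes.getLength(); i++) {
                values[i] = nodes.item(i).getNodeValue();
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate the path", e);
        }
    }

    public static void main(String[] args) {
        for (String value : selectContains(SAMPLE)) {
            System.out.println(value);
        }
    }
}
```

Swapping `SAMPLE` for a payload fetched from the repository shows whether the path itself matches that document.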

codeautopilot[bot] commented 6 months ago

Potential solution

The task requires us to intercept the message body within a specific Apache Camel route and replace all occurrences of the string ldp:contains with ldp:ccontains. The goal is to work around the XSLT transformation's inability to select rdf:RDF/rdf:Description/ldp:contains/@rdf:resource in the RDF/XML body. A custom processor lets us rewrite the message body inside the direct:update.solr route before it is sent to direct:send.to.solr, so that the transformed data is indexed correctly by Solr.

How to implement

  1. Create the Custom Processor: Implement the CustomProcessor class in the org.fcrepo.camel.indexing.solr package. This class will override the process method from the Processor interface to perform the string replacement.
package org.fcrepo.camel.indexing.solr;

import org.apache.camel.Exchange;
import org.apache.camel.Processor;

public class CustomProcessor implements Processor {
    @Override
    public void process(Exchange exchange) throws Exception {
        String body = exchange.getIn().getBody(String.class);
        if (body == null) {
            return; // nothing to rewrite
        }
        // Literal (non-regex) replacement of every occurrence
        exchange.getIn().setBody(body.replace("ldp:contains", "ldp:ccontains"));
    }
}
  2. Modify the SolrRouter.java File: In the SolrRouter.java file, locate the route that starts with from("direct:update.solr") and add the custom processor to both branches of the choice that send to direct:send.to.solr. In the branch with a transformation header, the processor must run before the .toD("xslt:...") step, because the point of the rewrite is to change what the XSLT can select; in the branch without a transformation, it runs just before .to("direct:send.to.solr").
from("direct:update.solr").routeId("FcrepoSolrUpdater")
    .log(LoggingLevel.INFO, logger, "Indexing Solr Object ${header.CamelFcrepoUri}")
    .setHeader(INDEXING_URI).simple("${header.CamelFcrepoUri}")
    // Don't index the transformation itself
    .filter().simple("${header.CamelIndexingTransformation} != ${header.CamelIndexingUri}")
    .choice()
        .when(header(INDEXING_TRANSFORMATION).isNotNull())
            .log(LoggingLevel.INFO, logger,
                "Sending RDF for Transform with XSLT from ${header.CamelIndexingTransformation}")
            // Rewrite ldp:contains before the XSLT so the stylesheet sees the renamed elements
            .process(new CustomProcessor())
            .toD("xslt:${header.CamelIndexingTransformation}")
            .to("direct:send.to.solr")
        .when(or(header(INDEXING_TRANSFORMATION).isNull(), header(INDEXING_TRANSFORMATION).isEqualTo("")))
            .log(LoggingLevel.INFO, logger, "No Transform supplied")
            // No XSLT in this branch: rewrite just before sending to Solr
            .process(new CustomProcessor())
            .to("direct:send.to.solr")
        .otherwise()
            .log(LoggingLevel.INFO, logger, "Skipping ${header.CamelFcrepoUri}");
  3. Compile and Test: After implementing the CustomProcessor and updating the SolrRouter, compile the project to ensure there are no compilation errors. Test the route to verify that the string replacement is working as intended.

  4. Deploy the Changes: Once the solution is confirmed to work correctly, deploy the updated code to the production environment where the Fedora repository and Solr instance are running.
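Because the replacement is plain string manipulation, it can be tested without starting a CamelContext. A minimal sketch, assuming the logic is pulled into a hypothetical static helper `LdpContainsRewriter.rewrite` (not part of the proposal above) that `CustomProcessor.process` would delegate to:

```java
public final class LdpContainsRewriter {
    private LdpContainsRewriter() { }

    // Hypothetical helper: the same literal replacement CustomProcessor performs,
    // with a null body passed through unchanged.
    public static String rewrite(String body) {
        return body == null ? null : body.replace("ldp:contains", "ldp:ccontains");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        String in = "<ldp:contains rdf:resource=\"http://localhost:8080/rest/parent/a\"/>"
            + "<ldp:contains rdf:resource=\"http://localhost:8080/rest/parent/b\"></ldp:contains>";
        String out = rewrite(in);

        check(!out.contains("ldp:contains"), "every occurrence is replaced");
        check(out.startsWith("<ldp:ccontains rdf:resource="), "self-closing element renamed");
        check(out.endsWith("</ldp:ccontains>"), "closing tag renamed");
        check(rewrite(out).equals(out), "a second pass changes nothing");
        check(rewrite(null) == null, "null body passes through");
        System.out.println(out);
    }
}
```

`CustomProcessor.process` could then call `exchange.getIn().setBody(LdpContainsRewriter.rewrite(body))`. Since this is plain string matching, it also rewrites `ldp:contains` inside literal values, and it does not match payloads that bind the LDP namespace to a prefix other than `ldp`.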


Files used for this task:

Changes on fcrepo-indexing-solr/src/main/java/org/fcrepo/camel/indexing/solr/CustomProcessor.java

To solve the task for the `CustomProcessor.java` file, we need to create a new Java class that implements the `org.apache.camel.Processor` interface. Its `process` method replaces all instances of the string literal `ldp:contains` with `ldp:ccontains` in the message body. Here is a concrete proposal for the `CustomProcessor.java` class:

```java
package org.fcrepo.camel.indexing.solr;

import org.apache.camel.Exchange;
import org.apache.camel.Processor;

public class CustomProcessor implements Processor {
    @Override
    public void process(Exchange exchange) throws Exception {
        // Retrieve the message body as a String
        String body = exchange.getIn().getBody(String.class);
        if (body == null) {
            return; // nothing to rewrite
        }
        // Replace all instances of 'ldp:contains' with 'ldp:ccontains' (literal, not a regex)
        exchange.getIn().setBody(body.replace("ldp:contains", "ldp:ccontains"));
    }
}
```

This class does the following:

1. It implements the `Processor` interface from Apache Camel.
2. It overrides the `process` method, which is called with an `Exchange` object that contains the message to be processed.
3. It retrieves the message body as a `String`, leaving a null body untouched.
4. It performs the literal string replacement using `String.replace`.
5. It sets the modified body back into the message.

Once this class is created, it will be used in the `SolrRouter.java` file to process messages in the `direct:update.solr` route before they are sent to `direct:send.to.solr`.
Changes on fcrepo-indexing-solr/src/main/java/org/fcrepo/camel/indexing/solr/SolrRouter.java

To address the issue described in the ticket, we need to modify the `SolrRouter.java` file to use a new custom processor that replaces all instances of the string literal `ldp:contains` with `ldp:ccontains`. The processor is added to the `direct:update.solr` route so that it runs before the body is sent to `direct:send.to.solr`; in the branch that applies an XSLT, it must run before the `.toD("xslt:...")` step, since the rewrite is meant to change what the stylesheet can select. Here's a practical proposal:

1. **Create the Custom Processor:** Before modifying the `SolrRouter.java` file, ensure that the `CustomProcessor.java` file has been created and implements the `org.apache.camel.Processor` interface with a `process` method that performs the string replacement. This processor will be used in the `SolrRouter` route.
2. **Modify the SolrRouter.java File:** Locate the route that starts with `from("direct:update.solr").routeId("FcrepoSolrUpdater")` and add the custom processor to both branches that send to `direct:send.to.solr`. The modification should look something like this:

```java
from("direct:update.solr").routeId("FcrepoSolrUpdater")
    .log(LoggingLevel.INFO, logger, "Indexing Solr Object ${header.CamelFcrepoUri}")
    .setHeader(INDEXING_URI).simple("${header.CamelFcrepoUri}")
    // Don't index the transformation itself
    .filter().simple("${header.CamelIndexingTransformation} != ${header.CamelIndexingUri}")
    .choice()
        .when(header(INDEXING_TRANSFORMATION).isNotNull())
            .log(LoggingLevel.INFO, logger,
                "Sending RDF for Transform with XSLT from ${header.CamelIndexingTransformation}")
            // Add the custom processor here, before the XSLT step
            .process(new CustomProcessor())
            .toD("xslt:${header.CamelIndexingTransformation}")
            .to("direct:send.to.solr")
        .when(or(header(INDEXING_TRANSFORMATION).isNull(), header(INDEXING_TRANSFORMATION).isEqualTo("")))
            .log(LoggingLevel.INFO, logger, "No Transform supplied")
            // Add the custom processor here
            .process(new CustomProcessor())
            .to("direct:send.to.solr")
        .otherwise()
            .log(LoggingLevel.INFO, logger, "Skipping ${header.CamelFcrepoUri}");
```

3. **Compile and Test:** After making the changes, compile the project to ensure there are no errors. Then test the modified route to confirm that `ldp:contains` is replaced with `ldp:ccontains` before the XSLT step and that the transformed output sent to Solr is as expected.
4. **Deploy the Changes:** Once the changes have been tested and verified, deploy the updated code to the environment where the Fedora repository and Solr instance are running.

By following these steps, the `SolrRouter.java` file will be modified to include the new custom processor, which should resolve the issue described in the ticket.
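Where the processor sits relative to the XSLT matters: a body rewrite only changes what the stylesheet selects if it runs before `.toD("xslt:...")`, because after that step the body is already the stylesheet's output. The sketch below illustrates this with the JDK's built-in XSLT processor, assuming (hypothetically; the issue does not show the real stylesheet) a stylesheet that reads the renamed `ldp:ccontains` elements. The payload, URIs, stylesheet, and the class `ProcessorOrderingDemo` are all hypothetical stand-ins for the route steps.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ProcessorOrderingDemo {
    // Hypothetical stylesheet: emits the rdf:resource of each ldp:ccontains element, ';'-terminated.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns:ldp=\"http://www.w3.org/ns/ldp#\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"rdf:RDF/rdf:Description/ldp:ccontains\">"
        + "<xsl:value-of select=\"@rdf:resource\"/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical RDF/XML body as it would arrive from the repository.
    static final String RDF =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns:ldp=\"http://www.w3.org/ns/ldp#\">"
        + "<rdf:Description rdf:about=\"http://localhost:8080/rest/parent\">"
        + "<ldp:contains rdf:resource=\"http://localhost:8080/rest/parent/a\"/>"
        + "</rdf:Description></rdf:RDF>";

    // Stand-in for CustomProcessor.
    static String rewrite(String body) {
        return body.replace("ldp:contains", "ldp:ccontains");
    }

    // Stand-in for the .toD("xslt:...") step.
    static String xslt(String body) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(body)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rewrite then xslt: [" + xslt(rewrite(RDF)) + "]");
        System.out.println("xslt then rewrite: [" + rewrite(xslt(RDF)) + "]");
    }
}
```

With this stylesheet, only the rewrite-then-XSLT order yields the contained resource URI; the XSLT-then-rewrite order yields an empty result, since the rewrite has nothing left to act on.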