i2group / analyze

Develop and deploy custom Java extensions and REST API client code for i2 Analyze. View the Java API documentation.
https://i2group.github.io/analyze/
MIT License

Using custom entity types when loading data #53

Closed martonnovak closed 4 years ago

martonnovak commented 5 years ago

After trying out the data load direct example, I would like to create my own .xsd files and use them to upload custom entities to the database. The given XSDs use Actor tags, which are translated to Person items in Onyx.

My question is how I can create my own custom schema, and how I can map it to an already existing entity type in Onyx.

For example, I would like to define a new car type so that, if I write a sound and valid xsd file with a couple of car entities, the data load script can map those entities to the vehicle type entities in the system. The key questions are how to create the schema and how to do the mapping.

boalesth commented 5 years ago

Hi Marton,

Congratulations on getting the example running correctly.

First, because this information helps you load data into the Analysis Repository and not the Information Store, be really sure that this is what you want to do. It may be worth reaching out to your IBM representative to see whether a different approach could also meet your needs.

If you decide that you need to continue, you need to map the external representations in your data to an i2 schema, which you can create and modify using schema designer.

A valid xsd file can be created via the toolkit:

  1. On the i2 Analyze server, open a command prompt and navigate to the scripts directory of the i2 Analyze deployment toolkit.

  2. Run the following command to generate a .jar file that contains the XML representation:

       setup -t generateMappingJar -x i2analyze_schema -o definition_jar

     Here, i2analyze_schema is the full name (including the path) of the schema file, and definition_jar is the full name (including the path) of the output .jar file.

     Warning: If definition_jar specifies an existing file, the generateMappingJar task does not overwrite it. If you run the same command twice in succession, ensure that you move or delete the output between calls.

     Inside the .jar file, the name of the XML file that you need is schema4.xsd.
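
If you would rather pull schema4.xsd out of the generated .jar from code than unpack it by hand, a minimal sketch using the JDK's java.util.jar classes might look like the following. The class name MappingJarSketch and the readEntry helper are made up for illustration, and because the folder that holds schema4.xsd inside the .jar is not stated here, the sketch searches by simple file name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper, not part of the i2 Analyze toolkit. */
final class MappingJarSketch {

    /**
     * Returns the text of the first jar entry whose file name is simpleName,
     * whatever folder it sits in. Assumes the entry is UTF-8 encoded.
     */
    static String readEntry(Path jar, String simpleName) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            JarEntry entry = jarFile.stream()
                .filter(e -> !e.isDirectory())
                .filter(e -> e.getName().equals(simpleName)
                          || e.getName().endsWith("/" + simpleName))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException(
                    simpleName + " not found in " + jar));
            try (InputStream in = jarFile.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // args[0] is the path that was passed to -o when running generateMappingJar.
        System.out.println(readEntry(Path.of(args[0]), "schema4.xsd"));
    }
}
```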

Hope this helps

Esther

martonnovak commented 5 years ago

Thank you for your fast reply!

However, we still couldn't work out how the underlying methods transform the data into the correct format. In the example, there are Actors and Actor entities in the data1.xml file. Can you point us to, or explain, the function or method that transforms these entities into the existing items defined in the analyze schema? (We are using the law-enforcement schema.) In other words, how is Actor converted to Person?

Thanks in advance! Marton

TonyJon commented 5 years ago

Hi Marton

As you have probably seen, the onyx-da-arload-filesystem-example project uses common code from another project that you load into your IDE, called onyx-da-example-common.

If you follow the “load” method from ExampleDataLoaderMain, you can see that it calls exampleDataLoader.load.

This load method calls createTransformedXmlSource with a variable that holds the name of the file which contains the data to load. In this case, as you have already seen, it is a simple file called data1.xml. (On purpose, the data in this file is not in the same shape as the schema we are using so that we can show one way of converting it to what we need). This method then calls out to a method in …\SDK\sdk-projects\onyx-da-example-common\src\main\java\com\example\ExampleXmlTransformer.java and returns XML as we want it. (I will describe this below).

For you to be able to load data into the AR, you have to give it to us as XML in a format that we understand.

When you ran generateMappingJar for your Analysis Repository schema, it created a set of compiled Java classes that know how to map from XML that matches a simplified representation of the entities and links in that schema to our internal types. So that you do not have to guess what this XML should look like, we also create the XSDs that define what we expect from you and, as Esther mentioned earlier, schema4.xsd holds all that you need from your perspective.

How you produce this XML is up to you; all we require is that you give us XML that we understand.
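
One way to find out early whether your XML is in a shape the XSD accepts is to validate it with the JDK's javax.xml.validation API before attempting a load. The sketch below is only an illustration: the class name XsdCheckSketch and the tiny inline XSD are made up, and the inline XSD is not schema4.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical helper, not part of the SDK. */
final class XsdCheckSketch {

    // Made-up stand-in schema: a Person element with a single Name child.
    static final String TINY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Person'><xs:complexType><xs:sequence>"
      + "<xs:element name='Name' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /**
     * Returns an empty Optional when xml is valid against xsd. Otherwise returns
     * the parser's message, which also covers the case where the XSD itself is broken.
     */
    static Optional<String> validationError(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validationError(TINY_XSD, "<Person><Name>Harry Windsor</Name></Person>"));
        System.out.println(validationError(TINY_XSD, "<Actor/>"));
    }
}
```

In real use you would build the Schema from the schema4.xsd file taken from your mapping jar rather than from an inline string.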

Going back to the specific example you have been running: Our example uses an external data input file which is XML in a different shape than we need, as I have mentioned above. We have to convert this XML into our own XML format so that we can then build the correct internal classes for the schema.

The simplest way for us to change the external XML into our XML format is by using XSLT. (This is just a way to detect strings in the input XML, for example 'Actor', and write different XML as a result, for example 'Person'.)

If you look in …\SDK\sdk-projects\onyx-da-example-common\fragment\WEB-INF\classes\dataToI2analyze.xslt you can see that it is written to do exactly that; it only works for a very specific input and creates a very specific output.

If you are doing a Proof of Concept where you want to ingest data from an XML file that the client gives you, then you could just alter this XSLT file so that it matches the XML they give you and can then be used to convert it.

We have also included an example wrapper class, \SDK\sdk-projects\onyx-da-example-common\src\main\java\com\example\ExampleXmlTransformer.java, as mentioned above, with a transformSourceSystemXml method that takes two parameters: the incoming XML data file and the XSLT file to use in transforming it. It returns XML in the format we want, matching the XSD specification.
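
To make the mechanism concrete (source XML plus a stylesheet in, differently shaped XML out), here is a minimal sketch using the JDK's javax.xml.transform API. It is not the SDK's ExampleXmlTransformer: the class name XsltTransformSketch, the People and Name elements, and the inline stylesheet are assumptions for illustration, and real output has to follow schema4.xsd.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper, not the SDK's ExampleXmlTransformer. */
final class XsltTransformSketch {

    // Illustrative stylesheet: rewrites each Actor as a Person.
    // People and Name are made-up element names, not the real target format.
    static final String ACTOR_TO_PERSON_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/Actors'>"
      + "<People><xsl:apply-templates select='Actor'/></People>"
      + "</xsl:template>"
      + "<xsl:template match='Actor'>"
      + "<Person><Name><xsl:value-of select='Name'/></Name></Person>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet text to the source XML text and returns the result. */
    static String transform(String sourceXml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String data = "<Actors><Actor><Name>Harry Windsor</Name></Actor></Actors>";
        System.out.println(transform(data, ACTOR_TO_PERSON_XSLT));
    }
}
```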

We do not expect that you would make a production system using the classes as they are, but hope that they give you an understanding of the basic mechanism that you can change and build upon.

As you can see, input and output in this example are tied together by the XSLT file, so any change to the input data, or to the entity or link types in the i2 Analyze schema that you want to map to, requires the XSLT to be altered.

If you change your schema for i2 Analyze, you need to regenerate the mapping jar, which in turn creates new mapping classes and XSDs.

Using an XSLT mapping mechanism is just one possible way to do this. If you are happier working with Java, you can achieve the same thing by using its XML parsing functionality to read from the file and then creating annotated Java classes that know how to write out XML in the correct format for the XSDs and our code.
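
As a rough sketch of the Java-only route, the example below reads the source XML with the JDK's DOM parser and writes the target XML by building a second DOM document. Note that it uses plain DOM for the output rather than annotated classes, to stay within the JDK. The class name DomMappingSketch and the People and Name elements are assumptions for illustration, not the format defined by schema4.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper, not part of the SDK. */
final class DomMappingSketch {

    /** Maps every Actor in the source to a Person; assumes each Actor has a Name child. */
    static String actorsToPeople(String sourceXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(sourceXml)));

            // Build the target document element by element.
            Document target = builder.newDocument();
            Element people = target.createElement("People");
            target.appendChild(people);

            NodeList actors = source.getElementsByTagName("Actor");
            for (int i = 0; i < actors.getLength(); i++) {
                Element actor = (Element) actors.item(i);
                Element name = target.createElement("Name");
                name.setTextContent(actor.getElementsByTagName("Name").item(0).getTextContent());
                Element person = target.createElement("Person");
                person.appendChild(name);
                people.appendChild(person);
            }

            // An identity transform serializes the DOM tree to text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(target), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException("Could not map source XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(actorsToPeople(
            "<Actors><Actor><Name>Harry Windsor</Name></Actor></Actors>"));
    }
}
```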

If your input comes from a database, or from data structures that are very different from what you want to map to in the i2 Analyze schema, you also have to decide whether you want to do all of the mapping and processing in one place. It might be better to store your incoming data in an intermediate format that breaks the task down into simpler stages. (This also helps you to manage changes on either side of that intermediate format.)

Depending on how much effort you want to put into it, and whether you think that things will change, you can always make this more dynamic so that your code changes as the input or output does, but you have to weigh up the cost of the extra development compared to the benefit it gives in your particular scenario.

Cheers

martonnovak commented 4 years ago

@TonyJon Thanks again for your fast response. Unfortunately, some other things came up for us.

We now understand how the XML files are translated. However, when we tried to simply upload a custom XML file, it failed. The file contains only one person (actor) and the corresponding security tags. We found in the bin folder that the default tags are:

    <SecurityTagIds>
      <SecurityTagId>UC</SecurityTagId>
      <SecurityTagId>OSI</SecurityTagId>
    </SecurityTagIds>

We tried adding these tags to our custom XML as they are, but the load failed. When we checked data1.xml, we couldn't find any corresponding tags. We are trying to associate different SecurityTagIds with each "row" from our data table, so that each person can have a different SecurityTagId.

One more thing: we'd like to associate a person with an organization. How can we modify our XML to create different kinds of association? In the Intelligence Portal there are several options, such as Member Of, Involved In, etc. In the examples we only saw simple associations between Actors.

Thanks in advance!

TonyJon commented 4 years ago

Hi Marton

In our example, we decided that we would have the same security tags added for all of the entities and links. This means that we did not have to parse and translate incoming values from the data file; we just had to add the same static values to each entity or link using our XSLT. This is why you could not find any such values when you looked in data1.xml.

For your information, if you look in dataToi2analyze.xslt in the same folder as the data file you can see we have this element in our itemTemplate section.

    <SecurityTagIds>
      <SecurityTagId>UC</SecurityTagId>
      <SecurityTagId>OSI</SecurityTagId>
    </SecurityTagIds>

This is simply the format required to add default dimension values to each item (entity or link) that we parse into our XML format.

If you are using the default example-dynamic-security-schema.xml that we provide, then you will see that these two values match a security dimension value from each of our two default security dimensions.

Information on how to set up your own dynamic security schema that matches your requirements can be found here: https://www.ibm.com/support/knowledgecenter/SS3J58_9.2.0/com.ibm.i2.eia.go.live.doc/t_changing_sec_schema.html

We have a rule that any item added to our system must have at least one security dimension value from each of the defined access security dimensions, which is why you see that we add two in our example.

Please excuse the fact that we call them SecurityTagIds in our definition. This is the original name we used before we started using dynamic security with i2 Analyze, and it has not changed because we did not want to make older implementations incompatible.

If you want each item (entity or link) that you create to be able to have different security access values then, as you say, you will need the values to be driven from your incoming custom XML data. You will then need to alter the example XSLT so that it no longer adds our automatic values but instead pattern-matches the values you are adding and converts them to the correct format for our XML.

As an example, I have added two new elements (Dimension1Value and Dimension2Value) to each Actor (entity) and Association (link) in my version of data1.xml. For some I use OSI and UC as the values, and for some I use HI and CON, e.g. (element values shown):

For the entities:

    Harry Windsor 1972-11-01T00:00:00+00:00 2014-01-01T12:00:00+00:00 2014-01-21T12:00:00+00:00 documents/picture1.jpg picture1.jpg OSI UC
    James Noel 1975-10-05T00:00:00+00:00 2014-01-01T12:00:00+00:00 2014-01-21T12:00:00+00:00 documents/picture2.jpg picture2.jpg HI CON
    … …

And for the links:

    p1 p2 2014-01-01T12:00:00+00:00 2014-01-21T12:00:00+00:00 OSI UC
    p1 p3 2014-01-01T12:00:00+00:00 2014-01-21T12:00:00+00:00 HI CON
    … …

This is just one of many ways of doing this.

In my case I want explicit security values for my entities and my links, and because I use the same format in each case to define them, I can just alter the item template in the XSLT to look for these values and use them rather than supplying static values.

To do this I added two new template match rules, “Dimension1Value” and “Dimension2Value”, and changed my item template to call these to translate the incoming Dimension1Value and Dimension2Value into our SecurityTagId format.

When I now run the example data load, I can see that the queryResultXml that the code creates includes the new security access dimension values from my data: some entities and links have OSI UC and some have HI CON.

You also ask how to make the associations different. This simply involves adding XSLT to map your incoming data to different link types in our i2 Analyze schema.

In the example we have external data for what we call associations, and we decided that each association should be represented in our Analysis Repository schema as a link of type Associate. We use the XSLT to recognise the external names and map the incoming XML data to our format for this specific link type. You can see this by looking at the Associations template match and the Association template match in the XSLT.

If you want to map other relationships differently, you could create new data elements in your external XML with other names, and create new template matches for these in your XSLT that produce the corresponding link types in our XML format. You can look in the XSD files mentioned earlier to help you format this new XML correctly.

Cheers

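
For anyone reading later, here is a hedged sketch of the two ideas above in one small stylesheet, run through the JDK's javax.xml.transform API: security values taken from each incoming item instead of static ones, and a separate template match per incoming relationship name. Only Actor, Association, Person, Associate, Dimension1Value, Dimension2Value, SecurityTagIds and SecurityTagId come from this thread. The Data, Items, Membership and MemberOf names, the securityTags template and the class name are made up for illustration and are not the contents of the real XSLT file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch, not the SDK's stylesheet or transformer. */
final class SecurityAndLinkMappingSketch {

    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/Data'><Items><xsl:apply-templates select='*'/></Items></xsl:template>"
        // One template match per incoming name decides the target entity or link type.
      + "<xsl:template match='Actor'><Person><xsl:call-template name='securityTags'/></Person></xsl:template>"
      + "<xsl:template match='Association'><Associate><xsl:call-template name='securityTags'/></Associate></xsl:template>"
      + "<xsl:template match='Membership'><MemberOf><xsl:call-template name='securityTags'/></MemberOf></xsl:template>"
        // Shared by entities and links: copy the item's own dimension values.
      + "<xsl:template name='securityTags'>"
      + "<SecurityTagIds><xsl:apply-templates select='Dimension1Value|Dimension2Value'/></SecurityTagIds>"
      + "</xsl:template>"
      + "<xsl:template match='Dimension1Value'><SecurityTagId><xsl:value-of select='.'/></SecurityTagId></xsl:template>"
      + "<xsl:template match='Dimension2Value'><SecurityTagId><xsl:value-of select='.'/></SecurityTagId></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the source XML text. */
    static String transform(String sourceXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<Data>"
          + "<Actor><Dimension1Value>OSI</Dimension1Value><Dimension2Value>UC</Dimension2Value></Actor>"
          + "<Association><Dimension1Value>HI</Dimension1Value><Dimension2Value>CON</Dimension2Value></Association>"
          + "<Membership><Dimension1Value>OSI</Dimension1Value><Dimension2Value>UC</Dimension2Value></Membership>"
          + "</Data>"));
    }
}
```

The real entity and link elements need their identifiers, properties and link ends as well; schema4.xsd is the place to check what each one requires.
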
martonnovak commented 4 years ago

Thank you very much for your answers!