nkons / r2rml-parser

R2RML Parser is an award-winning tool that can export relational database contents as RDF graphs, based on an R2RML mapping document.
Apache License 2.0

Duplicate element in linkedlist #34

Open ilikecola opened 5 years ago

ilikecola commented 5 years ago

Hello nkons, I have a question about the LinkedLists in https://github.com/nkons/r2rml-parser/blob/70257775a56f337530cbc508f7fca2d6a75dc620/src/main/java/gr/seab/r2rml/beans/Parser.java#L117-L132 If a LogicalTableMapping has several predicateObjectMappings, the linked lists first and second will contain many duplicate elements.

So will these duplicate elements be processed two or more times in the loop below? https://github.com/nkons/r2rml-parser/blob/70257775a56f337530cbc508f7fca2d6a75dc620/src/main/java/gr/seab/r2rml/beans/Generator.java#L189

Thank you very much!

nkons commented 5 years ago

Hi,

Thanks for your question.

I'm not sure I understand it, though. Did you run it and notice that duplicates were generated? If so, could you please share your input data?

Is the question about what processing takes place in memory? If so, then in the second loop, there is only a getter; no additional processing takes place.

Note also that the lists first and second have no duplicates: the former contains the predicate object maps for which RefObjectMap is not null, and the latter those for which RefObjectMap is null.

Did this answer your question? Happy to dig deeper if needed.

Best, Nikos

ilikecola commented 5 years ago

Hi Nikos,

Sorry for my unclear description. Imagine that we have this R2RML mapping:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix ex: <http://example.com/ns#>.

ex:TriplesMap_Dept
    rr:logicalTable [ rr:tableName "DEPT_CN" ];
    rr:subjectMap [
        rr:template "http://data.example.com/department/{DEPTNO}";
        rr:class ex:Department;
    ];
    rr:predicateObjectMap [
        rr:predicate ex:deptNum;
        rr:objectMap [ rr:column "DEPTNO" ; rr:datatype xsd:integer ];
    ];
    rr:predicateObjectMap [
        rr:predicate ex:deptName;
        rr:objectMap [ rr:column "DNAME" ; rr:language "zh-cn" ];
    ];
    rr:predicateObjectMap [
        rr:predicate ex:deptLocation;
        rr:objectMap [ rr:column "LOC" ; rr:language "zh-cn" ];
    ].

ex:TriplesMap_Emp
    rr:logicalTable [ rr:tableName "EMP_CN" ];
    rr:subjectMap [
        rr:template "http://data.example.com/employee/{EMPNO}";
        rr:class ex:Employee;
    ];
    rr:predicateObjectMap [
        rr:predicate ex:empNum;
        rr:objectMap [ rr:column "EMPNO" ; rr:datatype xsd:integer ];
    ];
    rr:predicateObjectMap [
        rr:predicate ex:empName;
        rr:objectMap [ rr:column "ENAME" ; rr:language "zh-cn" ];
    ];
    rr:predicateObjectMap [
        rr:predicate ex:jobType;
        rr:objectMap [ rr:column "JOB" ; rr:language "zh-cn" ];
    ];
    rr:predicateObjectMap [
        rr:predicate ex:worksForDeptNum;
        rr:objectMap [ rr:column "DEPTNO" ; rr:dataType xsd:integer ];
    ];
    rr:predicateObjectMap [
        rr:predicate ex:worksForDept;
        rr:objectMap [
            rr:parentTriplesMap ex:TriplesMap_Dept ;
            rr:joinCondition [ rr:child "DEPTNO"; rr:parent "DEPTNO" ]
        ]
    ].
```

According to the first loop, every rr:predicateObjectMap for which RefObjectMap is null causes its LogicalTableMapping to be added to the first linked list.

In our example, ex:TriplesMap_Dept will be added to the first linked list three times, because it has three rr:predicateObjectMap entries for which RefObjectMap is null. ex:TriplesMap_Emp will be added to the first linked list four times and to the second linked list once, because it has four rr:predicateObjectMap entries for which RefObjectMap is null and one for which it is not null.

I confirmed this by setting a breakpoint at the end of the first loop and inspecting the elements of the first linked list.
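To make the counting above concrete, here is a minimal, self-contained sketch of the behavior as I understand it. It is not the actual code from Parser.java: the classes `PredicateObjectMap` and `LogicalTableMapping` below are simplified, hypothetical stand-ins, and the lists are named `withoutRef` and `withRef` only to show which kind of predicate object map triggered the insertion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class DuplicateDemo {

    // Hypothetical stand-in: only records whether a RefObjectMap is present.
    static final class PredicateObjectMap {
        final boolean hasRefObjectMap;

        PredicateObjectMap(boolean hasRefObjectMap) {
            this.hasRefObjectMap = hasRefObjectMap;
        }
    }

    // Hypothetical stand-in for a triples map with its predicate object maps.
    static final class LogicalTableMapping {
        final String uri;
        final List<PredicateObjectMap> predicateObjectMaps = new ArrayList<>();

        LogicalTableMapping(String uri, int plainMaps, int refMaps) {
            this.uri = uri;
            for (int i = 0; i < plainMaps; i++) {
                predicateObjectMaps.add(new PredicateObjectMap(false));
            }
            for (int i = 0; i < refMaps; i++) {
                predicateObjectMaps.add(new PredicateObjectMap(true));
            }
        }
    }

    // Adds the mapping once per predicate object map, which is what
    // produces the repeated entries described in this comment.
    static void partition(List<LogicalTableMapping> mappings,
                          List<LogicalTableMapping> withoutRef,
                          List<LogicalTableMapping> withRef) {
        for (LogicalTableMapping mapping : mappings) {
            for (PredicateObjectMap pom : mapping.predicateObjectMaps) {
                if (pom.hasRefObjectMap) {
                    withRef.add(mapping);
                } else {
                    withoutRef.add(mapping);
                }
            }
        }
    }

    // Mirrors the example mapping: Dept has 3 plain maps, Emp has 4 plain and 1 ref.
    static List<LogicalTableMapping> exampleMappings() {
        return Arrays.asList(
                new LogicalTableMapping("ex:TriplesMap_Dept", 3, 0),
                new LogicalTableMapping("ex:TriplesMap_Emp", 4, 1));
    }

    public static void main(String[] args) {
        List<LogicalTableMapping> withoutRef = new LinkedList<>();
        List<LogicalTableMapping> withRef = new LinkedList<>();
        partition(exampleMappings(), withoutRef, withRef);
        System.out.println("entries added for maps without RefObjectMap: " + withoutRef.size());
        System.out.println("entries added for maps with RefObjectMap: " + withRef.size());
    }
}
```

With the example mapping, the first list in this sketch ends up with 3 + 4 = 7 entries that refer to only two distinct mappings.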

Is that correct?

I also confirmed that if a single Jena RDF result model is used for the whole process, the duplicates in the linked lists do not produce any extra triples. However, I am trying to use multiple models to reduce memory usage, and in that case the duplicates in the linked lists are parsed two or more times.

If anything in my description is unclear, please ask. Thank you very much!
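One possible workaround on my side, sketched under the assumption that processing each mapping once is enough: remove the duplicates while keeping the original order before iterating. The class `DedupDemo` and the helper `distinctInOrder` are hypothetical names, not part of r2rml-parser, and strings stand in for the mapping objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupDemo {

    // Returns the elements in first-seen order with repeats removed.
    // Relies on equals/hashCode; for objects that do not override them,
    // this means the same instance is kept only once.
    static <T> List<T> distinctInOrder(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    public static void main(String[] args) {
        // Strings stand in for the repeated LogicalTableMapping entries.
        List<String> first = Arrays.asList(
                "ex:TriplesMap_Dept", "ex:TriplesMap_Dept", "ex:TriplesMap_Dept",
                "ex:TriplesMap_Emp", "ex:TriplesMap_Emp",
                "ex:TriplesMap_Emp", "ex:TriplesMap_Emp");

        for (String mapping : distinctInOrder(first)) {
            System.out.println("processing " + mapping);
        }
    }
}
```

A `LinkedHashSet` is used here rather than a `HashSet` so that the processing order stays the same as in the original list.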

Jason