semantifyit / RocketRML

Error in Joins with two or more conditions #10

Closed · dachafra closed this issue 4 years ago

dachafra commented 4 years ago

Hi! We are trying to run the engine with a mapping that contains a join between two TriplesMaps with two join conditions. The output is the following:

Processing with CSV
TypeError: Cannot read property 'startsWith' of undefined
    at CsvParser.getData (/home/dchaves/engines/rocketrml/node_modules/rocketrml/src/input-parser/CSVParser.js:24:18)
    at handleSingleMapping (/home/dchaves/engines/rocketrml/node_modules/rocketrml/src/input-parser/parser.js:329:30)
    at /home/dchaves/engines/rocketrml/node_modules/rocketrml/src/input-parser/parser.js:218:9
    at Array.forEach (<anonymous>)
    at doObjectMappings (/home/dchaves/engines/rocketrml/node_modules/rocketrml/src/input-parser/parser.js:211:20)
    at iterateFile (/home/dchaves/engines/rocketrml/node_modules/rocketrml/src/input-parser/parser.js:171:15)
    at Object.parseFile (/home/dchaves/engines/rocketrml/node_modules/rocketrml/src/input-parser/parser.js:37:18)
    at /home/dchaves/engines/rocketrml/node_modules/rocketrml/src/index.js:69:32
    at Array.forEach (<anonymous>)
    at process (/home/dchaves/engines/rocketrml/node_modules/rocketrml/src/index.js:46:24)
Saved!

Here is a simple example with two CSV files and the mapping join-twocond.zip
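
For illustration, a referencing object map with two join conditions looks roughly like this (just a sketch with made-up file and column names, not the attached mapping):

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix ex:  <http://example.com/ns#> .

<#CourseMap>
  rml:logicalSource [ rml:source "course.csv" ; rml:referenceFormulation ql:CSV ] ;
  rr:subjectMap [ rr:template "http://example.com/course/{ID}/{Year}" ] .

<#StudentMap>
  rml:logicalSource [ rml:source "student.csv" ; rml:referenceFormulation ql:CSV ] ;
  rr:subjectMap [ rr:template "http://example.com/student/{ID}" ] ;
  rr:predicateObjectMap [
    rr:predicate ex:takesCourse ;
    rr:objectMap [
      rr:parentTriplesMap <#CourseMap> ;
      # two join conditions on the same referencing object map
      rr:joinCondition [ rr:child "CourseID" ; rr:parent "ID" ] ;
      rr:joinCondition [ rr:child "Year" ; rr:parent "Year" ]
    ]
  ] .
```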

ThibaultGerrier commented 4 years ago

Hi! Thanks for the issue. I did not know RML supported multiple conditions per join; I have updated the project to support that. Your mapping should now run without any errors in version 1.6.0.

dachafra commented 4 years ago

We will try it and close the issue if it works. Thanks for the quick fix!

dachafra commented 4 years ago

I don't know if it is related to this issue, but now we are getting the following error (we ran it with the maximum stack size our server allows and it still fails):

Executing data.csv...
Successfully executed the functions in data.csv

Processing with CSV
RangeError: Maximum call stack size exceeded
    at Array.slice (<anonymous>)
    at allPossibleCases (/home/rocketrml/node_modules/rocketrml/src/input-parser/helper.js:223:47)
    at allPossibleCases (/home/rocketrml/node_modules/rocketrml/src/input-parser/helper.js:223:26)
    at allPossibleCases (/home/rocketrml/node_modules/rocketrml/src/input-parser/helper.js:223:26)
    at allPossibleCases (/home/rocketrml/node_modules/rocketrml/src/input-parser/helper.js:223:26)
    at allPossibleCases (/home/rocketrml/node_modules/rocketrml/src/input-parser/helper.js:223:26)
    at allPossibleCases (/home/rocketrml/node_modules/rocketrml/src/input-parser/helper.js:223:26)
    at allPossibleCases (/home/rocketrml/node_modules/rocketrml/src/input-parser/helper.js:223:26)
    at allPossibleCases (/home/rocketrml/node_modules/rocketrml/src/input-parser/helper.js:223:26)
    at allPossibleCases (/home/rocketrml/node_modules/rocketrml/src/input-parser/helper.js:223:26)

ThibaultGerrier commented 4 years ago

Hi, this shouldn't be related to this issue. However, I just found and fixed a bug that is probably the cause of your new problem. Please try again with v1.6.1.

dachafra commented 4 years ago

thanks!

samiscoding commented 4 years ago

Hi there, we're trying v1.6.1 and we observe that the number of triples generated for the same mapping rules differs considerably from the number generated by the previous version. Is there any chance that "duplicates removal" is being skipped in this version? Or, in general, do you perform "duplicates removal" at all? (I assumed it was the default setup of the engine.)

ThibaultGerrier commented 4 years ago

By duplicates, do you mean duplicate triples? When converting to RDF triples (option toRDF set to true) there shouldn't be any duplicates among the triples; this has always been the case. We did recently switch package versions of jsonld, the package that handles the transformation from JSON-LD to RDF (with v1.3.0 from 1.8.1 to 3.0.1 and with v1.0.6 to 3.0.1). It might be that this changed the behavior of the conversion to triples.
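
For reference, a minimal sketch of how the toRDF option is passed; the file paths are placeholders and option names other than toRDF are assumptions here:

```js
// Minimal sketch: run the mapping and serialize the result as RDF triples (toRDF: true).
// Paths are placeholders; the verbose option is an assumption, toRDF is the flag discussed above.
const parser = require('rocketrml');

const run = async () => {
  const options = {
    toRDF: true,   // convert the mapped JSON-LD output to RDF triples
    verbose: true, // assumed logging flag
  };
  const result = await parser
    .parseFile('./mapping.ttl', './out.nq', options)
    .catch((err) => { console.error(err); });
  console.log(result);
};

run();
```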

The latest version only fixed bugs with rr:template; I'm not sure how that would decrease the number of triples.

Are triples missing now, or were there too many/incorrect triples before?

I'm afraid I can't do much without some sample mappings & data.

samiscoding commented 4 years ago

Thanks for the explanation. The number of generated RDF triples has increased significantly, which is why I suspected it might have something to do with duplicate removal. This observation comes from our experiments on a massive dataset, so unfortunately I don't have a small sample to present here at the moment. Still, you can consider this report an experimental observation that may help you further investigate the completeness and correctness of your engine's results :)
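
As a quick check on our side, comparing total vs. distinct lines in the output would show whether the increase comes from literally repeated triples. A rough sketch, assuming an N-Triples file with one triple per line and a placeholder file name:

```js
// Rough duplicate check on an N-Triples output file (assumes one triple per line).
// './out.nt' is a placeholder for the actual output path.
const fs = require('fs');

const lines = fs
  .readFileSync('./out.nt', 'utf8')
  .split('\n')
  .map((line) => line.trim())
  .filter((line) => line.length > 0);

const distinct = new Set(lines);
console.log(`total: ${lines.length}, distinct: ${distinct.size}, duplicates: ${lines.length - distinct.size}`);
```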

ThibaultGerrier commented 4 years ago

What version were you using before v1.6.1, where the number of triples was lower?

samiscoding commented 4 years ago

If I'm not mistaken it was 1.6.0 (the previous version). I should also mention that I tried different mappings and the problem does not occur in all of them, so further investigation is definitely required!