herminiogg / ShExML

A heterogeneous data mapping language based on Shape Expressions
http://shexml.herminiogarcia.com
MIT License

What are the spatial limitations of the Scala ShExML implementation #149

Open andrawaag opened 11 months ago

andrawaag commented 11 months ago

What are the limitations on the source data that ShExML can handle? I am currently trying to transform a CSV file with tens of thousands of rows. The file itself is 36 MB in size. When I run the transformation, it produces no results. With a smaller version of the file, ShExML produces the expected results.

herminiogg commented 10 months ago

Hi Andra,

To be honest, I have never measured ShExML's limitations, as they would really depend on the format of the input (hierarchical vs. tabular) and on the features used. Do you receive any error when executing the bigger version, or just a blank output?

andrawaag commented 10 months ago

There was no output. The CPU was busy and the process kept running; after 10 hours, I simply terminated the engine.

herminiogg commented 10 months ago

Then it is a performance issue. That is something I am well aware of, but I have only ever run into it with hierarchical files, never with tabular ones. Recently, I transformed an Access database that also had 10,000 rows, and it finished in around 5 minutes. If you send me more details, I can try to reproduce the error and see whether I can find a solution.
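One way to gather reproducible details is to run the same mapping on header-preserving prefixes of the CSV of increasing size and time each run, which shows roughly where the runtime stops scaling. The sketch below only generates those prefix files; the function name `write_prefix_subsets` and the `<stem>_<n>.csv` naming scheme are hypothetical, and the ShExML invocation itself is left to whatever command is already being used. It assumes a UTF-8 CSV whose first row is a header.

```python
import csv
from pathlib import Path


def write_prefix_subsets(src, sizes, out_dir):
    """Hypothetical helper: for each n in `sizes`, write a copy of `src`
    containing the header row plus the first n data rows.

    Uses the csv module so quoted fields with embedded newlines are kept
    intact. Returns the written paths in ascending order of n.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for n in sorted(sizes):
        target = out_dir / f"{src.stem}_{n}{src.suffix}"
        with src.open(newline="", encoding="utf-8") as fin, \
                target.open("w", newline="", encoding="utf-8") as fout:
            reader = csv.reader(fin)
            writer = csv.writer(fout, lineterminator="\n")
            writer.writerow(next(reader))  # copy the header row
            for i, row in enumerate(reader):
                if i >= n:  # stop after n data rows
                    break
                writer.writerow(row)
        written.append(target)
    return written
```

For example, `write_prefix_subsets("data.csv", [1000, 5000, 10000, 20000], "subsets")` would produce four files that can each be fed to the same mapping. Note that the csv module may re-quote some fields, so the prefix files are equivalent in content but not necessarily byte-identical to the original.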

If this is purely a performance issue, you might want to check some recent improvements I have been making in this regard: https://github.com/herminiogg/ShExML/tree/enhancement-%23148. This branch has not been merged into develop or master yet, as I still have to finish and polish some things, but it passes the tests and its performance is drastically better.