zazuko / rdf-cube-view-query

RDF Cube View Schema query library
2 stars 3 forks source link

Add an option to disable dimension sorting in View.fromCube #95

Closed bprusinowski closed 1 year ago

bprusinowski commented 1 year ago

Hey!

Recently I've been working on performance improvements in Visualize.admin and I've been looking at the performance of the data fetching. I've noticed that in some cases it takes much longer to create the View (View.fromCube) than to fetch the data itself – see the Comments section of this PR.

It turned out that it's "slow" because the dimensions are sorted by name when the View is created. I wanted to ask if it's necessary to sort them or if it's possible to add an option to enable / disable sorting?

I've made some tests and it seems that there was no impact on the generated observations query when adding such parameter (additionally, in Visualize we sort the observations and dimensions anyway in the front-end part). I would be happy to open a PR if this change makes sense :)

Rdataflow commented 1 year ago

@dabo505 can you put this issue on the fast track? we depend on this performance improvement for the next release of visualize...

tpluscode commented 1 year ago

Hi @bprusinowski. Could you share example queries which are actually impacted? With and without the dimensions being sorted. I'm surprised that it should have any effect on the performance of querying the view

bprusinowski commented 1 year ago

Hi @tpluscode, the performance of the query is not affected, but we need to create the View before we fetch the observations – and by default, the dimensions are sorted, which takes around 0.5s in the query used for testing. So this change would not improve the performance of the query, but performance of creating a View :)

tpluscode commented 1 year ago

By all means, do not hesitate to send a PR. Maybe with some additional examples because I do not yet fully understand the impact and where the actual time penalty occurs

bprusinowski commented 1 year ago

Sure, this is the benchmark I've made to pin-point the issue.

With sorting enabled (see that --sort took 549 ms and in total it took 556 ms). Generally it fluctuates around 500-600 ms in the example used for testing (this chart https://int.visualize.admin.ch/en/v/hKkUyBpGpWPl?dataSource=Int).

with sorting

With sorting disabled (see that in total it took 8 ms).

without sorting

Could you grant me the appropriate rights to open a PR in the repository? Thanks!

tpluscode commented 1 year ago

Could you grant me the appropriate rights to open a PR in the repository? Thanks!

Actually, I don't have the necessary permission either. Please fork and PR from there

bprusinowski commented 1 year ago

I've opened a PR (#96) 👍

tpluscode commented 1 year ago

Released in v1.12