calendarQuery is inefficient

sabre-io / dav

sabre/dav is a CalDAV, CardDAV and WebDAV framework for PHP

http://sabre.io

BSD 3-Clause "New" or "Revised" License

1.53k stars 346 forks source link

calendarQuery is inefficient #1056

Open bosim opened 6 years ago

bosim commented 6 years ago

It is quite typical seeing a client do a calendar-query using comp-filter VEVENT and a timerange, requesting only the ETag. Here, we do a quite inefficient job, calling getCalendarObject on all the URIs, and not using the actual calendardata for anything. Any idea on how to optimize? calendarQuery should really return what getCalendarObjects return, then there would not be any issue.

staabm commented 6 years ago

Could you provide a reproducible reduced testcase which shows the problem?

Additonally, could you provide a performance profile (e.g. With blackfire) which shows a calender-query request you are talking about?

bosim commented 6 years ago

Well. I thought this was quite clear.

<calendar-query xmlns="urn:ietf:params:xml:ns:caldav" xmlns:D="DAV:">
  <D:prop>
    <D:getetag/>
  </D:prop>
  <filter>
    <comp-filter name="VCALENDAR">
      <comp-filter name="VEVENT">
    </comp-filter>
  </filter>
</calendar-query>

What will Sabre do? Call calendarQuery in the backend, that will return a list of URIs, for which Sabre will call getCalendarObject on each, just to get the ETag. Do you really need benchmarks for that? For my implementation the calendardata is quite expensive to generate which makes requests taking seconds, where it could be done without fetching the actual calendardata (since we only need ETag) which is cheap in my backend implementation.

staabm commented 6 years ago

Having a benchmark just makes sure we can optimize your real use-case.

Also having a blackfire profile will show room for improvements

staabm commented 6 years ago

Did you measure the calendar-query on your end? How slow is it actually?

bosim commented 6 years ago

I saw some outputs of around 7 seconds, would expect it would take 1 if it just called getCalendarObjects, instead of calendarQuery and then N getCalendarObject but because of the time span it needs calendarQuery. My observations is just more general, that calendarQuery is made ineffiently, a minor change in the behavior (returning same data as getCalendarObjects and not just the URI) would speed up some factor.

evert commented 6 years ago

calendarQuery and family are quite complex, and I've spent a fair amount of time optimizing specific paths that clients would use. It's entirely possible that clients have changed since I've done this and this requires another look.

It's going to be hard to optimize every path, but it's worth doing for common/popular clients. Can you tell me anything more about what's making those calls?

bosim commented 6 years ago

@evert It is mainly Evolution that gives me the headache. It does it to check if anything new appear. I do not think there is anything wrong in a query like the one I provided, if given with a time span (which it also does). Out of curiosity why does calendarQuery return an array of urls and not the same format as getCalendarObjects does.

evert commented 6 years ago

Part of the reason calendarQuery does that, is because there's several layers of business logic that may or may not need to be executed for getCalendarObject (and family of functions).

By fetching things simply by their uri, and using a common API for this, we can be sure that all plugins can behave the same way.

However, there is an optimization possible and I just noticed we haven't implemented this. calendarQuery should not fetch objects one at a time. Here:

https://github.com/sabre-io/dav/blob/master/lib/CalDAV/Plugin.php#L629

It used getPropertiesForPath(), which will indeed result in 1 getCalendarObject per item. Ideally this gets changed to use this method instead:

https://github.com/sabre-io/dav/blob/master/lib/DAV/Server.php#L1016

bosim commented 6 years ago

@evert Thanks for the explanation. However, your optimization idea would not give the big impact for me, since getMultipleCalendarObjects still fetch the calendardata. Just for getting the ETag I do not need that.

staabm commented 6 years ago

It would be great to see a blackfire profile of your 7 second taking request