Fix _get_resources_by_hrefs: href may be None

Tcc100 commented 9 months ago

href may be None for which this function (_get_resources_by_hrefs) raises an error.

I observed the error using merkuro/akonadi.

jelmer commented 9 months ago

Do you have an example request that triggers this (see also notes/debugging.rst)? From what I can tell, that would be an invalid request (and thus a bug in akonadi, though xandikos should obviously also be more robust)

Tcc100 commented 9 months ago

xandikos: 0.2.10 akonadi: 23.08.2

First Request:

REPORT /cal/user/calendars/calendar/ HTTP/1.1
Content-Length: 416
Depth: 1
Content-Type: text/xml; charset=utf-8

<?xml version="1.0" encoding="utf-8"?>
<calendar-query xmlns="urn:ietf:params:xml:ns:caldav">
 <prop xmlns="DAV:">
  <getetag xmlns="DAV:"/>
  <resourcetype xmlns="DAV:"/>
 </prop>
 <filter xmlns="urn:ietf:params:xml:ns:caldav">
  <comp-filter xmlns="urn:ietf:params:xml:ns:caldav" name="VCALENDAR">
   <comp-filter xmlns="urn:ietf:params:xml:ns:caldav" name="VEVENT"/>
  </comp-filter>
 </filter>
</calendar-query>

Part of response. There is one outlier not starting with /cal/.

  <ns0:response>
    <ns0:href>/cal/user/calendars/calendar/9feec0eb-d6f0-4167-896a-0497054b203d.ics</ns0:href>
    <ns0:propstat>
      <ns0:status>HTTP/1.1 200 OK</ns0:status>
      <ns0:prop>
        <ns0:getetag>"074f197cc74940dcce959d84c56b4adaeb1e5e44"</ns0:getetag>
        <ns0:resourcetype/>
      </ns0:prop>
    </ns0:propstat>
  </ns0:response>
  <ns0:response>
    <ns0:href>UID%3Axxxxxxx-xxx00-XXXXXX%40xxx.xxxxx.xx.ics</ns0:href>
    <ns0:propstat>
      <ns0:status>HTTP/1.1 200 OK</ns0:status>
      <ns0:prop>
        <ns0:getetag>"489d3250104ac92cab5f667d3714af369099c6e8"</ns0:getetag>
        <ns0:resourcetype/>
      </ns0:prop>
    </ns0:propstat>
  </ns0:response>
  <ns0:response>
    <ns0:href>/cal/user/calendars/calendar/a0167cc7-9ae3-4e7e-815a-d72c2db1eaf4.ics</ns0:href>
    <ns0:propstat>
      <ns0:status>HTTP/1.1 200 OK</ns0:status>
      <ns0:prop>
        <ns0:getetag>"199077b531b62ad3a99b2d176c5da0ab37cda8b9"</ns0:getetag>
        <ns0:resourcetype/>
      </ns0:prop>
    </ns0:propstat>
  </ns0:response>

Follow up request that triggers the error. The suspicious href is now empty.

REPORT /cal/user/calendars/calendar/ HTTP/1.1
Content-Length: 26660
Depth: 0
Content-Type: text/xml; charset=utf-8

<?xml version="1.0" encoding="utf-8"?>
<calendar-multiget xmlns="urn:ietf:params:xml:ns:caldav">
 <prop xmlns="DAV:">
  <getetag xmlns="DAV:"/>
  <calendar-data xmlns="urn:ietf:params:xml:ns:caldav"/>
 </prop>

...

<href xmlns="DAV:">/cal/user/calendars/calendar/9feec0eb-d6f0-4167-896a-0497054b203d.ics</href>
<href xmlns="DAV:"></href>
<href xmlns="DAV:">/cal/user/calendars/calendar/a0167cc7-9ae3-4e7e-815a-d72c2db1eaf4.ics</href>

...

</calendar-multiget>

I hope this helps, I currently have not the time to setup a minimal/clean reproducer.

jelmer commented 9 months ago

Thanks, that's helpful. It looks like there's at least a bug in xandikos being inconsistent about returning absolute paths that we should also fix.

I'm a little hesitant to just ignore entries with an empty or missing href, because it's likely a sign of bugs like this. Perhaps it's best to only do this if --no-strict is specified and return an error to the client otherwise.

Tcc100 commented 9 months ago

I found that the urljoin function as used here does not handle colons as desired.

>>> import urllib.parse
>>> urllib.parse.urljoin("/cal/user/calendars/calendar/", "X:foo.ics")
'X:foo.ics'

I wonder if urljoin is really the right choice at this point. Is dropping part of the base href ever desired?

>>> urllib.parse.urljoin("/part1/part2/part3_without_slash", "child")
'/part1/part2/child'

A better solution might be:

child_href = ensure_trailing_slash(href) + child_name

jelmer commented 9 months ago

I found that the urljoin function as used here does not handle colons as desired.
>>> import urllib.parse
>>> urllib.parse.urljoin("/cal/user/calendars/calendar/", "X:foo.ics")
'X:foo.ics'
I wonder if urljoin is really the right choice at this point. Is dropping part of the base href ever desired?
>>> urllib.parse.urljoin("/part1/part2/part3_without_slash", "child")
'/part1/part2/child'
A better solution might be:
child_href = ensure_trailing_slash(href) + child_name

Ah, I see. I think the root is actually that we should escape child_name because it's used in a URL and we can't include just any characters there (colons should be escaped). This also appears to be the root cause for #273; #253 should fix this.

jelmer commented 9 months ago

Cherry-picked your other change.

jelmer / xandikos

Fix _get_resources_by_hrefs: href may be None #273