pimutils / vdirsyncer

📇 Synchronize calendars and contacts.
https://vdirsyncer.pimutils.org/

vdirsyncer single file calendar sync's growth of running time is quadratic or higher (it takes "forever") #818

Closed bernhardreiter closed 1 year ago

bernhardreiter commented 4 years ago

Observed: running /usr/bin/vdirsyncer -v debug sync and the process hangs in one thread.

Expected: Sync completes. It did before the move to Debian buster and the switch of vdirsyncer versions. Or at least a diagnostic message that helps find the problem, if there is one.

Further observations: an strace shows futex(0x7f387429e890, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY and a py-bt with gdb shows

  File "/usr/lib/python3.7/threading.py", line 296, in wait
    waiter.acquire()
  File "/usr/lib/python3.7/queue.py", line 170, in get
    self.not_empty.wait()
  File "/usr/lib/python3/dist-packages/click_threading/__init__.py", line 110, in run
    func, future = self.tasks.get()
  File "/usr/lib/python3/dist-packages/vdirsyncer/cli/utils.py", line 375, in join
    ui_worker.run()
  <built-in method next of module object at remote 0x7f387d2bcc28>
  File "/usr/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/usr/lib/python3/dist-packages/vdirsyncer/cli/__init__.py", line 154, in sync
    wq.spawn_worker()
[..]

[pair my_calendar]
a = "my_calendar_remote"
b = "my_calendar_local"
collections = null
conflict_resolution = "a wins"

[storage my_calendar_remote]
type = "caldav"
read_only = true
url = "https://posteo.de:8443/calendars/bernh/default/"
username = "XXX@posteo.de"
password = "YYY"

[storage my_calendar_local]
type = "singlefile"
path = "~/.vdirsyncer/my_calendar.ics"

* the last line of -v debug is

debug: Already normalized: '/calendars/bernh/default/XXXXX.ics'



* Related reports: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=959097 with more details.
bernhardreiter commented 4 years ago

Removing .vdirsyncer and doing a new discover and sync works, but the next sync hangs again.

bernhardreiter commented 4 years ago

Meanwhile I can reproduce this with just a singlefile one-way sync. Some .ics files work, others don't. Next step is to produce a problem case that can be published.

install vdirsyncer "The clean, easy way"

set base /home/bernhardreiter/vtest
PIPX_HOME=$base PIPX_BIN_DIR=$base/bin pipx install vdirsyncer
# gets me   package vdirsyncer 0.16.7, Python 3.7.3
pushd $base

setup singlefile -> singlefile one way config

mkdir problem-1
echo '[general]
status_path = "$base/problem-1"

[pair my_calendar]
a = "my_calendar_a"
b = "my_calendar_b"
collections = null
conflict_resolution = "a wins"

[storage my_calendar_a]
type = "singlefile"
path = "$base/problem-1/my_calendar_a.ics"

[storage my_calendar_b]
type = "singlefile"
path = "$base/problem-1/my_calendar_b.ics"
' | sed -e "s|\$base|$base|" >problem-1/config

testing a calendar

rm problem-1/my*
cp my_calendar_bad.ics problem-1/my_calendar_a.ics
touch problem-1/my_calendar_b.ics
./bin/vdirsyncer -v DEBUG -c problem-1/config discover
./bin/vdirsyncer -v DEBUG -c problem-1/config sync
bernhardreiter commented 3 years ago

Doing more tests, even applying delta debugging, I found that it is not a hang but a more-than-linearithmic growth of running time in the number of calendar items synced in a single file.

A calendar with 1822 items needed 4:26 (4 hours and 26 minutes) to sync. That can easily be mistaken for a "hang". :)

Conclusion: something about the implemented algorithm is not good, as this is a high order of growth in the input size.
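
A quick sanity check on the order of growth (a sketch; it plugs in the v0.19.1 timings reported later in this thread for the generated 822- and 1644-item calendars): if doubling the input roughly quadruples the running time, the growth is quadratic.

```python
import math

def growth_exponent(n1, t1, n2, t2):
    """Estimate k assuming t ~ c * n**k from two (size, time) samples."""
    return math.log(t2 / t1) / math.log(n2 / n1)

# v0.19.1 timings from the generated-calendar tests reported below:
# 822 items -> 37.624 s, 1644 items -> 152.770 s
k = growth_exponent(822, 37.624, 1644, 152.770)
assert 1.9 < k < 2.1  # consistent with quadratic growth
```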

Details

As it was a calendar with personal data, I cannot publish it. As I suspected something in the contents themselves, I tried to reduce the problem case first.

Below is the code that worked from a vdir directory of the original data and a text file listing all the filenames, to run delta debugging. Delta debugging turned out not to be useful, as all syncs were successful, so the timeout test was the wrong criterion. However, this may serve as a template for other tests.

#!/usr/bin/env python3
"""Test if vdirsyncer fails to sync on a list of iCalender files.

File names are hardcoded.
This can be used as a script to test whether a set is interesting when
applying delta debugging. Used with https://github.com/dsw/delta

./delta-release-2020-06-22/delta -test=tryentries4.py -in_place entries

SPDX-License-Identifier: Apache-2.0
SPDX-FileCopyrightText: 2021 Intevation GmbH <https://intevation.de>
Author: <bernhard.reiter@intevation.de>
"""
import os
from pathlib import Path
import shutil
import subprocess
import sys
import time
import tempfile

entries = []

with open("entries", "rt") as entries_file:
    for line in entries_file:
        entries.append(line.rstrip("\n"))

# print(len(entries))
# print(repr(entries[-1]))

def write_config(target_dir):
    template = """
[general]
status_path = "{target_dir}/status/"

[pair my_calendar]
a = "my_calendar_a"
b = "my_calendar_local"
collections = null
conflict_resolution = "a wins"

[storage my_calendar_a]
type = "filesystem"
fileext = ""
read_only = true
path = "{target_dir}/vdir"

[storage my_calendar_local]
type = "singlefile"
path = "{target_dir}/my_calendar_b.ics"
"""
    # print(template.format(target_dir=target_dir.resolve()))
    (target_dir / "config").write_text(
        template.format(target_dir=target_dir.resolve()))

    # create an empty iCalendar file
    (target_dir / "my_calendar_b.ics").touch()

def runtest(original_dir, target_dir, entries):
    org_path = original_dir / "vdir"
    vdir_path = target_dir / "vdir"
    vdir_path.mkdir(parents=True)

    for entry in entries:
        shutil.copy(org_path / entry, vdir_path)

    write_config(target_dir)

    cmd_basic = ["/home/bern/vdirsyncer/venv/bin/vdirsyncer",
                 "-c", (target_dir / "config").resolve()]

    p = subprocess.Popen(cmd_basic + ["discover"])
    print(f"Running {p.args}")
    p.wait(3)

    p = subprocess.Popen(cmd_basic + ["-v", "WARNING", "sync"])
    print(f"Running {p.args}")
    p.wait(8)

    p = subprocess.Popen(cmd_basic + ["sync"])
    starttime = time.perf_counter()
    print(f"Running {p.args}")
    try:
        p.wait(36000)
    except subprocess.TimeoutExpired:
        stoptime = time.perf_counter()
        elapsed = int(stoptime - starttime)
        print("Killing {!r} after {:d} seconds".format(p, elapsed))
        p.kill()
        return False
    stoptime = time.perf_counter()
    elapsed = int(stoptime - starttime)
    print("Sync completed after {:d} seconds".format(elapsed))
    return True

#result = runtest(Path("./fetch-private2"), Path("./testall"), entries)

with tempfile.TemporaryDirectory() as tempdir_name:
    print("Testing in {}".format(tempdir_name))
    result = runtest(Path("./fetch-private2"), Path(tempdir_name), entries)

if result:  # sync succeeded: no error found, so tell "delta" this set is uninteresting
    sys.exit(1)

Here are some numbers to show the increase of runtime:

WhyNotHugo commented 3 years ago

Can you check if vdirsyncer sync --max-workers 1 fixes the issue?

bernhardreiter commented 3 years ago

Thanks for the hint. I will.

WhyNotHugo commented 3 years ago

Let me know how that goes, I have some theories on what’s going on.

bernhardreiter commented 3 years ago

@WhyNotHugo

if vdirsyncer sync --max-workers 1 fixes the issue?

Letting the big test (t1822m1.py) run for 10 minutes: not terminated. Ten minutes is too long for this simple sync (as there is no change in this test, it only needs to compare 1822 iCal objects). So unfortunately: no, not solved by --max-workers=1.

WhyNotHugo commented 3 years ago

Can you share the test? I can try and profile it and see if that yields anything useful.

There's clearly a bug here, but I've no idea where.

bernhardreiter commented 3 years ago

Can you share the test?

Unfortunately not, as it is a personal calendar with private data. (This is why I went with delta debugging first; my hypothesis then was that it would be related to the contents of some iCalendar objects.)

liberally licensed calendar data set

Do you know any liberally licensed larger iCalendar data sets? I've tried to find some, but was not successful.

Best candidates after search

Fadiabb commented 3 years ago

I used icalendar-generator to generate an iCalendar file with 2000 items and tested it with singlefile one-way sync: it took about 4 minutes and 30 seconds to complete.

config:

[general]
status_path = "~/vtest"

[pair foobar]
a = "foo"
b = "bar"
collections = null
conflict_resolution = "a wins"

[storage foo]
type = "singlefile"
path = "~/vtest/calendar-a/my_calendar_a.ics"

[storage bar]
type = "singlefile"
path = "~/vtest/calendar-b/my_calendar_a.ics"

testing calendar:


cp ~/icalendar-generator/calendar.ics ~/vtest/calendar-a/my_calendar_a.ics
touch ~/vtest/calendar-b/my_calendar_a.ics

vdirsyncer -v debug -c ~/vtest/config discover
vdirsyncer -v debug -c ~/vtest/config sync
eguiraud commented 1 year ago

Hi, I have the same problem, vdirsyncer sync takes forever.

I'm syncing two calendars from my private Nextcloud instance. After removing all contents of the .config/vdirsyncer/data and .vdirsyncer/status, the first vdirsyncer sync runs to completion without issues (after the appropriate vdirsyncer discover calls). The second runs "forever" (I stopped it after ~15 minutes). I can consistently reproduce this.

Stracing the process I can see that vdirsyncer is "stuck" in a loop at 100% CPU usage:

strace: Process 139392 attached
brk(0x55c0a6fdf000)                     = 0x55c0a6fdf000
brk(0x55c0a7001000)                     = 0x55c0a7001000
brk(0x55c0a7026000)                     = 0x55c0a7026000
brk(0x55c0a7048000)                     = 0x55c0a7048000
brk(0x55c0a7069000)                     = 0x55c0a7069000
brk(0x55c0a708b000)                     = 0x55c0a708b000
brk(0x55c0a70ad000)                     = 0x55c0a70ad000
brk(0x55c0a70d0000)                     = 0x55c0a70d0000
brk(0x55c0a70f3000)                     = 0x55c0a70f3000
brk(0x55c0a7118000)                     = 0x55c0a7118000
brk(0x55c0a7141000)                     = 0x55c0a7141000
brk(0x55c0a7164000)                     = 0x55c0a7164000
brk(0x55c0a7186000)                     = 0x55c0a7186000
brk(0x55c0a71aa000)                     = 0x55c0a71aa000
brk(0x55c0a71cd000)                     = 0x55c0a71cd000
brk(0x55c0a71f0000)                     = 0x55c0a71f0000
brk(0x55c0a700a000)                     = 0x55c0a700a000
brk(0x55c0a6e95000)                     = 0x55c0a6e95000
newfstatat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", {st_mode=S_IFREG|0600, st_size=412857, ...}, 0) = 0
openat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", O_RDONLY|O_CLOEXEC) = 8
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
ioctl(8, TCGETS, 0x7ffdc727fce0)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(8, 0, SEEK_CUR)                   = 0
lseek(8, 0, SEEK_CUR)                   = 0
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
read(8, "BEGIN:VCALENDAR\r\nPRODID:-//IDN n"..., 412858) = 412857
read(8, "", 1)                          = 0
brk(0x55c0a6efa000)                     = 0x55c0a6efa000
brk(0x55c0a6fc3000)                     = 0x55c0a6fc3000
close(8)                                = 0
brk(0x55c0a6fe5000)                     = 0x55c0a6fe5000
brk(0x55c0a7007000)                     = 0x55c0a7007000
brk(0x55c0a702c000)                     = 0x55c0a702c000
brk(0x55c0a704d000)                     = 0x55c0a704d000
brk(0x55c0a706f000)                     = 0x55c0a706f000
brk(0x55c0a7091000)                     = 0x55c0a7091000
brk(0x55c0a70b3000)                     = 0x55c0a70b3000
brk(0x55c0a70d6000)                     = 0x55c0a70d6000
brk(0x55c0a70f9000)                     = 0x55c0a70f9000
brk(0x55c0a711e000)                     = 0x55c0a711e000
brk(0x55c0a7146000)                     = 0x55c0a7146000
brk(0x55c0a7169000)                     = 0x55c0a7169000
brk(0x55c0a718c000)                     = 0x55c0a718c000
brk(0x55c0a71af000)                     = 0x55c0a71af000
brk(0x55c0a71d2000)                     = 0x55c0a71d2000
brk(0x55c0a71f5000)                     = 0x55c0a71f5000
brk(0x55c0a7010000)                     = 0x55c0a7010000
newfstatat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", {st_mode=S_IFREG|0600, st_size=412857, ...}, 0) = 0
openat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", O_RDONLY|O_CLOEXEC) = 8
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
ioctl(8, TCGETS, 0x7ffdc727fce0)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(8, 0, SEEK_CUR)                   = 0
lseek(8, 0, SEEK_CUR)                   = 0
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
read(8, "BEGIN:VCALENDAR\r\nPRODID:-//IDN n"..., 412858) = 412857
read(8, "", 1)                          = 0
close(8)                                = 0
brk(0x55c0a7031000)                     = 0x55c0a7031000
brk(0x55c0a7053000)                     = 0x55c0a7053000
brk(0x55c0a7074000)                     = 0x55c0a7074000
brk(0x55c0a7097000)                     = 0x55c0a7097000
brk(0x55c0a70b9000)                     = 0x55c0a70b9000
brk(0x55c0a70dc000)                     = 0x55c0a70dc000
brk(0x55c0a70ff000)                     = 0x55c0a70ff000
brk(0x55c0a7123000)                     = 0x55c0a7123000
brk(0x55c0a7147000)                     = 0x55c0a7147000
brk(0x55c0a716f000)                     = 0x55c0a716f000
brk(0x55c0a7192000)                     = 0x55c0a7192000
brk(0x55c0a71b5000)                     = 0x55c0a71b5000
brk(0x55c0a71d9000)                     = 0x55c0a71d9000
brk(0x55c0a700a000)                     = 0x55c0a700a000
newfstatat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", {st_mode=S_IFREG|0600, st_size=412857, ...}, 0) = 0
openat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", O_RDONLY|O_CLOEXEC) = 8
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
ioctl(8, TCGETS, 0x7ffdc727fce0)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(8, 0, SEEK_CUR)                   = 0
lseek(8, 0, SEEK_CUR)                   = 0
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
read(8, "BEGIN:VCALENDAR\r\nPRODID:-//IDN n"..., 412858) = 412857
read(8, "", 1)                          = 0
close(8)                                = 0
eguiraud commented 1 year ago

Attaching a debugger (something that will be much harder to do for users if you move to Rust, btw, because of compiler optimizations and potential lack of debug symbols :D ), I see that we never get out of this loop:

https://github.com/pimutils/vdirsyncer/blob/3bf9a3d6840672656b2e9685bc7e7bc50a3e4700/vdirsyncer/sync/__init__.py#L62-L67

and in particular we spend a lot of time (many seconds) for each iteration of the inner loop at the bottom of this callstack:

  /usr/lib/python3.10/site-packages/vdirsyncer/sync/__init__.py(62)prepare_new_status()
     61         if prefetch:
---> 62             async for href, item, etag in self.storage.get_multi(prefetch):
     63                 _store_props(

  /usr/lib/python3.10/site-packages/vdirsyncer/storage/base.py(162)get_multi()
    161         for href in uniq(hrefs):
--> 162             item, etag = await self.get(href)
2   163             yield href, item, etag

> /usr/lib/python3.10/site-packages/vdirsyncer/storage/singlefile.py(127)get()
    126         if self._items is None or not self._at_once:
--> 127             async for _ in self.list():
3   128                 pass
eguiraud commented 1 year ago

And here is a flamegraph that clearly shows where time is spent. If you download the SVG and open it in a browser, it's actually interactive.

vdirsyncer.svg.zip


eguiraud commented 1 year ago

I don't know vdirsyncer internals well enough to judge whether we could completely avoid the hashing, or at least provide a faster (compiled?) implementation.

Since parse in vobject.py is taking ~50% of the runtime, I checked whether there was some easy win there. Here's the line-by-line profile, I can't see anything obvious that could be improved to reduce the runtime of parse significantly:

File: /home/blue/vdirsyncer_dbg_env/lib/python3.10/site-packages/vdirsyncer/vobject.py
Function: parse at line 266

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   266                                               @classmethod
   267                                               @profile
   268                                               def parse(cls, lines, multiple=False):
   269      8166       4055.8      0.5      0.1          if isinstance(lines, bytes):
   270                                                       lines = lines.decode("utf-8")
   271      8166       1936.1      0.2      0.0          if isinstance(lines, str):
   272      8166     162844.5     19.9      2.5              lines = lines.splitlines()
   273
   274      8166       1430.3      0.2      0.0          stack = []
   275      8166       1135.2      0.1      0.0          rv = []
   276      8166       1096.1      0.1      0.0          try:
   277   5649332    1086100.1      0.2     16.6              for _i, line in enumerate(lines):
   278   4995192    1157565.0      0.2     17.7                  if line.startswith("BEGIN:"):
   279    654139     266100.6      0.4      4.1                      c_name = line[len("BEGIN:") :].strip().upper()
   280    654139     471390.6      0.7      7.2                      stack.append(cls(c_name, [], []))
   281   4341055    1034122.4      0.2     15.8                  elif line.startswith("END:"):
   282    654137     126078.4      0.2      1.9                      component = stack.pop()
   283    645972      77530.0      0.1      1.2                      if stack:
   284    645972     175547.5      0.3      2.7                          stack[-1].subcomponents.append(component)
   285                                                               else:
   286      8165       2011.3      0.2      0.0                          rv.append(component)
   287                                                           else:
   288   4341055     777734.7      0.2     11.9                      if line.strip():
   289   4341055    1194707.5      0.3     18.2                          stack[-1].props.append(line)
   290         1          1.1      1.1      0.0          except IndexError:
   291                                                       raise ValueError(f"Parsing error at line {_i + 1}")
   292
   293      8145       1168.9      0.1      0.0          if multiple:
   294        20          3.0      0.1      0.0              return rv
   295      8145       2363.2      0.3      0.0          elif len(rv) != 1:
   296                                                       raise ValueError(f"Found {len(rv)} components, expected one.")
   297                                                   else:
   298      8145       1502.3      0.2      0.0              return rv[0]
WhyNotHugo commented 1 year ago

I don't know vdirsyncer internals well enough to judge whether we could completely avoid the hashing, or at least provide a faster (compiled?) implementation.

For the singlefile backend, if a single component in a very large file changes, it's very hard to determine which component changed. For the vdir backend, each component is a separate file with its own mtime, so we can just check that (plus the inode number).

For singlefile there's no obvious equivalent, so we use a hash of each item to determine whether it has changed since the last sync. I don't think there's any obvious way around this.
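
That per-item change detection can be sketched like this (a toy model, not vdirsyncer's actual hashing; the normalisation step here, sorting stripped lines, is an assumption for illustration):

```python
import hashlib

def item_hash(component: str) -> str:
    """Hash one iCalendar component so it can be compared against the
    hash stored at the last sync.  Normalising first (here: strip line
    endings and sort) keeps cosmetic reorderings from looking like edits."""
    lines = sorted(line.strip() for line in component.splitlines() if line.strip())
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

a = "BEGIN:VEVENT\r\nUID:1\r\nSUMMARY:x\r\nEND:VEVENT\r\n"
b = "BEGIN:VEVENT\nSUMMARY:x\nUID:1\nEND:VEVENT\n"   # reordered, LF endings
assert item_hash(a) == item_hash(b)                    # same item, same hash
assert item_hash(a) != item_hash(a.replace("x", "y"))  # real change detected
```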

However, if we're hashing any item more than once per run, that's a bug and can be optimized away. That aside, cheaper hashing for this use case might make sense.

There's also a lot of string copying/concatenation going on in conversions. That can probably be improved, but I've no concrete ideas on how.

eguiraud commented 1 year ago

Mmh ok I see three suspicious things happening that together might explain the sync taking forever. @WhyNotHugo please let me know what you think.

1. Storage.get_multi contains the following lines:

   for href in uniq(hrefs): # there are 756 hrefs in my case
       item, etag = await self.get(href)
       yield href, item, etag

This in turn calls SingleFileStorage.get which again loops over 756 elements:

   async for _ in self.list(): # also 756 elements!
      pass

2. SingleFileStorage.get loops over 756 elements just to return one of them, every time. Not sure why it doesn't produce only the element it needs to return instead:

    async def get(self, href):
        if self._items is None or not self._at_once:
            async for _ in self.list():
                pass

        try:
            return self._items[href]

3. I waited until the 756*756 loop was over, and then vdirsyncer started re-syncing back every event to my Nextcloud calendar:

Copying (updating) item 0022f6bb-f1ec-4b52-88d7-0dfb608914b5 to events_personal_remote
Copying (updating) item 00745806-51c1-435a-98dd-c9dca085c641 to events_personal_remote
Copying (updating) item 00910f46-2053-42ad-abac-a5013dd51b46 to events_personal_remote
Copying (updating) item 00abeab0-e782-4f5b-8d93-c04a24261c43 to events_personal_remote

which it really shouldn't do because there is nothing to update: those events had just been copied from the calendar during the first sync (the one that completes quickly).
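
The cost of points 1 and 2 can be modelled with a tiny counter (a sketch of the suspected behaviour, not vdirsyncer code): with the item cache not pinned, every get() re-parses the whole file, so n lookups parse n*n items.

```python
def items_parsed(n, at_once):
    """Count parsed items when looking up n hrefs from a single file.

    at_once=False models the reported behaviour: the item cache is not
    pinned, so each get() re-lists (re-parses) all n items.
    """
    parsed = 0
    items = None
    for _ in range(n):                 # one get() per href
        if items is None or not at_once:
            items = list(range(n))     # "parse" the whole file
            parsed += n
        _ = items[0]                   # return a single item
    return parsed

assert items_parsed(756, at_once=False) == 756 * 756   # quadratic
assert items_parsed(756, at_once=True) == 756          # parse once
```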

WhyNotHugo commented 1 year ago

This in turn calls SingleFileStorage.get which again loops over 756 elements:

Check closely: it only does the iteration if self._items is None. self._items is a cache of items. This loop should only execute once, because self.list() populates self._items.

These lines:

        if self._items is None or not self._at_once:
            async for _ in self.list():
                pass

Are an ugly way of forcing the cache to get populated.

eguiraud commented 1 year ago

it's really executed every time in my case, because self._at_once is False

EDIT: self._at_once is False here because sync() calls prepare_new_status() without using the at_once() context manager, here: https://github.com/pimutils/vdirsyncer/blob/54a90aa5dd34fb3583965535be82b0a552f18c7d/vdirsyncer/sync/__init__.py#L143-L146

A few lines later, when actions are run, at_once() is turned on: https://github.com/pimutils/vdirsyncer/blob/54a90aa5dd34fb3583965535be82b0a552f18c7d/vdirsyncer/sync/__init__.py#L155-L158

WhyNotHugo commented 1 year ago

I think we can override the get_many method for this storage type and use an at_once() context manager inside of it.

WhyNotHugo commented 1 year ago

Basically, this bit should be wrapped in at_once():

   for href in uniq(hrefs): # there are 756 hrefs in my case
       item, etag = await self.get(href)
       yield href, item, etag
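
A self-contained sketch of that idea (hypothetical class and helper names modelled on this thread, not necessarily what the eventual fix looks like): pinning the parsed-item cache with an at_once() context manager for the whole batch brings the work back to a single parse.

```python
import asyncio
from contextlib import contextmanager

class SingleFileSketch:
    """Toy single-file storage: get() re-parses unless a batch is open."""

    def __init__(self, n):
        self._raw = {f"item{i}": f"data{i}" for i in range(n)}
        self._items = None
        self._at_once = False
        self.parse_count = 0

    def _parse_all(self):
        self.parse_count += 1           # one full parse of the .ics file
        self._items = dict(self._raw)

    @contextmanager
    def at_once(self):
        self._at_once = True
        try:
            yield self
        finally:
            self._at_once = False
            self._items = None

    async def get(self, href):
        if self._items is None or not self._at_once:
            self._parse_all()
        return self._items[href]

    async def get_multi(self, hrefs):
        with self.at_once():            # the proposed wrapping
            for href in hrefs:
                yield href, await self.get(href)

async def demo():
    storage = SingleFileSketch(756)
    async for _href, _item in storage.get_multi(list(storage._raw)):
        pass
    return storage.parse_count

assert asyncio.run(demo()) == 1   # one parse for the whole batch, not 756
```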
eguiraud commented 1 year ago

Alright, I'll give it a go after I create another calendar for this purpose, because problem number 3 above makes me wary of running vdirsyncer sync to completion on my actual personal calendar :sweat_smile:

WhyNotHugo commented 1 year ago

Alright, I'll give it a go after I create another calendar for this purpose, because problem number 3.

It would be useful to compare both versions and see what changed. It might be that the server normalised it into a semantically-equivalent format.

Do any of your entries have more than one PRODID field?

eguiraud commented 1 year ago

The patch proposed at https://github.com/pimutils/vdirsyncer/pull/1051 seems to fix points 1. and 2.

eguiraud commented 1 year ago

It would be useful to see both versions and see what changed.

Do any of your entries have more than one PRODID field?

This seems to be triggered by events that I add to the calendar from my phone's caldav client, but not by events that I create directly from the Nextcloud web interface.

Here is the diff before and after vdirsyncer updates an event that it should have no business updating:

~/Downloads diff -u 67cbcb15-d064-4f63-aa9b-38ccf7ef50c2.ics 67cbcb15-d064-4f63-aa9b-38ccf7ef50c2\(1\).ics
--- 67cbcb15-d064-4f63-aa9b-38ccf7ef50c2.ics    2023-03-09 18:35:23.560070893 -0600
+++ 67cbcb15-d064-4f63-aa9b-38ccf7ef50c2(1).ics 2023-03-09 18:36:15.013387075 -0600
@@ -1,19 +1,9 @@
 BEGIN:VCALENDAR
 VERSION:2.0
+PRODID:-//IDN nextcloud.com//Calendar app 3.4.2//EN
 PRODID:DAVx5/4.3-ose ical4j/3.2.7 (ws.xsoh.etar)
-BEGIN:VEVENT
-DTSTAMP:20230310T003500Z
-UID:67cbcb15-d064-4f63-aa9b-38ccf7ef50c2
-SUMMARY:Test event 3
-DTSTART;TZID=America/Guatemala:20230311T000000
-DTEND;TZID=America/Guatemala:20230311T010000
-STATUS:CONFIRMED
-BEGIN:VALARM
-TRIGGER:-PT10M
-ACTION:DISPLAY
-DESCRIPTION:Test event 3
-END:VALARM
-END:VEVENT
+PRODID:-//Sabre//Sabre VObject 4.4.1//EN
+CALSCALE:GREGORIAN
 BEGIN:VTIMEZONE
 TZID:America/Guatemala
 LAST-MODIFIED:20221105T024526Z
@@ -27,4 +17,17 @@
 RDATE:20061001T000000
 END:STANDARD
 END:VTIMEZONE
+BEGIN:VEVENT
+DTSTAMP:20230310T003500Z
+UID:67cbcb15-d064-4f63-aa9b-38ccf7ef50c2
+SUMMARY:Test event 3
+DTSTART;TZID=America/Guatemala:20230311T000000
+DTEND;TZID=America/Guatemala:20230311T010000
+STATUS:CONFIRMED
+BEGIN:VALARM
+TRIGGER:-PT10M
+ACTION:DISPLAY
+DESCRIPTION:Test event 3
+END:VALARM
+END:VEVENT
 END:VCALENDAR

Perhaps interestingly, vdirsyncer adds two extra PRODID lines.

WhyNotHugo commented 1 year ago

So you created this on the phone, vdirsyncer downloaded it, and on the next sync it was trying to create it?

Looks like it's mostly shuffling the order of the items. This is due to how singlefile works: VTIMEZONEs are shared across ALL events in the file, so when the file is "split" for syncing they end up in a different order. It's mostly harmless.

I can only think of extremely complicated fixes right now.

The PRODID duplication is a bug of the singlefile storage and needs to be fixed.
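
As an illustration of that fix (a hypothetical helper, not vdirsyncer code): when joining split components back into one VCALENDAR, top-level properties such as PRODID could be deduplicated so the merged file keeps only one. The PRODID values here come from the diff above.

```python
def dedupe_prodid(lines):
    """Keep only the first PRODID line among top-level calendar properties."""
    seen = False
    out = []
    for line in lines:
        if line.startswith("PRODID:"):
            if seen:
                continue            # drop duplicate PRODIDs
            seen = True
        out.append(line)
    return out

merged = [
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//IDN nextcloud.com//Calendar app 3.4.2//EN",
    "PRODID:DAVx5/4.3-ose ical4j/3.2.7 (ws.xsoh.etar)",
    "PRODID:-//Sabre//Sabre VObject 4.4.1//EN",
    "END:VCALENDAR",
]
assert sum(l.startswith("PRODID:") for l in dedupe_prodid(merged)) == 1
```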

eguiraud commented 1 year ago

you created this on the phone, vdirsyncer downloaded it, and the next sync it was trying to create it?

Yes, on the next sync it updated it as above. For some reason, events created via the web interface are already in the form vdirsyncer wants, so they are left alone.

The problem with the "mostly harmless" update is that it takes forever if the calendar has hundreds or thousands of such events.

bernhardreiter commented 1 year ago

Thanks for looking into the issue. I've skimmed your new messages. Just one remark, about my old findings:

found that it is not a hang but a more-than-linearithmic growth of running time in the number of calendar items synced in a single file.

(from https://github.com/pimutils/vdirsyncer/issues/818#issuecomment-844085698)

At least I could prove by delta debugging that there wasn't a bad calendar entry in my test data triggering a defect, and that all tests terminated (some only after hours). HTH

WhyNotHugo commented 1 year ago

#1051 has been merged. Can you check how performance is when using the current main branch?

bernhardreiter commented 1 year ago

Can you check how performance is when using current main branch?

Testing vdirsyncer, version 0.19.2.dev6+g079a156 (current main). The performance is much better compared to 0.19.1.

Growth of running time seems to be more linear than before. I consider the defect fixed and hope for the next release to appear! :)

Thanks again for maintaining vdirsyncer and to all who helped fix the issue. (I'm resolving this issue as a help to the maintainer.)

details of testing

Using an old version of https://cantonbecker.com/astronomy-calendar/ with 411 entries:

grep SUMM astrocal-20210525.ics | wc -l
411
/home/ber/.local/bin/vdirsyncer --version
vdirsyncer, version 0.19.1
time /home/ber/.local/bin/vdirsyncer -v DEBUG -c problem-1/config sync
real    0m21,845s

git clone https://github.com/pimutils/vdirsyncer.git vdirsyncer-git
cd vdirsyncer-git/
pip install -e .
/home/ber/.local/bin/vdirsyncer --version
vdirsyncer, version 0.19.2.dev6+g079a156
time /home/ber/.local/bin/vdirsyncer -v DEBUG -c problem-1/config sync
real    0m0,780s

Using https://gitlab.com/FadiAbb/icalendar-generator to generate calendars starting from 2023-04-01 with the given number of entries; v0.19.1 and main-branch timings:

822
real    0m37,624s
real    0m0,989s

1644
real    2m32,770s
real    0m1,424s
WhyNotHugo commented 1 year ago

Thanks for confirming!