nextcloud / calendar

📆 Calendar app for Nextcloud
https://apps.nextcloud.com/apps/calendar
GNU Affero General Public License v3.0

Retry import request on a 503 / limit number of concurrent import requests #445

Closed mapausch closed 4 years ago

mapausch commented 7 years ago

Steps to reproduce

  1. export all Calendar entries from OwnCloud into a file
  2. import the calendar file with web-ui into Nextcloud calendar

Expected behaviour

All calendar entries are imported successfully

Actual behaviour

The calendar file gets imported only partially. The import procedure makes a lot of parallel HTTP PUT requests, but the web hoster allows only 8 parallel processes per customer in a shared hosting environment. Further information can be found here: https://help.nextcloud.com/t/import-of-vcalendar-file-from-owncloud-works-partially-only/12300

Server configuration

Operating system: Linux cloud5-vm382 4.4.67+134-ph #1 SMP Wed May 10 22:20:46 UTC 2017 x86_64

Web server: Apache (cgi-fcgi)

Database: mysql 5.6.35

PHP version: 5.6.30 Modules loaded: Core, date, ereg, libxml, openssl, pcre, sqlite3, zlib, bcmath, calendar, ctype, curl, dom, hash, fileinfo, filter, ftp, gd, gettext, gmp, SPL, iconv, session, intl, json, mbstring, mcrypt, standard, mysqlnd, pcntl, PDO, pdo_mysql, pdo_sqlite, Phar, posix, Reflection, imap, SimpleXML, soap, sockets, mysqli, exif, tidy, tokenizer, xml, xmlreader, xmlrpc, xmlwriter, xsl, zip, mysql, cgi-fcgi, mhash

Nextcloud version: 11.0.3 (stable) - 11.0.3.2

Updated from an older Nextcloud/ownCloud or fresh install: fresh install

Where did you install Nextcloud from: Web Installer

Signing status:

```
{
    "core": {
        "INVALID_HASH": {
            ".htaccess": {
                "expected": "11e2db30f0cf23df1b5aa1cdf329a8c88d253f86e43f9e7af1b30969eb0175030103b138e2f7ab7608c902bbb57a5d578c2c0ca09f3abf2ef83415f4bc6f6e20",
                "current": "e50777a123d75e6581c06aeae3b2d794356005e548b2f77ebe29539366ec632058705425d694b433d58cfa4cbcfec3e19eab54586b9842c6ef9061330552d8c2"
            }
        }
    }
}
```

List of activated apps:

```
Enabled:
 - activity: 2.4.1
 - calendar: 1.5.2
 - comments: 1.1.0
 - contacts: 1.5.3
 - dav: 1.1.1
 - federatedfilesharing: 1.1.1
 - federation: 1.1.1
 - files: 1.6.1
 - files_pdfviewer: 1.0.1
 - files_sharing: 1.1.1
 - files_texteditor: 2.2
 - files_trashbin: 1.1.0
 - files_versions: 1.4.0
 - files_videoplayer: 1.0.0
 - firstrunwizard: 2.0
 - gallery: 16.0.0
 - issuetemplate: 0.2.1
 - logreader: 2.0.0
 - lookup_server_connector: 1.0.0
 - nextcloud_announcements: 1.0
 - notifications: 1.0.1
 - password_policy: 1.1.0
 - provisioning_api: 1.1.0
 - serverinfo: 1.1.1
 - sharebymail: 1.0.1
 - survey_client: 0.1.5
 - systemtags: 1.1.3
 - theming: 1.1.1
 - twofactor_backupcodes: 1.0.0
 - updatenotification: 1.1.1
 - workflowengine: 1.1.1
Disabled:
 - admin_audit
 - encryption
 - external
 - files_accesscontrol
 - files_automatedtagging
 - files_external
 - files_retention
 - templateeditor
 - user_external
 - user_ldap
 - user_saml
```

The content of config/config.php:

```
{
    "instanceid": "ocm7mg479frp",
    "passwordsalt": "***REMOVED SENSITIVE VALUE***",
    "secret": "***REMOVED SENSITIVE VALUE***",
    "trusted_domains": [
        "***REMOVED SENSITIVE VALUE***",
        "***REMOVED SENSITIVE VALUE***"
    ],
    "datadirectory": "\***REMOVED SENSITIVE VALUE***",
    "overwrite.cli.url": "***REMOVED SENSITIVE VALUE***",
    "dbtype": "mysql",
    "version": "11.0.3.2",
    "dbname": "***REMOVED SENSITIVE VALUE***",
    "dbhost": "127.0.0.1:3307",
    "dbport": "",
    "dbtableprefix": "oc_",
    "dbuser": "***REMOVED SENSITIVE VALUE***",
    "dbpassword": "***REMOVED SENSITIVE VALUE***",
    "logtimezone": "UTC",
    "installed": true,
    "mail_smtpmode": "php",
    "mail_from_address": "ncp",
    "mail_domain": "***REMOVED SENSITIVE VALUE***",
    "maintenance": false,
    "theme": "",
    "loglevel": 2
}
```

Are you using external storage, if yes which one: files_external is disabled

Are you using encryption: no

Are you using an external user-backend, if yes which one: LDAP/ActiveDirectory/Webdav/...

Client configuration

Browser: Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0

Operating system:

Logs

Web server error log: not provided

Nextcloud log (data/nextcloud.log): not provided

Browser log: not provided
georgehrke commented 7 years ago

Is there no way to convince your hoster to allow more requests?

To be honest I don't really see a reason to make the import slower for everyone because one user is using a hoster with weird policies.

mapausch commented 7 years ago

Maybe 8 parallel processes is fewer than other hosters allow. But I think capping resources (number of processes, process runtime, RAM usage, ...) is something that most hosters do, especially where many customers share a server.

I share your opinion that parallelization is the right way to get the import done faster. But importing n-hundred calendar entries with n-hundred parallel requests is far from being n times faster than importing them one by one. All import threads have to share the same (transaction-oriented) database, and at the latest at the database level there is some kind of serialization that slows down the highly parallelized import procedure.

So an (ideally configurable) limit on parallel threads would be a good approach. And I am sure most users would prefer a slightly longer import process over being bothered with error messages that say: "Oops, I was only able to import some of your stuff. Now leave me alone." Nobody wants to compare the imported entries with the export file to check which entries need to be imported manually.

Another approach would be to retry the import for valid entries that couldn't be imported. The retry could be based on the status code of the original request.

So a little more fault tolerance would just make the/your product a little more user friendly, IMHO :-)

georgehrke commented 7 years ago

> All import threads have to share the same (transaction oriented) database. And latest at the database level there is some kind of serialization that "slows" down the highly-parallelized import procedure.

Databases are rather fast. Not sure that's really the bottleneck here ;)

> And I am sure the most users would prefer a slightly longer import process than get bothered with error messages that say: "Oops, I just was able to import some of your stuff. Now leave me alone". And nobody wants to to compare the imported entries with the export file to check what entries need to be imported manually.

We need to improve the import error messages and ideally display the name and time of the event. I'm with you on that one.

> Another approach would be to retry import for valid entries that couldn't be imported. Retry maybe could be based on the return code of the origin request.

Retrying the request on a 503 might be something we could do. It's not a priority at the moment, though.
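For reference, the retry-on-503 idea could look roughly like the following sketch. This is a hypothetical helper, not the app's actual code: it wraps any request-returning function and backs off exponentially between attempts.

```javascript
// Hypothetical sketch: retry a request when the server answers
// 503 Service Unavailable, backing off exponentially between attempts.
// `doRequest` is any function returning a Promise that resolves to an
// object with a numeric `status` property.
async function requestWithRetry(doRequest, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await doRequest();
    // Success, a non-retryable status, or retries exhausted:
    // hand the response back to the caller.
    if (response.status !== 503 || attempt >= retries) {
      return response;
    }
    // 503: wait baseDelayMs * 2^attempt before trying again.
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
}
```

The exponential backoff matters here: a server that answers 503 is overloaded, so hammering it with immediate retries would only prolong the outage.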

mapausch commented 7 years ago

Assuming that databases are "fast" is something I wouldn't rely on. In fact, a database can also be overstressed (temporarily or permanently) and thus cause delays when importing a single row/calendar entry/address book record/you name it.

And even if the web server allowed more than 8 parallel processes, you would still hit the higher process limit whenever the database is not fast enough to complete the requests before the web server's limit is reached. You would reach that point later, but you would reach it. And what would be the result? The user would see that Nextcloud is not able to import the file and would stay with ownCloud ;-)

On the web you will find thousands of diagrams like this (http://www.toadworld.com/cfs-file/__key/communityserver-wikis-components-files/00-00-00-00-03/a6.png) showing that parallelisation also has its limits, because at some point on the curve the performance no longer increases even when you raise the degree of parallelism.

As you can see from my example: the unlimited(?) degree of parallelism in the part of the software that does the calendar import can also have negative effects. The result? A bad user experience.

I appreciate your decision to retry the import for requests that get a 503. It's much better than just showing the user a list of entries that couldn't be imported, because that would again leave the user alone with the problem.

But maybe you should also consider a limit on parallelism. My file has only about 250 events/alarms/etc., and I have been using ownCloud for less than a year. What if somebody has a file with 500/1000/2000 entries because he has used oC for a longer period? Do you create 500/1000/2000 parallel requests? Can this huge number of parallel requests hit limits in the browser and make it crash? Does every web server return a 503 when its limits are reached, or can there be other reactions?

Valicanto commented 6 years ago

I think I have the same problem. I exported a calendar with almost 3000 events from Nextcloud 9.0.53. I then did a clean install of Nextcloud 12.0.3 on my BananaPi with Ubuntu 14.04 and Nginx. Every time I try to import the ICS file I get the message "Partially imported, 2000 failures" (every time it is a different number of failures, but always around 2000).

Console in browser says: NetworkError: 502 Bad Gateway

/var/log/php7.0-fpm.log says: WARNING: [pool www] server reached pm.max_children setting (5), consider raising it.

I raised:

```
pm.max_children = 5000
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 30
pm.max_requests = 5000
```

I still get error messages like:

```
WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 188 total children
```

Does this have anything to do with the parallel processes discussed in this topic? If yes, which settings in which files do I have to adjust?

mapausch commented 5 years ago

FYI, the contacts app will get a mechanism in release 3.0 for throttling mass import of contacts (see https://github.com/nextcloud/contacts/issues/465). Maybe it makes sense to implement similar logic in the calendar app?

laszlovl commented 5 years ago

I experienced the same problem. Importing a large ICS file with thousands of entries was impossible, because Nextcloud would fire thousands of XHR requests simultaneously and my webserver rejected most of them.

This wouldn't have been much of a problem in the past, because with HTTP/1 browsers enforce a low (2-8) cap on the number of simultaneous connections to the same host. In that case the browser itself acts as a rate limiter, allowing only 2-8 simultaneous import requests and queueing the others. But with HTTP/2-enabled webservers there is no such low limit anymore, and browsers will happily run hundreds of simultaneous requests to the same host.
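The client-side throttling described above can be sketched as a small concurrency pool (a hypothetical helper, assuming nothing about the app's actual code): at most a fixed number of import requests are in flight at once and the rest queue, which is roughly what the HTTP/1 per-host connection cap used to provide for free.

```javascript
// Hypothetical sketch: run async tasks with at most `limit` in flight,
// mimicking the per-host connection cap browsers enforce under HTTP/1.
// `tasks` is an array of functions, each returning a Promise.
async function runWithConcurrencyLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  // Start `limit` workers; each worker repeatedly pulls the next
  // unstarted task. JS is single-threaded, so `next++` between awaits
  // is safe without locking.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, async () => {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  });
  await Promise.all(workers);
  return results;
}
```

With a pool like this, importing 2000 events still issues 2000 requests, but the server only ever sees a handful at a time, so shared-hosting process limits are never exceeded.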

That's fine for static resources or event-based application servers like NodeJS, but not for PHP, where each request is handled by a separate worker process. The suggestion in this thread to increase max_children to 5000 is very bad, because it could spawn 5000 separate php-fpm processes on your server. I'm sure 99% of the servers running Nextcloud do not have enough CPU and RAM to handle that, so the server would become unresponsive and the OOM killer would start killing those PHP processes in the middle of handling requests.

I wrote a simple change to rate-limit the calendar imports to 1 request per 100 milliseconds: https://github.com/laszlovl/calendar/commit/eb86004bc846edbf98b75d6f35199e9bc8452fe1. But it looks like the Calendar app is currently being rewritten from scratch using Vue (and the new import code is already written), so it's probably futile to submit it as a pull request at this stage.

gloomytree commented 5 years ago

Here to let you know I have the same issue. I am migrating my Gmail calendar to Nextcloud (~800 entries) and the import keeps failing. Unfortunately there is no way to ensure that every event was processed, even after several uploads.

Everything imported fine after setting the throttling in Firefox to "GPRS", but that cannot be the final solution. Critical processes should be resilient regardless of the server's performance, and the integrity of my personal calendar is really critical to me.

I don't mean to rant or criticize; I just hope this becomes a priority in the future. The calendar app is really awesome and I appreciate all the effort you folks are putting in.

jensfriisnielsen commented 4 years ago

Like gloomytree, I am trying to escape from Google (to a self-hosted solution), but this issue is a showstopper. I would rather wait as long as it takes instead of having ~80% random failures, much like mapausch initially described.

I am pasting my nginx error log for the search robots.

```
FATAL:  sorry, too many clients already in /var/www/nextcloud/lib/private/DB/Connection.php:64
Stack trace:
#0 /var/www/nextcloud/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(448): OC\DB\Connection->connect()
#1 /var/www/nextcloud/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(410): Doctrine\DBAL\Connection->getDatabasePlatformVersion()
#2 /var/www/nextcloud/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(354): Doctrine\DBAL\Connection->detectDatabasePlatform()
#3 /var/www/nextcloud/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(710): Doctrine\DBAL\Connection->getDatabasePlatform()
#4 /var/www/nextcloud/lib/private/DB/Connection.php(151): Doctrine\DBAL\Connection->setTrans...
PHP message: PHP Fatal error:  Uncaught Doctrine\DBAL\DBALException: Failed to connect to the database: An exception occurred in driver: SQLSTATE[08006] [7] could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5432?
```