readthedocs / readthedocs.org

The source code that powers readthedocs.org
https://readthedocs.org/
MIT License
8.04k stars 3.58k forks source link

Build: exception with build while rclone sync is running #11544

Open sentry-io[bot] opened 2 months ago

sentry-io[bot] commented 2 months ago

A user noticed a build failure which seems to be the repository contents causing some error with rclone:

Sentry Issue: READTHEDOCS-ORG-RYA

CalledProcessError: Command '['rclone', 'sync', '--transfers=8', '--checksum', '--verbose', '--retries=3', '--retries-sleep=1s', '--', '/home/docs/checkouts/readthedocs.org/user_builds/bt-tools/checkouts/latest/_readthedocs/html', ':s3:readthedocs-media-prod/html/bt-tools/latest']' returned non-zero exit status 1.
  File "readthedocs/projects/tasks/builds.py", line 946, in store_build_artifacts
    build_media_storage.rclone_sync_directory(from_path, to_path)
  File "readthedocs/builds/storage.py", line 141, in rclone_sync_directory
    return self._rclone.sync(source, destination)
  File "readthedocs/storage/rclone.py", line 124, in sync
    return self.execute("sync", args=[source, self.get_target(destination)])
  File "readthedocs/storage/rclone.py", line 101, in execute
    result = subprocess.run(

Error copying to storage

I tried this locally, but didn't see much more in the ways of stderr or rclone error output, so we will need to dig a little deeper into the command execution.

Front logo Front conversations

agjohnson commented 2 months ago

Here is a repository that reliable triggers this exception:

https://github.com/boschresearch/bt_tools

agjohnson commented 2 months ago

I'm going to put this on next sprint so we don't forget about it. It's not a common bug, however the failure case isn't great as it is a silent failure to the user.

humitos commented 2 months ago

This is not a silent failure to the user. We are failing the builds if that happens:

https://github.com/readthedocs/readthedocs.org/blob/6aa2757426d2838557bae2c117780f39d6116d24/readthedocs/projects/tasks/builds.py#L945-L961

They should see "Error uploading files to the storage." error in the build details page.

agjohnson commented 2 months ago

Seems like that is just our logging though, not reporting/notification to the user.

From the attached conversation, this is the build the user reported:

https://app.readthedocs.org/projects/bt-tools/builds/25303128/

There's no failure mentioned there besides the generic failure:

image

At least there is a notification, so not silent technically, but there is no indication of what actually failed there.

humitos commented 2 months ago

Yeah, BuildApp exception always shows the generic one by design. We don't want to expose the user with cryptic rclone related or similar issues. They are internal issues.

agjohnson commented 2 months ago

We should mention something to the user though, not just a generic failure. It indeed does not need to mention rclone, but does need to specifically call out what the user is noticing. I would not mention "Error uploading ..." as uploading is a background technical implementation, without the user knowing what is happening in this step.

Instead, what we should communicate to the user is that their documentation was not updated, or might have only been partially updated, and to resolve the issue they can try rebuilding again.

In this particular case, retrying will not fix the problem though.

humitos commented 2 months ago

we should communicate to the user is that their documentation was not updated, or might have only been partially updated, and to resolve the issue they can try rebuilding again

Good point 👍

humitos commented 2 months ago

We will need to create a specific exception with message_id and notification for this. Then, handle it at https://github.com/readthedocs/readthedocs.org/blob/6aa2757426d2838557bae2c117780f39d6116d24/readthedocs/projects/tasks/builds.py#L480-L483 to check if the exception has a message_id or not.