owncloud / ocis

:atom_symbol: ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0
1.42k stars 183 forks source link

Possible race condition while creating/getting thumbnails #10684

Open jvillafanez opened 3 days ago

jvillafanez commented 3 days ago

Describe the bug

The issue might happen when we're getting a thumbnails from the storage at the same time the thumbnails is being generated. Target code is around https://github.com/owncloud/ocis/blob/master/services/thumbnails/pkg/thumbnail/storage/filesystem.go#L44-L77

The sequence of actions is more or less the following:

  1. Put the thumbnail in the FS storage. This will create a 0 byte file (successful os.Create call at https://github.com/owncloud/ocis/blob/master/services/thumbnails/pkg/thumbnail/storage/filesystem.go#L65)
  2. At that time, the thumbnail for that specific file is requested. The whole Get method is executed (https://github.com/owncloud/ocis/blob/master/services/thumbnails/pkg/thumbnail/storage/filesystem.go#L44) so we return the 0 byte file that was created in the previous step.
  3. The Put method finishes later, writing the thumbnail in the file.

There might be other variations because we might not guarantee that the file.Write method is atomic, so there could be multiple write operations in order to save the thumbnail. This could mean that partial data could be retrieved by the Get method.

Steps to reproduce

No easy way to reproduce the issue. The race condition happens when getting the thumbnails at the same time it's being created. 1. 5. 6.

Expected behavior

Either we get an error because the thumbnail is missing (or not created yet), or we get the whole thumbnail.

Actual behavior

A race condition might happen, so we could send 0 bytes or partial data.

This might happen only while the thumbnail is created, or more exactly, when the data is being saved in the FS. The thumbnails are expected to have small size, so the time window for the race condition to happen is expected to be very small.

Setup

Please describe how you started the server and provide a list of relevant environment variables or configuration files.

```console OCIS_XXX=somevalue OCIS_YYY=somevalue PROXY_XXX=somevalue ```

Additional context

We have some options to tackle this issue:

rhafer commented 3 days ago

Use read/write mutex. The Get method would block until the Put method has finished.

  • Note that we need to use https://pkg.go.dev/sync#RWMutex to prevent Gets from locking each other.
  • We might also want to use locks based on the target key / file instead of a global lock.

This might not work. There might be multiple thumbnails services running all using the same shared storage for the generated thumbnails.

rhafer commented 3 days ago

This should work without any locking:

In the worst case (multiple clienst requesting the same thumbnail before the first completed) we would generate the thumbnail multiple times. But we would never return partial or empty content.

jvillafanez commented 3 days ago

This might not work. There might be multiple thumbnails services running all using the same shared storage for the generated thumbnails.

Yes, you're right. I usually forget about the replicas. Maybe we should consider to use distributed locks at some point.

This should work without any locking:

Write the complete thumbnail to a tempfile then rename it to the final name. (At least for Posix compliant filesystems rename is atomic)

In the worst case (multiple clients requesting the same thumbnail before the first completed) we would generate the thumbnail multiple times. But we would never return partial or empty content.

Looks good to me :+1: