superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0

[feature] Add CDN acceleration domain for S3 storage #2549

Closed aaro-n closed 9 months ago

aaro-n commented 9 months ago

Is your feature request related to a problem?

Refer to the issue #2155

I store GoToSocial's images on Alibaba Cloud OSS through its S3-compatible API, and I have bound an acceleration (CDN) domain to the OSS bucket. However, I could not find a way in the GoToSocial documentation to configure an acceleration domain for S3 storage, and the Alibaba Cloud documentation did not answer the question either. I tried to rewrite GoToSocial's 302 image redirect in Nginx, without success. Accessing the OSS bucket through the acceleration domain works in my tests, so what remains is to change the target of GoToSocial's 302 redirect.

Describe the solution you'd like.

My GoToSocial configuration for storing images on Alibaba Cloud OSS over the S3 protocol:

  GTS_STORAGE_BACKEND = "s3"
  GTS_STORAGE_S3_ENDPOINT = "oss-ap-northeast-1.aliyuncs.com"
  GTS_STORAGE_S3_ACCESS_KEY = "111111"
  GTS_STORAGE_S3_SECRET_KEY = "222"
  GTS_STORAGE_S3_BUCKET = "gotosocial-image"
  # GTS_STORAGE_S3_PROXY = "true"

The S3 proxy is therefore disabled: GoToSocial does not stream images itself but redirects clients to the bucket.

How GoToSocial serves an image from Alibaba Cloud OSS with the S3 proxy disabled (GTS_STORAGE_S3_PROXY = "false"): when a browser or app requests the image at https://me.12.11/fileserver/01RMN5FC48PQ2J8YP43XJGSSTF/attachment/small/01XT7W8XBGG52S6ETD5ZXJBRMK.jpg, the request receives a 302 Found redirect to https://gotosocial-image.oss-ap-northeast-1.aliyuncs.com/01RMN5FC48PQ2J8YP43XJGSSTF/attachment/small/01XT7W8XBGG52S6ETD5ZXJBRMK.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=LTAI5tSBDopiwWGmFqcSR6vN%2F20240120%2Foss-ap-northeast-1%2Fs3%2Faws4_request&X-Amz-Date=20240120T112909Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-type=image%2Fjpeg&X-Amz-Signature=dc86138779847b2259a70a9d8dbf513ad16bbe97d62ae81fbb6ccee649c3a672. The browser or app then displays the image correctly.
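To make the redirect target easier to read, here is a small Python sketch that pulls apart the presigned query parameters. The helper name describe_presigned is made up for illustration, and the URL is shortened to the parameters the helper reads (credential and signature are left out). Note that X-Amz-SignedHeaders=host means the Host header is part of what was signed, which is likely why simply swapping the domain in front of the same signature cannot be assumed to work.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def describe_presigned(url: str) -> dict:
    """Summarise the SigV4 query parameters of a presigned URL (illustrative helper)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # X-Amz-Date is the signing time in UTC, X-Amz-Expires the lifetime in seconds.
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return {
        "host": parts.netloc,
        "object_key": parts.path.lstrip("/"),
        "signed_headers": query["X-Amz-SignedHeaders"][0].split(";"),
        "expires_at": signed_at + lifetime,
    }


# Shortened form of the redirect target quoted above.
REDIRECT_URL = (
    "https://gotosocial-image.oss-ap-northeast-1.aliyuncs.com"
    "/01RMN5FC48PQ2J8YP43XJGSSTF/attachment/small/01XT7W8XBGG52S6ETD5ZXJBRMK.jpg"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20240120T112909Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host"
)

info = describe_presigned(REDIRECT_URL)
print(info["host"], info["signed_headers"], info["expires_at"].isoformat())
```

With X-Amz-Expires=86400 the link stays valid for 24 hours after the signing time.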

Expected behaviour when GoToSocial stores its media on Alibaba Cloud OSS via S3:

  1. GoToSocial needs to add an environment variable for setting the S3 acceleration domain, for example, GTS_STORAGE_S3_CDN = "cf-gotosocial.111111.xyz".

  2. Currently, when accessing the image https://me.12.11/fileserver/01RMN5FC48PQ2J8YP43XJGSSTF/attachment/small/01XT7W8XBGG52S6ETD5ZXJBRMK.jpg, a 302 Found redirect occurs to https://gotosocial-image.oss-ap-northeast-1.aliyuncs.com/01RMN5FC48PQ2J8YP43XJGSSTF/attachment/small/01XT7W8XBGG52S6ETD5ZXJBRMK.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=LTAI5tSBDopiwWGmFqcSR6vN%2F20240120%2Foss-ap-northeast-1%2Fs3%2Faws4_request&X-Amz-Date=20240120T112909Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-type=image%2Fjpeg&X-Amz-Signature=dc86138779847b2259a70a9d8dbf513ad16bbe97d62ae81fbb6ccee649c3a672. This should be changed to a 302 Found redirect to https://cf-gotosocial.111111.xyz/01RMN5FC48PQ2J8YP43XJGSSTF/attachment/small/01XT7W8XBGG52S6ETD5ZXJBRMK.jpg, allowing the use of CDN to accelerate the image.
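The requested rewrite can be sketched in Python as follows. This is only an illustration of the URL transformation, not GoToSocial's code: the function name rewrite_to_cdn is hypothetical, and it assumes the bucket name is part of the host (as in the OSS URL above), so the path is already the object key.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_to_cdn(presigned_url: str, cdn_domain: str) -> str:
    """Swap the bucket host for the CDN domain and drop the signed query string."""
    parts = urlsplit(presigned_url)
    # Keep only the object path; query and fragment are intentionally emptied.
    return urlunsplit(("https", cdn_domain, parts.path, "", ""))


# Shortened form of the presigned redirect target from the issue.
presigned = (
    "https://gotosocial-image.oss-ap-northeast-1.aliyuncs.com"
    "/01RMN5FC48PQ2J8YP43XJGSSTF/attachment/small/01XT7W8XBGG52S6ETD5ZXJBRMK.jpg"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=86400"
)
cdn_url = rewrite_to_cdn(presigned, "cf-gotosocial.111111.xyz")
print(cdn_url)
```

Dropping the query string only works if the CDN or bucket is allowed to serve the object without a signature, for example through the IP whitelist mentioned in this issue.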

Describe alternatives you've considered.

Alibaba Cloud OSS can allow specific IPs to read a private bucket directly, without access verification. GoToSocial would therefore need the following changes:

  1. Disable the GoToSocial S3 proxy and set up a CDN acceleration domain.
  2. Replace the domain in GoToSocial's 302 redirect with the CDN domain. Importantly, strip the query string that follows the file extension (e.g., remove ?X-Amz-Algorithm=……649c3a672).

Additional context.

1. Configure an IP blacklist or whitelist
2. OSS + CloudFlare CDN free acceleration

When adding a domain to OSS according to reference 2, the image can be accessed using the following URLs:

Both URLs can be used to access the image.

aaro-n commented 9 months ago

Reference document 2 (binding a domain), URL access table:

Link Allowed Access
https://cf-gotosocial.111111.xyz/01H7Y0CN8998G6784W57BX0BCC/attachment/original/01HM6RNHSJXH9RHVA0GK5GEAFP.jpg?Expires=1705813161&OSSAccessKeyId=TMP.3KeESS3qFNGoSYvopKkF9nmsnkex8cFn8yeSgMVV42s3NLDherFn8Zdc2VVskpSYFoSjG2whNGDEekdEF9EcCsRUD9jCmC&Signature=hVv5xMsUzKprgsCipHEHIdDCFso%3D
https://cf-gotosocial.111111.xyz/01H7Y0CN8998G6784W57BX0BCC/attachment/original/01HM6RNHSJXH9RHVA0GK5GEAFP.jpg
https://gotosocial-image.oss-ap-northeast-1.aliyuncs.com/01H7Y0CN8998G6784W57BX0BCC/attachment/original/01HM6RNHSJXH9RHVA0GK5GEAFP.jpg?Expires=1705813161&OSSAccessKeyId=TMP.3KeESS3qFNGoSYvopKkF9nmsnkex8cFn8yeSgMVV42s3NLDherFn8Zdc2VVskpSYFoSjG2whNGDEekdEF9EcCsRUD9jCmC&Signature=hVv5xMsUzKprgsCipHEHIdDCFso%3D
https://gotosocial-image.oss-ap-northeast-1.aliyuncs.com/01H7Y0CN8998G6784W57BX0BCC/attachment/original/01HM6RNHSJXH9RHVA0GK5GEAFP.jpg

aaro-n commented 9 months ago

I had always assumed that when GoToSocial uses AWS S3 as its storage backend, it somehow obtains a CDN domain from AWS S3 and substitutes it into the image URLs, and that Alibaba Cloud OSS, being only compatible with the AWS S3 protocol, simply does not return such an acceleration domain. Today I set up a GoToSocial test instance with AWS S3 as the storage backend, and it turns out this is not the case. Of course, it is also possible that I have not found the relevant configuration documentation.

Configuration of GoToSocial using AWS S3 as storage

The bucket name created in AWS S3 must not be in domain form (for example, ceshi.example.org); otherwise the 302 links GoToSocial produces will differ from those shown below. Refer to the GoToSocial documentation for bucket creation.

GTS_STORAGE_BACKEND: s3
GTS_STORAGE_S3_ENDPOINT: s3-us-west-2.amazonaws.com
GTS_STORAGE_S3_ACCESS_KEY: 1111
GTS_STORAGE_S3_SECRET_KEY: 222
GTS_STORAGE_S3_BUCKET: gotosocial-ceshi

Testing

Link Allowed Access
https://gotosocial-ceshi.s3.dualstack.us-west-2.amazonaws.com/01P31QWZ2HSGBNMSR79V0CPRBD/attachment/original/01HMNVDB9Y9Q1QQDG20PX34GMN.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIATWSGLRJBVI5PWDQD%2F20240121%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20240121T111010Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-type=image%2Fjpeg&X-Amz-Signature=c3d68cd0a8b8d459f0114a0b5944307f121daee593ace03470b438acf4f2fed1
https://gotosocial-ceshi.s3.dualstack.us-west-2.amazonaws.com/01P31QWZ2HSGBNMSR79V0CPRBD/attachment/original/01HMNVDB9Y9Q1QQDG20PX34GMN.jpg

Next, I put AWS CloudFront in front of the S3 bucket as a proxy and adjusted the bucket permissions as CloudFront requires. The AWS CloudFront domain is: ceshi.xxxx.xxxx.org

Link Allowed Access
https://ceshi.xxxx.xxxx.org/01P31QWZ2HSGBNMSR79V0CPRBD/attachment/original/01HMNVDB9Y9Q1QQDG20PX34GMN.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIATWSGLRJBVI5PWDQD%2F20240121%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20240121T111010Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-type=image%2Fjpeg&X-Amz-Signature=c3d68cd0a8b8d459f0114a0b5944307f121daee593ace03470b438acf4f2fed1
https://ceshi.xxxx.xxxx.org/01P31QWZ2HSGBNMSR79V0CPRBD/attachment/original/01HMNVDB9Y9Q1QQDG20PX34GMN.jpg

Conclusion

Even when GoToSocial uses AWS S3 as storage, its 302 image redirects do not use the AWS CloudFront domain.

Other

What if AWS S3 uses a domain as the storage bucket name?

GoToSocial Configuration

GTS_STORAGE_BACKEND: s3
GTS_STORAGE_S3_ENDPOINT: s3-us-west-2.amazonaws.com
GTS_STORAGE_S3_ACCESS_KEY: 1111
GTS_STORAGE_S3_SECRET_KEY: 222
GTS_STORAGE_S3_BUCKET: ceshi.xxxx.xxxx.org

Corresponding links become https://s3.dualstack.us-west-2.amazonaws.com/ceshi.xxxx.xxxx.org/01P31QWZ2HSGBNMSR79V0CPRBD/attachment/original/01HMNWY186AAAFMZ0291Q3B7GD.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIATWSGLRJBVI5PWDQD%2F20240121%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20240121T113645Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-type=image%2Fpng&X-Amz-Signature=1cc9f534e86493baad32da8a5a5289c398e629c3553b15994338aae59b7e91b0
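The two URL shapes seen in these tests (bucket in the host versus bucket in the path) can be reproduced with a short Python sketch. It only mirrors the links observed above; the function name object_url and the rule "a dot in the bucket name switches to path style" are assumptions drawn from those links, not a confirmed description of the S3 client GoToSocial uses.

```python
def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build an object URL matching the shapes observed in the tests above."""
    if "." in bucket:
        # Assumed rule: a dotted bucket name would add extra host labels that a
        # single-label wildcard TLS certificate cannot cover, so the bucket
        # moves into the path (path-style addressing).
        return f"https://{endpoint}/{bucket}/{key}"
    # Otherwise the bucket becomes a subdomain (virtual-hosted-style addressing).
    return f"https://{bucket}.{endpoint}/{key}"


ENDPOINT = "s3.dualstack.us-west-2.amazonaws.com"

plain = object_url(
    ENDPOINT,
    "gotosocial-ceshi",
    "01P31QWZ2HSGBNMSR79V0CPRBD/attachment/original/01HMNVDB9Y9Q1QQDG20PX34GMN.jpg",
)
dotted = object_url(
    ENDPOINT,
    "ceshi.xxxx.xxxx.org",
    "01P31QWZ2HSGBNMSR79V0CPRBD/attachment/original/01HMNWY186AAAFMZ0291Q3B7GD.png",
)
print(plain)
print(dotted)
```

In both cases the host is still an amazonaws.com name, so naming the bucket after a domain does not by itself route requests through that domain.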

Summary

Currently, there are two ways to configure GoToSocial with S3 as image storage:

  1. Enable the GoToSocial S3 proxy. This puts load on the server running GoToSocial, although Nginx can be configured to cache images.
  2. Disable the S3 proxy, so that image URLs point directly at the S3 bucket. If the S3 service is run by a cloud provider rather than self-hosted, image requests may incur excessive traffic and request fees.

daenney commented 9 months ago

The S3 object store has no way of knowing that your content can be served and accessed through some other domain entirely, so GtS has no way of knowing about it either. I imagine this problem would also hold for anyone using an AWS Cloudfront distribution for example, since you'd want to return a different URL than that of the S3 bucket in the redirect.

I think all we need here is to add something like s3-public-domain or something along those lines, and if that configuration value is set use that as the domain when constructing the URL to redirect to.
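A rough Python sketch of that idea, to make the proposal concrete. Everything here is hypothetical: public_domain stands in for the suggested s3-public-domain style setting, and presign stands in for whatever currently produces the signed bucket URL. It is not how GoToSocial is implemented.

```python
from typing import Callable, Optional


def redirect_target(
    key: str,
    public_domain: Optional[str],
    presign: Callable[[str], str],
) -> str:
    """Pick the URL to redirect to for a stored object (illustrative only)."""
    if public_domain:
        # Setting present: point clients at the public/CDN domain, unsigned.
        return f"https://{public_domain}/{key.lstrip('/')}"
    # Setting absent: keep today's behaviour and return the presigned bucket URL.
    return presign(key)


def fake_presign(key: str) -> str:
    """Stand-in for a real presigner; the signature value is a placeholder."""
    return f"https://bucket.example.invalid/{key}?X-Amz-Signature=placeholder"


KEY = "01RMN5FC48PQ2J8YP43XJGSSTF/attachment/small/01XT7W8XBGG52S6ETD5ZXJBRMK.jpg"
with_cdn = redirect_target(KEY, "cf-gotosocial.111111.xyz", fake_presign)
without_cdn = redirect_target(KEY, None, fake_presign)
print(with_cdn)
print(without_cdn)
```

Leaving the setting unset keeps the existing redirect unchanged, so the option would be opt-in.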

daenney commented 9 months ago

I'm going to close this in favour of #2574 where we've got the generic solution and different use cases documented. If you're still interested in this feature, feel free to subscribe to that issue.