silverstripe / silverstripe-s3

Silverstripe module to store assets in S3 rather than on the local filesystem (SS4/SS5 only)
BSD 3-Clause "New" or "Revised" License

Migrating existing assets to S3 #30

Closed. thepearson closed this issue 4 years ago

thepearson commented 4 years ago

We're using this module successfully with a number of new sites and would like to use it with existing sites too. However, we're not sure how to migrate assets from the default storage to S3. Copying the assets into a bucket doesn't seem to work. Any advice or thoughts?

obj63mc commented 4 years ago

As far as I know, if your assets are in the existing public/assets folder, you should just have to upload the .protected folder to the protected portion of your bucket (see protected prefix) and all other assets to the public portion of your bucket (public prefix). Then, if you install the module, configure it with your credentials and run a ?flush=1, it should automatically map from the local public directory to URLs on your S3 bucket. When copying these files to your bucket, make sure the public files have read access rights. An example might be (not tested, but something to start with):

aws s3 cp public/assets/Uploads s3://{bucket}/{public-prefix}/Uploads --recursive --acl public-read
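Before copying for real, the same command could be previewed first. This is only a minimal sketch, assuming the AWS CLI is installed and configured: `--dryrun` is a standard `aws s3` option that prints the operations without performing them. The function name `preview_public_upload` and its arguments are hypothetical.

```shell
# Hypothetical helper: preview the public upload without transferring anything.
# Arguments: local assets directory, bucket name, public prefix (all placeholders).
preview_public_upload() {
  local src="$1" bucket="$2" prefix="$3"
  # --dryrun only lists what would be copied; remove it to perform the copy.
  aws s3 cp "${src}/Uploads" "s3://${bucket}/${prefix}/Uploads" \
    --recursive --acl public-read --dryrun
}
```

It might be called as `preview_public_upload public/assets {bucket} {public-prefix}`, with the placeholders filled in for your own bucket.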

Note that this assumes you are starting from an SS4.x site directly. If you are migrating from SS3 to SS4, I'm not sure of the best way to go about this.

Also, to help troubleshoot: when you say it doesn't work, what types of errors are you getting?

obj63mc commented 4 years ago

Some other example commands that you can try when copying to the bucket -

example .env file

AWS_PUBLIC_BUCKET_PREFIX="public"
AWS_PROTECTED_BUCKET_PREFIX="protected"

aws s3 sync commands -

aws s3 sync public/assets/.protected s3://{bucket}/protected
aws s3 sync public/assets s3://{bucket}/public --exclude "*.protected/*" --acl public-read

thepearson commented 4 years ago

Thanks, I've boiled it down to the following. I've migrated the assets as described above, and I can upload and view new images/assets. These get placed in S3 and are accessible via the S3 URLs.

Existing migrated assets are still being served from the server filesystem (/public/assets*) and not from the S3 bucket.

So for existing assets (that have been copied to S3), the S3 PublicAdapter is not being used. I'm going to keep debugging, but if that sheds any light on it, let me know.

obj63mc commented 4 years ago

I'm still not sure what would cause this. The database simply stores /Uploads/... as the file name, and with this adapter the S3 URL is just used as the prefix to what is in the database. On your server or dev machine there will be a folder in /tmp (assuming Linux/Mac) for your site's cache. Can you delete that folder, run a ?flush=1 on your site, and let me know if it is still trying to map to your local server?
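To make that mapping concrete, here is a rough sketch of how a stored file name could turn into a bucket URL. It is an illustration only, not the module's actual code: the function name `s3_asset_url` is hypothetical, and the virtual-hosted-style URL layout (`https://{bucket}.s3.{region}.amazonaws.com/...`) is an assumption about how the bucket is addressed.

```shell
# Hypothetical illustration, not the module's implementation.
# Arguments: bucket, region, public prefix, file name as stored in the database.
s3_asset_url() {
  local bucket="$1" region="$2" prefix="$3" filename="$4"
  # Drop one leading slash so "/Uploads/a.jpg" and "Uploads/a.jpg" give the same URL.
  filename="${filename#/}"
  printf 'https://%s.s3.%s.amazonaws.com/%s/%s\n' \
    "$bucket" "$region" "$prefix" "$filename"
}
```

Comparing the result with the URL the CMS preview actually shows is a quick way to tell whether an asset is still being resolved against the local filesystem.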

obj63mc commented 4 years ago

So I did a fresh blank install and did the following -

  1. clean ss4 site, completely empty
  2. logged into the cms, uploaded one file and left it in draft, then uploaded another file and published it. I can see both assets in the cms, and when I click the preview link from the asset manager it shows the local url
  3. Ran the commands from my example to sync

    aws s3 sync public/assets/.protected s3://{bucket}/protected
    aws s3 sync public/assets s3://{bucket}/public --exclude "*.protected/*" --acl public-read
  4. installed the s3 module via composer -

    composer require silverstripe/s3
  5. added in my env variables

    AWS_REGION="us-east-1"
    AWS_BUCKET_NAME="X"
    AWS_ACCESS_KEY_ID="X"
    AWS_SECRET_ACCESS_KEY="X"
    AWS_PUBLIC_BUCKET_PREFIX="X"
    AWS_PROTECTED_BUCKET_PREFIX="X"
  6. Added the following to my yml -

    ---
    Only:
      envvarset: AWS_BUCKET_NAME
    After:
      - '#assetsflysystem'
    ---
    SilverStripe\Core\Injector\Injector:
      Aws\S3\S3Client:
        constructor:
          configuration:
            region: '`AWS_REGION`'
            version: latest
            credentials:
              key: '`AWS_ACCESS_KEY_ID`'
              secret: '`AWS_SECRET_ACCESS_KEY`'
  7. ran a http://sitename/dev/build
  8. ran a http://sitename/?flush=1
  9. went back into the admin and went to the asset manager and clicked on the preview - it is now pulling and showing the file from my s3 location.

So I am not able to replicate this. I'd make sure that your image references are not hard-coded in any way, and would definitely make sure it isn't some sort of caching issue (partial cache, app cache, CSS referencing a local file, etc.).
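For anyone repeating these steps, one easy thing to rule out is a missing variable in the .env file. A minimal sketch, assuming the variables live in a plain `NAME="value"` file; the function name `missing_s3_env_vars` is hypothetical, and the variable names are the six from step 5.

```shell
# Hypothetical helper: list the S3 variables from step 5 that a .env file lacks.
# Argument: path to the .env file (defaults to ./.env).
missing_s3_env_vars() {
  env_file="${1:-.env}"
  for name in AWS_REGION AWS_BUCKET_NAME AWS_ACCESS_KEY_ID \
              AWS_SECRET_ACCESS_KEY AWS_PUBLIC_BUCKET_PREFIX \
              AWS_PROTECTED_BUCKET_PREFIX; do
    # A variable counts as present when a line starts with NAME= followed by
    # at least one character; an unreadable file reports every name as missing.
    grep -Eq "^${name}=.+" "$env_file" 2>/dev/null || echo "$name"
  done
}
```

Running `missing_s3_env_vars .env` prints nothing when all six are present. Note that the yml block above is only applied when AWS_BUCKET_NAME is set (`envvarset`), so that one is worth checking first.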

thepearson commented 4 years ago

Yeah, thanks. Just confirming I'm pretty sure this is related specifically to my project and not this module. There's some funky legacy code interfering with the asset links.

obj63mc commented 4 years ago

Were these assets linked to a dataobject with something like a $has_one relationship, or were they images inserted with the wysiwyg in html content areas? That may help track down what the issue is. Images inserted with the wysiwyg are typically saved into the db with a shortcode referencing the file object, so those should map directly. If a user edited the html and put a hard-coded image tag into it, then those would have to be updated manually in the wysiwyg or the database. You could do something like a find-and-replace mysql query -

UPDATE `SiteTree` SET `Content` = REPLACE(`Content`, '/assets/Uploads/', '{s3url}/assets/Uploads/') WHERE `Content` LIKE '%/assets/Uploads/%';
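It may be worth trying the replacement on a sample of exported content before touching the database. A minimal sketch using sed; the function name `rewrite_asset_urls` is hypothetical, and it assumes neither argument contains `|`, `&` or regex metacharacters.

```shell
# Hypothetical dry run of the URL rewrite on exported HTML, outside the database.
# Arguments: the local path to look for, the S3 base URL to prepend.
# Reads HTML on stdin and writes the rewritten HTML to stdout.
rewrite_asset_urls() {
  local old_path="$1" base="$2"
  # "|" is the sed delimiter so the slashes in paths and URLs need no escaping.
  sed "s|${old_path}|${base}${old_path}|g"
}
```

For example, `rewrite_asset_urls /assets/Uploads/ {s3url} < sample.html`, using whatever path the content actually contains (sample.html is a placeholder file name). Because the rewritten URL still contains the search path, applying the rewrite twice would prefix it twice; the same holds for running the SQL query more than once.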

Also, if you can stand up a full backup after the migration, you could try running the Sync Assets just to make sure the DB is updated. Note that this can be dangerous on a live site: if a file doesn't exist anymore, it will delete the reference in the DB, which could break referenced dataobjects etc.