misskey-dev / misskey

๐ŸŒŽ An interplanetary microblogging platform ๐Ÿš€
https://misskey-hub.net/
GNU Affero General Public License v3.0
9.85k stars 1.33k forks source link

moving old files on local server to cloud storage #9069

Open ooozerooo opened 2 years ago

ooozerooo commented 2 years ago

I have set up object storage successfully and now newly uploaded files will show up in my S3 bucket, but I'm wondering whether there is a way that you could move your local files (everything under misskey/files) to cloud storage and delete those original copies on your local server?

I have synced all files to S3 using aws s3 sync files s3://bucket-name/ --endpoint-url=https://bucket.endpoint --acl public-read, but dont know how to change urls for the old posts and stuff to the new cloud storage address - all contents uploaded before using cloud storage are still pointing to the original local file address, so if i delete things under the misskey/files directory i could see nothing for the attachments for all old posts.

Is there anyway to solve this? Any answer is appreciated.

Thank u.

ghost commented 1 year ago

Similar problem here with moving from one object storage to another. How can the URL be changed for old media?

skid9000 commented 1 year ago

Same problem here.

ghost commented 1 year ago

I guess you have to use a nginx rewrite and still use your old domain

skid9000 commented 1 year ago

yeah that's kind of a bummer.

I saw that full url where stored in the drive_file table, i'll see if i can script the change of all urls in one go. (have 27k records in db, wml)

20221120_11-56-04

ghost commented 1 year ago

if you have that sql statement ready for production ^^ .. please share it

skid9000 commented 1 year ago

Could be improved, but at least it works. Be sure to do a db backup before hand just in case something goes wrong.

NOTE : I synced my files in <bucket>/files/, if you didn't, you should adapt the script.

DO $$DECLARE r record;
BEGIN
    FOR r IN SELECT "url" FROM drive_file
             WHERE "url" LIKE 'https://<misskey fqdn>/files/%'
    LOOP
        EXECUTE 'UPDATE drive_file SET "url"=' || quote_literal('https://<s3 fqdn>/<bucket>/files/' || trim(leading from r.url, 'https://<misskey fqdn>/files/')) || ' WHERE "url"=' || quote_literal(r.url);
    END LOOP;
END$$;

DO $$DECLARE r record;
BEGIN
    FOR r IN SELECT "thumbnailUrl" FROM drive_file
             WHERE "thumbnailUrl" LIKE 'https://<misskey fqdn>/files/%'
    LOOP
        EXECUTE 'UPDATE drive_file SET "thumbnailUrl"=' || quote_literal('https://<s3 fqdn>/<bucket>/files/' || trim(leading from r."thumbnailUrl", 'https://<misskey fqdn>/files/')) || ' WHERE "thumbnailUrl"=' || quote_literal(r."thumbnailUrl");
    END LOOP;
END$$;

DO $$DECLARE r record;
BEGIN
    FOR r IN SELECT "webpublicUrl" FROM drive_file
             WHERE "webpublicUrl" LIKE 'https://<misskey fqdn>/files/%'
    LOOP
        EXECUTE 'UPDATE drive_file SET "webpublicUrl"=' || quote_literal('https://<s3 fqdn>/<bucket>/files/' || trim(leading from r."webpublicUrl", 'https://<misskey fqdn>/files/')) || ' WHERE "webpublicUrl"=' || quote_literal(r."webpublicUrl");
    END LOOP;
END$$;
ooozerooo commented 1 year ago

Could be improved, but at least it works. Be sure to do a db backup before hand just in case something goes wrong.

Thanks for the codes! They look nice to me but I'm not sure why this doesn't work for me.

It somehow drops all the 'th' in 'thumbnail' in my case. The urls should be changed to 'https://s3.example.tld/files/thumbnail-xxxxxxxxxxxxx', but it was actually changed to 'https://s3.example.tld/files/umbnail-xxxxxxxxxxxxx'. I have no idea why the "th" was dropped as well. It only accurately trims the urls when I synced my files to <bucket>/files/ and uses EXECUTE 'UPDATE drive_file SET "thumbnailUrl"=' || quote_literal('https://s3.example.tld/' || trim(leading from r."thumbnailUrl", 'https://example.tld/')) || ' WHERE "thumbnailUrl"=' || quote_literal(r."thumbnailUrl"); (put nothing after the domain name, otherwise the 'th' will be missing).

Moreover, it seems like this would cause the original files being inaccessible directly from the urls. Though everything looks good on the instance itself, if you click 'copy url' or 'download' in your drive, you wouldn't be able to see the original files directly. This is because the filename extension is missing. For instance, if you visit https://never.bid/files/2774cde9-3aec-490a-af70-1dbd74deda8e (on my server), you would be able to see the picture, but if you click https://files.never.bid/files/2774cde9-3aec-490a-af70-1dbd74deda8e (on S3), you can't really see it. That is becasue it's a .jpg file, and somehow when you sync your files to object storage and update the urls in your database without extensions like .jpg you cant see anything (you can download it and change the extension to .jpg and you will see the picture is definitely there).

Also, it seems like these codes won't update urls for emojis used in notes. In my case i can see urls for the attachments in notes and files in my drive have been updated, but the urls for emojis inserted in notes dont seem to change at all. I'm not sure whether they should be updated somewhere else (perhaps in emoji.originalUrl?).

Here are the codes I used for my instance (im so confused why this drops the 'th' as well). Thanks for the help.

DO $$DECLARE r record;
BEGIN
    FOR r IN SELECT "url" FROM drive_file
             WHERE "url" LIKE 'https://never.bid/files/%'
    LOOP
        EXECUTE 'UPDATE drive_file SET "url"=' || quote_literal('https://files.never.bid/files/' || trim(leading from r.url, 'https://never.bid/files/')) || ' WHERE "url"=' || quote_literal(r.url);
    END LOOP;
END$$;

DO $$DECLARE r record;
BEGIN
    FOR r IN SELECT "thumbnailUrl" FROM drive_file
             WHERE "thumbnailUrl" LIKE 'https://never.bid/files/%'
    LOOP
        EXECUTE 'UPDATE drive_file SET "thumbnailUrl"=' || quote_literal('https://files.never.bid/files/' || trim(leading from r."thumbnailUrl", 'https://never.bid/files/')) || ' WHERE "thumbnailUrl"=' || quote_literal(r."thumbnailUrl");
    END LOOP;
END$$;

DO $$DECLARE r record;
BEGIN
    FOR r IN SELECT "webpublicUrl" FROM drive_file
             WHERE "webpublicUrl" LIKE 'https://never.bid/files/%'
    LOOP
        EXECUTE 'UPDATE drive_file SET "webpublicUrl"=' || quote_literal('https://files.never.bid/files/' || trim(leading from r."webpublicUrl", 'https://never.bid/files/')) || ' WHERE "webpublicUrl"=' || quote_literal(r."webpublicUrl");
    END LOOP;
END$$;

It gives me something like 'https://files.never.bid/files/umbnail-xxxxxxxxxxxxxx', cant figure out why

KisaragiEffective commented 3 months ago

triage: ้–ข้€ฃissueใŒใ‚ใฃใŸใ‚ˆใ†ใชๆฐ—ใŒใ™ใ‚‹ใฎใงๅคใ„ๆ–นใซๅฏ„ใ›ใŸใ„

Sayamame-beans commented 3 months ago

maybe related: #10783