humanmade / WordPress-Importer

In-development rewrite of the WordPress (WXR) Importer

Reusing existing uploads #92

Open smerriman opened 7 years ago

smerriman commented 7 years ago

Have just been investigating how this plugin works given the numerous issues with the core importer.

One of my biggest wishes is a better way to migrate uploads. With the current importer - and from what I can tell, this importer as well - every single file has to be downloaded over HTTP from the old website and re-uploaded to the new one.

Not only is this slow for sites with thousands of uploads, but it also results in loss of all thumbnails. Thumbnails are recreated on the new site at new dimensions only - every thumbnail that was created earlier is lost, which is bad when post content contains numerous references to them. You can't even assume all sizes match the settings in the currently active theme, as that may have changed many times over the years.

When setting up a development site to start building a new theme, I zip up the wp-content/uploads folder and copy it over with SSH so that old image URLs never return a 404. Would it be possible to adjust this importer to first check whether the file already exists and, if it does, add it from the server rather than sideloading it? This would speed things up immensely.

kasparsd commented 7 years ago

This is an important issue. I've moved a site with 60,000 image originals and it could never have worked without downloading and moving those 70GB of images using other methods.

Here is a PHP script to extract the attachment URLs from the WXR export files: https://gist.github.com/kasparsd/06c8aa04df48bceceb468d19d80b724a

Note that it doesn't include the URLs of the resized images (the script was created for WP VIP and run locally).
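
For readers who want the general idea without opening the gist, here is a minimal Python sketch of the same kind of extraction. It is not the linked PHP script. It assumes the export stores each original file URL in a `wp:attachment_url` element, and it matches on the local tag name so the WXR namespace version does not matter. The `attachment_urls` helper and the `SAMPLE` document are hypothetical, and like the gist it does not list resized images.

```python
import xml.etree.ElementTree as ET


def attachment_urls(wxr_xml):
    """Return the text of every attachment_url element, in document order."""
    root = ET.fromstring(wxr_xml)
    urls = []
    for element in root.iter():
        # Namespaced tags look like "{namespace-uri}attachment_url"; compare
        # only the local name so any WXR namespace version is accepted.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "attachment_url" and element.text:
            urls.append(element.text.strip())
    return urls


# Hypothetical, heavily trimmed export used only for illustration.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <wp:post_type>attachment</wp:post_type>
      <wp:attachment_url>http://old.example/wp-content/uploads/2015/03/photo.jpg</wp:attachment_url>
    </item>
    <item>
      <wp:post_type>post</wp:post_type>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for url in attachment_urls(SAMPLE):
        print(url)
```

The resulting list can be handed to any bulk transfer tool; the importer itself is not involved at this stage.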

The WordPress Importer v2 does support disabling attachment import, but that also disables creating the attachment posts, which are still required when only the actual files are moved using other methods.

Here is a prototype of the fetch_remote_file() method that checks for a local file before doing a remote fetch. Since the directory and file structure stays the same, there is no need to update the attachment file path in the attachment post. We do need to update the image URLs in the post_content but that's already another issue.
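
The local-before-remote check described above can be sketched as follows. This is an illustration in Python, not the prototype itself and not the importer's code: `local_path_for`, `fetch_file`, and the `/wp-content/uploads/` marker are assumptions made for the example, and the download step is passed in so it can be swapped out.

```python
import os
import urllib.request
from urllib.parse import unquote, urlparse

# Assumption: uploads live under the default WordPress uploads path.
UPLOADS_MARKER = "/wp-content/uploads/"


def local_path_for(url, uploads_dir):
    """Map an attachment URL to its expected path under uploads_dir, or None."""
    path = unquote(urlparse(url).path)
    if UPLOADS_MARKER not in path:
        return None
    relative = path.split(UPLOADS_MARKER, 1)[1]
    if not relative:
        return None
    root = os.path.normpath(uploads_dir)
    candidate = os.path.normpath(os.path.join(root, relative))
    # Refuse paths that would escape the uploads directory (e.g. via "..").
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate


def fetch_file(url, uploads_dir, download=urllib.request.urlretrieve):
    """Return (path, reused): reuse an existing local file, else download it."""
    target = local_path_for(url, uploads_dir)
    if target is None:
        raise ValueError("URL is not under the uploads path: " + url)
    if os.path.isfile(target):
        # Same directory layout as the source site, so nothing to fetch.
        return target, True
    os.makedirs(os.path.dirname(target), exist_ok=True)
    download(url, target)
    return target, False
```

Because the relative path is preserved, the stored attachment file path stays valid either way; only the `reused` flag tells the caller whether a network request happened.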

rmccue commented 7 years ago

One thing I want to do (#3) is enable other methods of importing attachments. Often, you'll have the files already, and want to dump them into place; or, you may want to hand off the actual image downloading/transfer to a specialised tool (such as a parallel HTTP tool).

If we can add a CLI flag to completely skip downloading the file, that'd be great. Something like --attachment-download-method=skip (with download as the default). We could potentially have a skip-on-existing mode as well, but I'm not sure it's hugely useful; this tends to be an either-or situation.

kasparsd commented 7 years ago

@rmccue I thought I checked all open issues related to this but apparently missed #3 which sounds great.

So if we just talk about skipping the import when the file already exists (#101): could we add support for --attachment-download-ignore-local, which defaults to false? It feels like --attachment-download-method could be used in addition to that.

kasparsd commented 7 years ago

The important question is -- why should we re-download any attachment file if it already exists in the filesystem?

rmccue commented 7 years ago

> why should we re-download any attachment file if it already exists in the filesystem?

Potentially the file may have been updated on the site we're importing from. It's rare, but it certainly can happen: upload a file, turns out there was a problem with it, delete the attachment (and hence file), reupload corrected file with the same filename.

> could we add support for --attachment-download-ignore-local which defaults to false?

I'd call it --attachment-download-skip-existing with false as the default, but otherwise 👍
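
To make the interaction between the two proposed options concrete, here is a small Python sketch using `argparse`. The flag names come from this thread and are proposals, not shipped options; the `wxr-import` program name and the `should_download` helper are hypothetical.

```python
import argparse


def build_parser():
    # "wxr-import" is a placeholder name, not a real command.
    parser = argparse.ArgumentParser(prog="wxr-import")
    parser.add_argument(
        "--attachment-download-method",
        choices=["download", "skip"],
        default="download",
        help="'skip' never fetches files; attachment posts are still created",
    )
    parser.add_argument(
        "--attachment-download-skip-existing",
        action="store_true",  # defaults to False, as suggested above
        help="do not re-download a file that already exists locally",
    )
    return parser


def should_download(args, file_exists):
    """Decide whether one attachment file needs a remote fetch."""
    if args.attachment_download_method == "skip":
        return False
    if args.attachment_download_skip_existing and file_exists:
        return False
    return True
```

With no flags, an existing file is fetched again, which covers the case of a file replaced on the source site under the same name; passing `--attachment-download-skip-existing` opts out of that re-download.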

drzraf commented 6 years ago

See #145, #151, and #152.