scripting / Scripting-News

I'm starting to use GitHub for work on my blog. Why not? It's got good communication and collaboration tools. Why not hook it up to a blog?

How easy to switch blog hosting services? #86

Open scripting opened 6 years ago

scripting commented 6 years ago

I posted two questions on Twitter, summarized them on my blog, and thought it would be a good idea to post them here as well, to provide a place for a more detailed examination.

Questions about wordpress.com and tumblr.com --

  1. Can you download your entire site in an open format (XML or JSON)?

  2. Can you redirect your site to a new service if you decide to move?

I want to be able to recommend hosting services.

manton commented 6 years ago

I have some experience now with hosting blogs on Micro.blog and helping people move their sites to and from Micro.blog. You can export from WordPress.com (their XML format is based on RSS!) and import it into your own domain name on Micro.blog, for example. We also automatically download any referenced photos (since they aren't in the XML file) and redirect old URLs so links don't break.

Tumblr has their own export format which I've been meaning to add support for. I also proposed on my blog that we need a new more universal export format.

facej commented 6 years ago

WordPress can be moved to a different WordPress host in a straightforward manner - IF - the original WordPress site is accessible. The built-in export function would appear to be complete if you choose to export "all content". Not the case. You need to export "Media" in a separate pass. That gets you an XML file that can be used to import all of the media into a different WordPress site.

So, export "media", then export "all content". On the new WordPress site I would proceed with an import of "media" followed by an import of "content".

facej commented 6 years ago

It's kind of amusing to see an XML file with CDATA that contains JSON

scripting commented 6 years ago

@facej -- I think it's cool. Shows a standard that lives past what some would think was its expiration date.

@manton -- good point about downloading the images. I've seen WP criticized for not including the images. How hard is it to find them? Do you have to parse the HTML text?

facej commented 6 years ago

I think it's fantastic ;-) The "media" XML file contains all the links to the images as well as the WP metadata, which is the JSON part.

scripting commented 6 years ago

@facej -- then the criticism is unwarranted. If the links are easy to access in the XML, then what else could anyone want? Do you have an example of a small exported site in a zip file that could serve as a demo?

facej commented 6 years ago

My example isn't actually small. I just did the "all content" export and discovered that the "media" info is actually in the all-content version. All of the media have links like this:

<wp:attachment_url><![CDATA[http://www.cgne-tucson.org/wp-content/uploads/2017/01/Town-Crier-2017-01.pdf]]></wp:attachment_url>

Accessing things in XML can be quite the challenge, but yeah, an XSLT transform would work, as would a fairly simple sed | awk | curl sequence
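For anyone who prefers a script to XSLT or a sed | awk | curl pipeline, here is a minimal Python sketch of the same idea: pull every `wp:attachment_url` out of an export file. The `SAMPLE` document and the `attachment_urls` function name are made up for illustration; only the `wp:attachment_url` element and the URL come from the example above. The namespace URI in a real export may differ by version, so the sketch matches on the element's local name rather than assuming one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed export file. A real WordPress export has many
# more elements; the namespace URI here is an assumption and may vary.
SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Town Crier</title>
      <wp:attachment_url><![CDATA[http://www.cgne-tucson.org/wp-content/uploads/2017/01/Town-Crier-2017-01.pdf]]></wp:attachment_url>
    </item>
    <item>
      <title>A post without an attachment</title>
    </item>
  </channel>
</rss>"""


def attachment_urls(xml_text):
    """Return the text of every attachment_url element, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # ElementTree expands prefixes to "{namespace-uri}localname", so
        # matching on the suffix works whatever the namespace version is.
        if element.tag.endswith("}attachment_url") and element.text:
            urls.append(element.text.strip())
    return urls


if __name__ == "__main__":
    for url in attachment_urls(SAMPLE):
        print(url)
```

The parser unwraps the CDATA section on its own, so the function returns plain URL strings that could then be handed to a downloader.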

facej commented 6 years ago

WP-export.zip

Media-only export. All-content export.

manton commented 6 years ago

@scripting As @facej says the image information is in the XML file, but I actually parse the HTML for each post because there might be posts with images that didn't use WordPress's upload feature, and I might need to update the HTML anyway. Also if Micro.blog gets a 404 when downloading the image (because the site is no longer online), it checks the Internet Archive to see if there's a copy there.

It all "works" but having an archive format that contained the actual images would be more robust, in my opinion.
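The HTML-parsing approach described above could look roughly like the following Python sketch. It is not Micro.blog's code, which isn't shown in this thread; `ImageCollector`, `image_sources`, and `wayback_lookup_url` are hypothetical names, and using the Internet Archive's Wayback availability endpoint as the fallback lookup is an assumption. The sketch only collects `<img>` sources from a post body and builds the lookup URL; it does not make any network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are routed here as well.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_sources(post_html):
    """Return the image URLs referenced in one post's HTML, in order."""
    collector = ImageCollector()
    collector.feed(post_html)
    collector.close()
    return collector.sources


def wayback_lookup_url(image_url):
    """Build a Wayback availability query for an image that returned a 404.

    Assumption: this endpoint is one way to check for an archived copy;
    the thread does not say which lookup Micro.blog actually uses.
    """
    return "https://archive.org/wayback/available?" + urlencode({"url": image_url})


if __name__ == "__main__":
    html = '<p><img src="http://example.com/a.jpg" alt="a"><img src="/b.png"/></p>'
    for src in image_sources(html):
        print(src, wayback_lookup_url(src))
```

Parsing the post HTML rather than relying only on the attachment entries also catches images that were never uploaded through WordPress, which is the case described above.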

bradbarrish commented 6 years ago

I will say that having a very large WP database can significantly complicate things in terms of switching hosts. Very recently I decided to move my blog, which has existed since 2001, from a self-hosted WordPress install with Dreamhost to WordPress.com. The main reason is that I’m trying to reduce the number of things I have to maintain myself. I started that process over a month ago and am still working with the support team to get everything working right. We keep having to switch the DNS back and forth, especially due to images breaking on WordPress.com. All that said, I certainly feel good about all the content being under my control, but it’s not easy in my experience.

ttepasse commented 6 years ago

Btw: the European Union's General Data Protection Regulation came into force in May. One of the legal rights therein is in Article 20, the right to data portability "in a structured, commonly used and machine-readable format".

That regulation applies if the organization or the user is based in the EU. So while WordPress and Tumblr/Yahoo are American companies, if they process data of EU citizens, they should protect the rights in the GDPR, including data portability. If I were in the market for a hosting service, GDPR compliance is something I'd look for.

facej commented 6 years ago

@bradbarrish Interesting. I used to maintain a lot of locally-hosted WP sites. Moved to Dreamhost a few years ago to make things "simpler". I find that the self-updating WP sites at DH, along with linking all my sites up with WordPress.com, make everything mostly background.

Yes, the DNS thing would be an issue if moving from one place to another. That was my very first comment - it is straightforward if the "source" blog is active while moving to a "destination" blog.

I struggled with that when I moved things to Dreamhost, but figured out the "better path" for me.

bradbarrish commented 6 years ago

@facej yeah, I’m getting the feeling I’ve made a mistake. Gonna give it one more round with the support people at WordPress.com and if it doesn’t work, I’m just going to revert to DH and keep it all going from there. My main issue is the slowness of my site on the shared hosting plan I have, so I may just have to pony up. DH is a great company. I have nothing but love for them. Support is second to none.