wincent / masochist

⛓ Website infrastructure for over-engineers
MIT License

Import very old content from www.wincent.com #82

Closed: wincent closed 7 years ago

wincent commented 7 years ago

May be able to write some hacky script to get the HTML out of articles like this one. A lot of that old content is garbage, but it does have some historical interest. I have articles spanning from around 2005 to 2008. (Actually, just found one as old as 2004.)

Possibly use Pandoc or something to convert to Markdown.

Will need an import script that can put these on a branch somewhere, then rewrite the content branch to rebase the new content on top of the old content while preserving all the dates correctly.
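
A minimal sketch of the date-preserving part of such an import, assuming each old article has already been converted to Markdown and its original publication date is known. The function names (`dated_env`, `import_article`) and the commit message format are hypothetical; the only Git behavior relied on is that `git commit` honors the `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables.

```python
import os
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def dated_env(when: datetime) -> dict:
    """Return a copy of the environment that pins both Git dates to `when`.

    `when` should be timezone-aware; it is written in Git's internal
    "<unix timestamp> <offset>" format, expressed in UTC.
    """
    stamp = f"{int(when.timestamp())} +0000"
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = stamp
    env["GIT_COMMITTER_DATE"] = stamp
    return env


def import_article(repo: Path, relative_path: str, markdown: str, when: datetime) -> None:
    """Write one article into the work tree and commit it with its original date."""
    target = repo / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    subprocess.run(["git", "add", "--", relative_path], cwd=repo, check=True)
    subprocess.run(
        ["git", "commit", "-m", f"Import {relative_path}"],
        cwd=repo,
        check=True,
        env=dated_env(when),
    )
```

Calling `import_article` once per article, oldest first, on a fresh branch would give a history whose dates match the original posts. For the later rebase step, `git rebase --committer-date-is-author-date` may help keep committer dates from being reset, though I have not tried it on this repo.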

wincent commented 7 years ago

Copying in some older notes I have:

Shut down dat PHP stuff.

The svn/git log stuff could become snippets if I wanted, but I think the main thing of interest is the blog.

Unfortunately, would need some kind of markup converter. I am not even sure what markup language the blog source is in. It might be easiest to go from the HTML output back to wikitext...

Not sure if I still have this in a DB dump somewhere, or if I have to scrape the HTML.

Eventually want to shut down the kbase subdomain as well: content is still there at: http://kbase.wincent.com/old/knowledge-base/Main_Page.html [dead link]

Also cool to import: I have some very old PHP files archived under ~/web/archive.

See also the task I have to make a wikitext-to-Markdown converter: I may end up using Pandoc for both.
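
A rough sketch of what using Pandoc for both conversions could look like, driven from a script. It assumes `pandoc` is installed and on the `PATH`, and that the wikitext is close enough to MediaWiki syntax for Pandoc's `mediawiki` reader to cope, which is untested. The helper names are hypothetical.

```python
import subprocess


def pandoc_command(source_format: str) -> list:
    """Build the pandoc argument list for converting `source_format` to Markdown."""
    return ["pandoc", "--from", source_format, "--to", "markdown"]


def to_markdown(text: str, source_format: str = "html") -> str:
    """Pipe `text` through pandoc and return the Markdown it prints."""
    result = subprocess.run(
        pandoc_command(source_format),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise if pandoc exits with an error
    )
    return result.stdout
```

Scraped blog pages would go through `to_markdown(page, "html")` and wiki source through `to_markdown(source, "mediawiki")`; the output will almost certainly need hand cleanup either way.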

wincent commented 7 years ago

Many URLs are obviously going to break. For example, a blog post like:

https://www.wincent.com/a/about/wincent/weblog/archives/2008/02/ragel_wins_fata.php

will get moved to a new home at a URL like:

https://wincent.com/blog/ragel-wins-fatality

The old page should become a 301 (permanent) redirect.
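
Because the old slug is truncated (`ragel_wins_fata`), the new URL can't be derived mechanically from the old one, so the redirects probably need an explicit lookup table. A minimal sketch, assuming the table is built by hand or by the import script; `REDIRECTS` and `redirect_for` are hypothetical names, and the single entry is the example above.

```python
from urllib.parse import urlsplit

# Old path -> new path. Only the example from this thread is listed;
# the real table would have one entry per imported article.
REDIRECTS = {
    "/a/about/wincent/weblog/archives/2008/02/ragel_wins_fata.php": "/blog/ragel-wins-fatality",
}

NEW_ORIGIN = "https://wincent.com"


def redirect_for(old_url: str):
    """Return (status, location) for a known old URL, or None if unmapped."""
    old_path = urlsplit(old_url).path
    new_path = REDIRECTS.get(old_path)
    if new_path is None:
        return None
    return 301, NEW_ORIGIN + new_path
```

The same table could instead be dumped into web server rewrite rules; the point is only that each old path maps to exactly one permanent target.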