sminnee / silverstripe-staticsiteconnector

Connector plugin for the SilverStripe External Content module that uses web scraping to import content.

Rewrite links #2

Open sminnee opened 11 years ago

sminnee commented 11 years ago

Internal links within the site should be rewritten to point to the imported SilverStripe pages.

sminnee commented 11 years ago

This is partially completed (there is a build task) but I think it's buggy.

sminnee commented 11 years ago

It sounds like @phptek is going to finish this off.

Most of the code is in StaticSiteRewriteLinksTask, and if my assessment is correct, the main source of bugs here is that it's not clear which page you should be linking to when the import script has been run multiple times.

StaticSiteDataExtension defines a StaticSiteURL field, and I think that, in order to provide a robust tool, the StaticSiteURL should be unique across a single StaticSiteContentSource (StaticSiteContentSource is also a has_one created by StaticSiteDataExtension).

So, before importing a page, run something like this: (it would need to be abstracted out for different content types, the query would be created using the ORM, etc, but you get the idea)

 UPDATE SiteTree SET StaticSiteURL = NULL WHERE StaticSiteURL = 'this-url-im-about-to-add' AND StaticSiteContentSourceID = CurrentlyImportedContentSourceID
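To make the effect of that statement concrete, here is a minimal sketch that runs it against an in-memory SQLite table. It is not the module's code: the table layout is reduced to the columns mentioned above, and the helper name `clear_stale_url` and the sample rows are made up for illustration. The real version would go through the SilverStripe ORM as noted.

```python
import sqlite3

def clear_stale_url(conn, url, source_id):
    """Null out StaticSiteURL on rows already holding this URL for this source.

    Hypothetical helper; returns the number of rows that were cleared.
    """
    cur = conn.execute(
        "UPDATE SiteTree SET StaticSiteURL = NULL "
        "WHERE StaticSiteURL = ? AND StaticSiteContentSourceID = ?",
        (url, source_id),
    )
    return cur.rowcount

# Simplified stand-in for the SiteTree table (assumed columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SiteTree (ID INTEGER PRIMARY KEY, Title TEXT, "
    "StaticSiteURL TEXT, StaticSiteContentSourceID INTEGER)"
)
conn.executemany(
    "INSERT INTO SiteTree (Title, StaticSiteURL, StaticSiteContentSourceID) "
    "VALUES (?, ?, ?)",
    [
        ("About (earlier import)", "/about", 1),
        ("About (other source)", "/about", 2),
        ("Contact", "/contact", 1),
    ],
)

# About to re-import /about from content source 1: release the URL first.
cleared = clear_stale_url(conn, "/about", 1)
```

Only the row from content source 1 loses its StaticSiteURL; the page with the same URL from source 2 is untouched, so the URL stays unique per content source once the new page is written.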

The other issue I ran into is that there were a lot of URLs that couldn't be rewritten, and I'm not sure if this is because of trivial differences (like case, or escaping of characters) that should be detected. I would have "link couldn't be rewritten" warnings aggregated into a single list so we can dig into what's going on with them.
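As a rough sketch of both ideas (ignoring trivial differences when matching, and collecting the failures in one place), something like the following could work. The function names and the normalisation rules are assumptions for illustration, not what the task currently does. Lower-casing the path in particular is lossy and would only be safe if the source site treats URLs case-insensitively.

```python
from urllib.parse import urlsplit, unquote

def normalise(url):
    """Build a comparison key: unescaped, lower-cased path, no trailing slash."""
    path = unquote(urlsplit(url).path).lower().rstrip("/")
    return path or "/"

def rewrite_links(links, imported):
    """Match links against imported pages, tolerating case/escaping differences.

    `imported` maps an original URL to its new link (hypothetical structure).
    Returns (rewritten, failures); failures is one aggregated list.
    """
    lookup = {normalise(src): dest for src, dest in imported.items()}
    rewritten, failures = {}, []
    for link in links:
        key = normalise(link)
        if key in lookup:
            rewritten[link] = lookup[key]
        else:
            failures.append(link)  # collect for later inspection, don't warn inline
    return rewritten, failures
```

Calling `rewrite_links(["/about-us", "/missing"], {"/About-Us/": "/about-us-new"})` would rewrite the first link and leave `/missing` in the failures list, which can then be dumped as a single report at the end of the run.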

phptek commented 11 years ago

OK, so to summarise from what @sminnee has said above, as a base to work from here's what I'll do:

phptek commented 11 years ago

State of Play

Further investigation and work should occur next week.

phptek commented 10 years ago

A massive number of changes, refactoring, bugfixing and tests have been added to my fork (https://github.com/phptek/silverstripe-staticsiteconnector).

The link-rewrite is much more effective now that each import can be identified by an ID. The ID can be passed to the task so it "knows" which duplicate to modify.
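To illustrate the idea only (this is not the code in the fork), the lookup could be pictured as below. The `ImportedPage` record and `target_for` function are hypothetical names; the point is simply that the URL alone is ambiguous after repeated imports, while URL plus import ID is not.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ImportedPage:
    page_id: int
    static_site_url: str
    import_id: int  # assumed: identifier of the import run that created the page

def target_for(url: str, import_id: int,
               pages: List[ImportedPage]) -> Optional[ImportedPage]:
    """Pick the page for `url` that belongs to the given import run, if any."""
    for page in pages:
        if page.static_site_url == url and page.import_id == import_id:
            return page
    return None
```

With two pages sharing `/about`, one from import 1 and one from import 2, passing the import ID selects exactly one of them as the rewrite target.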

phptek commented 10 years ago

Update: this can now optionally be run automatically via the UI after an import. I have also added a DataObject-driven CMS report, derived from data gathered during this task, which shows in detail which links failed to be rewritten, breaks them down by type, and provides a per-imported-page count of each.
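For anyone wanting a feel for the shape of that report, here is a small sketch of the aggregation. It assumes each failure is recorded as a (page ID, failure type) pair. The function name and the type labels in the usage note are placeholders, not the categories the report actually uses.

```python
from collections import Counter, defaultdict

def summarise(failures):
    """Aggregate failed rewrites.

    `failures` is an iterable of (page_id, failure_type) tuples (assumed shape).
    Returns overall counts per type and, per imported page, counts per type.
    """
    by_type = Counter()
    per_page = defaultdict(Counter)
    for page_id, failure_type in failures:
        by_type[failure_type] += 1
        per_page[page_id][failure_type] += 1
    return by_type, per_page
```

Given `[(1, "TypeA"), (1, "TypeB"), (2, "TypeA")]`, the overall breakdown counts "TypeA" twice, and page 1 gets one entry for each of the two types.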