Open sminnee opened 11 years ago
This is partially completed (there is a build task) but I think it's buggy.
It sounds like @phptek is going to finish this off.
Most of the code is in StaticSiteRewriteLinksTask, and if my assessment is correct, the main source of bugs here is that it's not clear which page you should be linking to when the import script has been run multiple times.
StaticSiteDataExtension defines a StaticSiteURL field, and I think that, in order to provide a robust tool, the StaticSiteURL should be unique across a single StaticSiteContentSource (StaticSiteContentSource is also a has_one created by StaticSiteDataExtension).
So, before importing a page, run something like this (it would need to be abstracted out for different content types, the query would be created using the ORM, etc., but you get the idea):
UPDATE SiteTree SET StaticSiteURL = NULL WHERE StaticSiteURL = 'this-url-im-about-to-add' AND StaticSiteContentSourceID = CurrentlyImportedContentSourceID
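To make the effect of that query concrete, here is a minimal sketch that runs it against an in-memory SQLite table. It is not the module's code: the real implementation would go through the SilverStripe ORM, and the table layout, the helper name reset_static_site_url and the sample rows are assumptions made for illustration only.

```python
import sqlite3

def reset_static_site_url(conn, url, content_source_id):
    """Clear StaticSiteURL on rows that already claim this URL for this source.

    Returns the number of rows that were cleared.
    """
    cur = conn.execute(
        "UPDATE SiteTree SET StaticSiteURL = NULL "
        "WHERE StaticSiteURL = ? AND StaticSiteContentSourceID = ?",
        (url, content_source_id),
    )
    return cur.rowcount

# Hypothetical, simplified stand-in for the SiteTree table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SiteTree (ID INTEGER PRIMARY KEY, "
    "StaticSiteURL TEXT, StaticSiteContentSourceID INTEGER)"
)
conn.executemany(
    "INSERT INTO SiteTree (StaticSiteURL, StaticSiteContentSourceID) VALUES (?, ?)",
    [("/about", 1), ("/about", 2), ("/contact", 1)],
)

# Before re-importing "/about" from content source 1, release the old claim.
cleared = reset_static_site_url(conn, "/about", 1)
remaining = conn.execute(
    "SELECT ID, StaticSiteURL, StaticSiteContentSourceID FROM SiteTree ORDER BY ID"
).fetchall()
```

Only the row belonging to content source 1 is cleared; the same URL imported from a different content source is left alone, which is what keeps the URL unique per source.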
The other issue I ran into is that there were a lot of URLs that couldn't be rewritten, and I'm not sure if this is because of trivial differences (like case, or escaping of characters) that should be detected. I would have "link couldn't be rewritten" warnings aggregated into a single list so we can dig into what's going on with them.
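As a rough illustration of how trivial differences such as case or escaping could be detected, the sketch below compares links through a normalised key and collects every link that still fails to match into one list. The function names, the sample URLs and the choice to lower-case paths are assumptions; whether lower-casing is safe depends on how the source server treats case.

```python
from urllib.parse import urlsplit, unquote

def normalise_url(url):
    """Build a comparison key: path only, percent-decoded, lower-cased, no trailing slash."""
    path = unquote(urlsplit(url).path)
    path = path.lower().rstrip("/")
    return path or "/"

# Hypothetical set of URLs recorded at import time, keyed by normalised form.
imported = {normalise_url(u): u for u in ["/About%20Us/", "/Contact"]}

# Links that could not be matched, aggregated into a single list for later review.
failed = []

def lookup(link):
    """Return the imported URL matching this link, or record the failure."""
    key = normalise_url(link)
    if key not in imported:
        failed.append(link)
        return None
    return imported[key]
```

With this, a link written as "/about us" matches the page imported as "/About%20Us/", while a link with no counterpart ends up in failed rather than being silently skipped.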
OK, so to summarise what @sminnee has said above, here's what I'll do as a base to work from:
- [...] SiteTree specific)
- [...] File and Image objects (see #1)
- [...] StaticSiteUtils class along with a resetStaticSiteURLs() method (as suggested by @sminnee)
- resetStaticSiteURLs() has been added to StaticSiteFileTransformer and StaticSitePageTransformer, but the calls are commented out at this time as I haven't had time to really check whether they are needed
- [...] link-rewrite branch as to why so many import failures are occurring, which has obvious side effects here and causes link-rewrite failures. There seem to be issues with the specific MOSS-CMS we're scraping: URLs with spaces and URLs with urlencoded spaces are treated differently, in that some throw a 400 error and then redirect to a canonical URL and others don't.

Further investigation and work should occur next week.
A massive number of changes, including refactoring, bug fixes and tests, have been added to my fork (https://github.com/phptek/silverstripe-staticsiteconnector).
The link-rewrite is much more effective as each import can now be identified by an ID; the ID can therefore be passed to the task so it "knows" which duplicate to modify.
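The idea of using the import ID to pick between duplicates can be sketched roughly as follows. This is not the code in the fork: the ImportedPage record, its field names and find_target are hypothetical stand-ins for whatever the task actually queries.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImportedPage:
    page_id: int          # hypothetical ID of the page created in the CMS
    import_id: int        # hypothetical ID of the import run that created it
    static_site_url: str  # URL the page was scraped from

def find_target(pages, import_id, url) -> Optional[ImportedPage]:
    """Return the page created by the given import run for this URL, if any."""
    for page in pages:
        if page.import_id == import_id and page.static_site_url == url:
            return page
    return None

# The same URL imported twice yields two pages; the import ID disambiguates.
pages = [
    ImportedPage(page_id=10, import_id=1, static_site_url="/about"),
    ImportedPage(page_id=25, import_id=2, static_site_url="/about"),
]
target = find_target(pages, import_id=2, url="/about")
```

Matching on the URL alone would be ambiguous here; adding the import ID to the lookup selects exactly one of the duplicates.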
Update: this can now optionally be run automatically via the UI after an import. I have also added a DataObject-driven CMS report, derived from data gathered during this task, which shows in detail which links failed to be rewritten, breaks them down by type and provides a per-imported-page count of each.
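The kind of breakdown such a report provides can be illustrated with a small sketch. The failure records, their field names and the type labels below are invented for illustration and are not taken from the actual report.

```python
from collections import Counter

# Hypothetical failure records gathered while rewriting links.
failures = [
    {"page_id": 10, "link": "/docs/report.pdf", "type": "file"},
    {"page_id": 10, "link": "/old-page", "type": "page"},
    {"page_id": 25, "link": "/missing", "type": "page"},
]

# Number of failed links per link type.
by_type = Counter(f["type"] for f in failures)

# Number of failed links per imported page.
by_page = Counter(f["page_id"] for f in failures)
```

Grouping the same records two ways gives both views the report describes: which kinds of link fail most often, and which imported pages carry the most broken links.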
Internal links within the site should be rewritten to point to the imported SilverStripe pages.