salsadigitalauorg / merlin-framework

Merlin - migration framework
GNU General Public License v3.0

Final migration comparison report #109

Closed: stooit closed this issue 4 years ago

stooit commented 4 years ago

Description: Once a migration has completed, it is not always clear how successful it was without gathering logs from various places and analysing the results.

It would be useful to have a scripted solution that gives more confidence a migration has captured all relevant pages and assets.

Proposed solution: Support a scripted solution (and a new Symfony command) that compares the expected source URLs from a Merlin config against the destination URLs on the migrated site. This can be achieved by reusing the outputs of the original Merlin run (e.g. the URL list generated by the original crawl).

Content-level comparison is not expected at this stage, but confirming that content exists at the expected URLs (e.g. 200 responses) would be a good initial indication.

Similarly, embedded assets (PDFs/documents) on the resulting site should also be tested for a 200 response.
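A minimal sketch of the comparison described above, written in Python purely for illustration (Merlin itself is PHP, and this is not its implementation). The function names `to_destination`, `http_status`, and `compare` are hypothetical, as is the assumption that paths are preserved unchanged between the source and destination sites.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def to_destination(source_url, destination_base):
    """Swap the scheme and host of source_url for those of destination_base.

    Assumes (hypothetically) that the path and query are unchanged by the migration.
    """
    src = urlsplit(source_url)
    dst = urlsplit(destination_base)
    return urlunsplit((dst.scheme, dst.netloc, src.path, src.query, ""))


def http_status(url, timeout=10):
    """Return the final HTTP status of a HEAD request, or None if unreachable.

    urllib follows redirects, so a redirect that ends in a 200 is reported as 200.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 or 500
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout


def compare(source_urls, destination_base, fetch_status=http_status):
    """Check every source URL (pages and assets alike) on the destination site."""
    results = []
    for url in source_urls:
        destination = to_destination(url, destination_base)
        status = fetch_status(destination)
        results.append({
            "source": url,
            "destination": destination,
            "status": status,
            "ok": status == 200,
        })
    return results
```

Passing `fetch_status` as a parameter keeps the comparison logic testable without network access; the same loop covers asset URLs such as PDFs, since only the response code is inspected.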

This will depend on:

When Merlin runs, it can create a prepopulated validate.yml file containing all URLs and a placeholder for the destination domain to test against. A user then only needs to update the destination testing domain and run php migrate validate -c validate.yml or similar.
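To make the idea concrete, here is a rough Python sketch of how a prepopulated file could be rendered from the crawl's URL list. The layout (a `domain` key holding a placeholder and a `urls` list of paths) and the name `render_validate_yaml` are assumptions for illustration only; the actual validate.yml format is not specified in this issue.

```python
import json
from urllib.parse import urlsplit

# Hypothetical placeholder the user would replace before running validation.
PLACEHOLDER = "https://DESTINATION-DOMAIN"


def render_validate_yaml(crawled_urls, placeholder=PLACEHOLDER):
    """Render a validate.yml-style document from a list of crawled URLs.

    Only the path and query of each URL are kept, so the same list can be
    tested against whatever destination domain the user fills in.
    """
    paths = []
    seen = set()
    for url in crawled_urls:
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        if path not in seen:  # de-duplicate while preserving crawl order
            seen.add(path)
            paths.append(path)

    # JSON strings are valid YAML double-quoted scalars, so json.dumps
    # gives safe quoting without a YAML dependency.
    lines = ["domain: " + json.dumps(placeholder)]
    if paths:
        lines.append("urls:")
        lines.extend("  - " + json.dumps(p) for p in paths)
    else:
        lines.append("urls: []")
    return "\n".join(lines) + "\n"
```

Storing paths rather than full URLs is a design choice in this sketch: it means only the single `domain` value needs editing before the file is used.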

Additional context: None

AlexSkrypnyk commented 4 years ago

@srowlands it is not clear from the description whether the comparison will happen by re-scraping already migrated content (slow) or by simply assessing log output (produced by the existing logging mechanism). Could you please clarify?

Separately, can the report be saved to a custom destination so that it can be stored within CI artifacts?

stooit commented 4 years ago

@AlexSkrypnyk good point, I've added some additional detail. The intent here is absolutely to use the outputs already generated (or with minimal enhancement) rather than to require any re-crawl.

We have a standard -o flag to provide the path to output, which should be fine for CI artifacts.
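As a rough illustration of what writing the report to a chosen output path could look like, here is a small Python sketch. The function `write_report`, the file name `validation-report.json`, and the shape of each result (a dict with a `status` key) are all hypothetical, not Merlin's actual behaviour.

```python
import json
from pathlib import Path


def write_report(results, output_dir, filename="validation-report.json"):
    """Summarise URL check results and save them as JSON under output_dir.

    Each result is assumed to be a dict with at least a "status" key;
    anything other than 200 is counted as a failure.
    """
    failed = [r for r in results if r.get("status") != 200]
    report = {
        "total": len(results),
        "passed": len(results) - len(failed),
        "failed": failed,
    }
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # CI workspaces may not have the dir yet
    path = out / filename
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path
```

A CI job could then collect the output directory as an artifact and fail the build when the list of failures is non-empty.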

derklempner commented 4 years ago

This reporting exists now; see Reporting.md in the docs for more info.