peterwebster closed this issue 10 years ago.
This issue is partially solved, but some questions remain.
I have examined this issue for this record: http://www.webarchive.org.uk/act/node/8880.json
I have fixed the issue of Postal Address URL (point (i)) and the display of start and end dates (point (iii)).
Regarding the other aspects of point (i) and point (ii), the way the object is presented in the old version of ACT does not appear to match the JSON export object. For example:
I have therefore linked the presentation of ‘field_notes’ to the database field ‘value’, since that appears to be the correct field given its content.
Regarding these values: should we understand that there is a mapping between the values stored in the database and those displayed? If so, we need some documentation of these mappings.
The mappings are:
resource|Just this URL.
plus1|This URL plus any directly linked resources.
root|All URLs that start like this.
subdomains|All URLs that match this host or any subdomains.
I have implemented and tested the scope mapping. To complete this issue I would also need similar mappings for depth.
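The `key|label` lines above pair each stored value with its display label. A minimal Python sketch of using them as a lookup table, assuming one pair per line; the names `SCOPE_ALLOWED_VALUES`, `parse_allowed_values` and `scope_label` are hypothetical and not taken from ACT or w3act:

```python
# Hypothetical sketch: turn "stored_value|Display label" lines into a lookup table.
SCOPE_ALLOWED_VALUES = """\
resource|Just this URL.
plus1|This URL plus any directly linked resources.
root|All URLs that start like this.
subdomains|All URLs that match this host or any subdomains.
"""


def parse_allowed_values(text: str) -> dict[str, str]:
    """Parse one 'key|label' pair per line into a dict; blank lines are skipped."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, label = line.partition("|")
        mapping[key.strip()] = label.strip()
    return mapping


SCOPE_LABELS = parse_allowed_values(SCOPE_ALLOWED_VALUES)


def scope_label(stored_value: str) -> str:
    """Return the display label for a stored scope value, or the raw value if unmapped."""
    return SCOPE_LABELS.get(stored_value, stored_value)


print(scope_label("root"))  # All URLs that start like this.
```

Falling back to the raw value keeps an unmapped stored value visible on screen instead of hiding it behind a default label.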
Depth:
capped|Capped (small - 500MB)
capped_large|Capped (large - 2GB)
deep|Uncapped
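For depth, a similar sketch that also reports values missing from the mapping, so an unexpected stored value is flagged rather than silently replaced by a default. `DEPTH_LABELS`, `depth_label`, `unmapped_depths` and `UnknownDepthError` are hypothetical names; only the key/label pairs come from the list above.

```python
# Hypothetical sketch: translate stored depth values into labels, failing loudly
# on unknown values instead of silently substituting a default.
DEPTH_LABELS = {
    "capped": "Capped (small - 500MB)",
    "capped_large": "Capped (large - 2GB)",
    "deep": "Uncapped",
}


class UnknownDepthError(ValueError):
    """Raised when a record carries a depth value with no known label."""


def depth_label(stored_value: str) -> str:
    """Return the display label for a stored depth value."""
    try:
        return DEPTH_LABELS[stored_value]
    except KeyError:
        raise UnknownDepthError(f"no label for depth value {stored_value!r}") from None


def unmapped_depths(values) -> set:
    """Collect every stored depth value that has no label (e.g. for an import report)."""
    return {v for v in values if v not in DEPTH_LABELS}


print(depth_label("capped_large"))              # Capped (large - 2GB)
print(unmapped_depths(["deep", "", "capped"]))  # {''}
```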
I have implemented all fields and mappings for the target view, and similarly for the instance view. I believe the issue is solved. Please let me know if there are any additional problems associated with this issue.
Thanks Roman; closing ticket.
This GitHub ticket had two significant parts: some data import details were missing, and there was a request to compare the original /act data against the new /actdev data.
As far as I understand, Roman has implemented the necessary changes for the first part (though I personally haven't checked this). Separately, I've been looking into the data comparison. This has been difficult, as I explain below.
The data held in "Andy's ACT", that is, the currently live /act service, is stored in a database managed by a service layer. The database structure was defined by the service layer, not by Andy or anyone else. In addition, the ACT data is stored alongside the data for the service layer itself (i.e., data that has nothing to do with ACT). Consequently, the original ACT data and the data imported into w3act are very difficult to compare at the database level.
If this comparison is still needed, a 'like-for-like' mapping between the original ACT data and the w3act data would be needed. As I have been unable to find such a mapping, I will leave this ticket closed. Any data discrepancies found will be added as individual field-level github tickets.
Ticket re-opened, as I can compare data exported from "Andy's ACT" and /actdev. Roger is gathering the data exports for me; I'll then do the comparison.
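One way to run such a comparison is a field-level diff of the two exports, keyed by record ID. The sketch below assumes both exports can be loaded as lists of dicts. The field names in `FIELD_MAP` are placeholders standing in for a like-for-like mapping, not the real ACT or w3act schema, and the `act-` ID prefix is only inferred from /actdev URLs such as targets/act-8880. The records at the bottom are toy data for illustration.

```python
# Hypothetical sketch: field-level comparison of an old ACT export with a w3act export.
# FIELD_MAP and the record shapes are placeholder assumptions, not the real schemas.
FIELD_MAP = {  # old ACT field -> w3act field
    "field_scope": "scope",
    "field_depth": "depth",
}


def new_id_for(old_id):
    # Assumption inferred from URLs such as /actdev/targets/act-8880 for node 8880.
    return f"act-{old_id}"


def compare_exports(old_records, new_records, field_map=FIELD_MAP):
    """Return (old_id, old_field, old_value, new_value) tuples for every mismatch.

    old_field is None when the whole record is missing from the new export.
    Only records present in the old export are checked.
    """
    new_by_id = {rec["id"]: rec for rec in new_records}
    mismatches = []
    for old in old_records:
        new = new_by_id.get(new_id_for(old["id"]))
        if new is None:
            mismatches.append((old["id"], None, "present", "missing"))
            continue
        for old_field, new_field in field_map.items():
            old_value, new_value = old.get(old_field), new.get(new_field)
            if old_value != new_value:
                mismatches.append((old["id"], old_field, old_value, new_value))
    return mismatches


# Toy records for illustration only.
old_export = [
    {"id": 1, "field_scope": "root", "field_depth": "capped"},
    {"id": 2, "field_scope": "root", "field_depth": "deep"},
]
new_export = [{"id": "act-1", "scope": "resource", "depth": "capped"}]

for row in compare_exports(old_export, new_export):
    print(row)
```

Counting the mismatch rows per field gives a quick sense of how widespread a problem is before raising individual field-level tickets.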
Now testing content via the exports, results being noted in #31.
OK, I will close this if it is now being covered in #31.
As raised by Rav, there are some issues with data not being imported successfully from Andy's ACT to /actdev on each deployment. Gil, you and I discussed some sort of visual table-to-table check to see how widespread the issue is, and whether it affects other fields that we haven't yet spotted. @anjackson
The details are:
(i) Relating to NPLD scope: LD criteria notes, Postal Address URL and Notes (under Via Correspondence) are not being migrated.
e.g. in the NPLD scope tab, the Postal Address URL and the Notes (under Via Correspondence) have not been migrated from Andy’s ACT record: http://www.webarchive.org.uk/act/node/8880 to http://www.webarchive.org.uk/actdev/targets/act-8880
(ii) In the Crawl Policy and Schedule tab, all values for the Scope field seem to be imported as ‘Just this URL’. They should match Andy’s ACT, where most records have the value ‘All URLs that start like this’.
The Depth setting also doesn’t seem to be carried over.
(iii) Crawl Start Date is displayed as a 10-digit number, e.g. Academia Rossica: http://www.webarchive.org.uk/actdev/targets/act-10508/edit
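A 10-digit number in a date field may well be a Unix timestamp (seconds since 1970-01-01 UTC); timestamps for dates from late 2001 onwards have 10 digits. That is a guess, not something confirmed in this ticket. If it holds, the value can be rendered as a date as in this sketch, where `format_epoch_date` is a hypothetical helper and UTC is assumed:

```python
from datetime import datetime, timezone


def format_epoch_date(raw) -> str:
    """Render a Unix timestamp (seconds since 1970-01-01 UTC) as YYYY-MM-DD.

    Accepts an int or a numeric string; any other value is returned unchanged
    (as a string) so it stays visible for investigation.
    """
    text = str(raw).strip()
    if not text.isdecimal():
        return str(raw)
    return datetime.fromtimestamp(int(text), tz=timezone.utc).strftime("%Y-%m-%d")


print(format_epoch_date("1262304000"))  # 2010-01-01
```

Rendering in a local time zone instead of UTC could shift the displayed date by one day for timestamps near midnight.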