Closed GoogleCodeExporter closed 9 years ago
I just checked my implementation - it does not include any crawls where the
site was not included. This means that there are no "gaps" in any of the charts
- not sure whether this should be corrected.
For an example see http://bayou.clark-consulting.eu/site/104183/
If I do, it would be with a lookup table of labels as dates (because crawls are
not publicly available) and an outer join. That should have a minimal
effect on performance. "label" could then be normalised out of pages.
Original comment by charlie....@clark-consulting.eu
on 4 Mar 2013 at 7:33
I've now added this to my implementation, e.g.
http://www.mamasnewbag.org/site/15970/ - there are only two crawls of this site,
but the charts cover the whole set.
The easiest way to do this is to add a view with the dates:
CREATE VIEW date_range AS SELECT DISTINCT label FROM pages
-- ORDER BY label DESC # when label is type DATE
;
For the select list you can simply restrict the query to the site in question:
SELECT label FROM pages
WHERE urlShort = ?
For the results you can use an outer join on this view, which removes the need
to pad the result sets for charts:
SELECT date_range.label, pages.* FROM date_range
LEFT JOIN pages ON
(pages.label = date_range.label
AND
pages.urlShort = ?)
Sorting is still best accomplished by using the date type for label.
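To make the padding effect concrete, here is a minimal sketch using Python's sqlite3 module with an in-memory database. Only pages, label, urlShort and date_range come from the queries above; the onLoad column, the example URLs and the label values are made up for illustration, and labels are stored as ISO date strings so that they sort correctly.

```python
import sqlite3


def padded_rows(conn, url_short):
    """Return one row per crawl label; the value is None where the site was not crawled."""
    return conn.execute(
        """
        SELECT date_range.label, pages.onLoad
        FROM date_range
        LEFT JOIN pages
          ON pages.label = date_range.label
         AND pages.urlShort = ?
        ORDER BY date_range.label
        """,
        (url_short,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical cut-down pages table; onLoad stands in for any charted metric.
    CREATE TABLE pages (label TEXT, urlShort TEXT, onLoad INTEGER);
    INSERT INTO pages VALUES
        ('2013-01-01', 'http://a.example/', 1200),
        ('2013-01-15', 'http://b.example/', 900),
        ('2013-02-01', 'http://a.example/', 1100),
        ('2013-02-01', 'http://b.example/', 950);
    CREATE VIEW date_range AS SELECT DISTINCT label FROM pages;
    """
)

# The site was not in the 2013-01-15 crawl, so that label comes back with None.
for row in padded_rows(conn, "http://a.example/"):
    print(row)
```

The row with None is the "gap" that a chart can then render as a missing point, without any padding in application code.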
Original comment by charlie....@clark-consulting.eu
on 21 Mar 2013 at 11:26
Added urlhash to the pages table and created an index on it, so now we can
search based on urlhash. Much faster & more accurate.
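A rough sketch of what an indexed hash lookup can look like, again with Python's sqlite3. This is not the actual schema change: the comment does not say how urlhash is derived, so the CRC32 hash, the index name, the url column and the sample rows below are all assumptions made for illustration.

```python
import sqlite3
import zlib


def url_hash(url):
    # Assumption: CRC32 of the UTF-8 encoded URL. The real urlhash may be computed differently.
    return zlib.crc32(url.encode("utf-8"))


def crawls_for(conn, url):
    """Return the crawl labels for a URL, filtering on the indexed hash first."""
    cursor = conn.execute(
        # Comparing url as well guards against two URLs sharing one hash.
        "SELECT label FROM pages WHERE urlhash = ? AND url = ? ORDER BY label",
        (url_hash(url), url),
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (label TEXT, url TEXT, urlhash INTEGER)")
conn.execute("CREATE INDEX pages_urlhash ON pages (urlhash)")  # hypothetical index name

sample = [
    ("2013-01-01", "http://a.example/"),
    ("2013-01-15", "http://b.example/"),
    ("2013-02-01", "http://a.example/"),
]
for label, url in sample:
    conn.execute(
        "INSERT INTO pages VALUES (?, ?, ?)", (label, url, url_hash(url))
    )

print(crawls_for(conn, "http://a.example/"))
```

Matching on an integer hash keeps the index small compared with indexing the full URL string, and the extra equality check on url keeps the result exact.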
Original comment by stevesou...@gmail.com
on 21 Jul 2013 at 7:58
Original issue reported on code.google.com by
stevesou...@gmail.com
on 27 Jan 2013 at 11:46