rgpages / election-2014-general

2014 general election coverage for Lane County, Oregon; developed and curated by The Register-Guard.
http://vote.registerguard.com
0 stars 2 forks source link

Stories from DTI system #28

Closed mhulse closed 9 years ago

mhulse commented 9 years ago

At the top of the page, below the main header, we should have stories coming from DTI that follow this order:

Consider using AP feed content in another row or to fill-in the above with more stuff if needed.

mhulse commented 9 years ago
mhulse commented 9 years ago

I think the stories need to be on a layout. The query I use for getting cats is kinda strict about that.

With that said, I've done some work in the past on re-writing cats query:

SELECT TOP 20 
    cms.ID AS CMSStory
FROM
    dbo.Story s
INNER JOIN
    dbo.StoryPageElements spe
ON
    (
        s.storyId = spe.storyId)
INNER JOIN
    dbo.AdDbPageInfo adbpi
ON
    (
        spe.logicalPagesId = adbpi.logicalPageId)
INNER JOIN
    dbo.SubCategory sc
ON
    (
        s.subCategoryId = sc.subCategoryId)
INNER JOIN
    dbo.Status st
ON
    (
        s.StatusId = st.StatusId)
INNER JOIN
    dbo.Desk d
ON
    (
        s.deskId = d.deskId),
    dt_cms_schema.CMSStory cms
WHERE
    adbpi.runDate BETWEEN '2012-01-20' AND '2013-08-29'
AND cms.story->storyId = s.storyId
AND sc.subCategoryId IN (4812835,
                         202)
AND d.deskName <> 'trash'
AND st.statusName IN ('ready',
                      'output',
                      'web')
ORDER BY
    adbpi.runDate DESC,
    adbpi.letter ASC ;

I posted a question about the above query here:

https://groups.google.com/d/msg/dti-lightning/fB1Io90k8b0/AAnxgfxeP1IJ

Compare the code to that issue's code, merge diffs, and then modify to be more loose with matching stories.

Write new cat-getting method specifically for this project. Later, I'll consider replacing current cat-getting method with new one from this project.

mhulse commented 9 years ago

This is basically the query I use now:

SELECT DISTINCT cms.ID 
FROM dt_cms_schema.CMSStory cms, dbo.AdDbPageInfo adbpi, dbo.StoryPageElements spe, dbo.Story s, dbo.SubCategory sc, dbo.status st, dbo.Desk d 
WHERE (adbpi.runDate BETWEEN '2009-01-20' AND '2009-08-29') 
AND cms.story->storyId = s.storyId 
AND sc.subCategoryId IN (4812835, 202) 
AND s.storyId = spe.storyId 
AND spe.logicalPagesId = adbpi.logicalPageId 
AND s.subCategoryId = sc.subCategoryId 
AND s.StatusId = st.StatusId 
AND s.deskId = d.deskId 
//AND s.words > 10 
AND d.deskName <> 'trash' 
AND st.statusName IN ('ready', 'output', 'web', 'archive') 
AND NOT (s.subCategoryId in (0, 2, 51, 52, 54, 101, 200, 300, 1000, 1023, 176447, 330788, 1905902)) 
AND NOT (s.storyname LIKE '%ol.wed%' OR s.storyname LIKE '%ol.gold%') 
AND NOT (s.origin LIKE '%Wire%' OR s.origin LIKE '%NYT') 
ORDER BY adbpi.runDate DESC, adbpi.letter ASC, adbpi.pagenum ASC, s.words DESC
mhulse commented 9 years ago

Ha, looks like I already did all the thinking on this one:

https://gist.github.com/mhulse/4532459#file-2-new-style-ansi-conversion-sql

Note: I don't think I need to worry about parent cat ID as that's on the subcat table.

mhulse commented 9 years ago

This works:

SELECT DISTINCT
    TOP 50 cms.%Id, cat.categoryName, subcat.subCategoryName, status.statusName, pt.StartDate
FROM
    dt_cms_schema.CMSStory cms
INNER JOIN
    dbo.Story dbo
ON
    (dbo.ID = cms.story)
INNER JOIN
    dbo.Desk desk
ON
    (dbo.deskId = desk.deskId)
INNER JOIN
    dbo.SubCategory subcat
ON
    (subcat.subCategoryId = dbo.subCategoryId)
INNER JOIN
    dbo.Category cat
ON
    (cat.categoryId = subcat.categoryId)
INNER JOIN
    dbo.Status status
ON
    (dbo.StatusId = status.StatusId)
INNER JOIN
    dt_cms_schema.CMSStoryPubTracking pt
ON
    (cms.ID = pt.CMSStory)
WHERE
    status.statusName IN ('ready', 'output', 'web')
AND
    desk.deskName NOT IN ('trash')
AND
    cat.categoryName IN ('local news')
AND
    subcat.subCategoryName IN ('business')
AND
    pt.StartDate BETWEEN '2014-10-20' AND '2014-10-29'
ORDER BY
    dbo.storyId, pt.StartDate DESC;

Features:

A future project would be to build a RULE and class like https://github.com/registerguard/pick

mhulse commented 9 years ago

Watered down query that ignores status and startdates:

SELECT DISTINCT
    TOP 5 cstory.%Id AS CMSStory, cat.categoryName, subcat.subCategoryName
FROM
    dt_cms_schema.CMSStory cstory
INNER JOIN
    dbo.Story dstory
ON
    (dstory.ID = cstory.story)
INNER JOIN
    dbo.Desk desk
ON
    (desk.deskId = dstory.deskId)
INNER JOIN
    dt_cms_schema.CMSStoryPubTracking pt
ON
    (pt.CMSStory = cstory.ID)
INNER JOIN
    dbo.SubCategory subcat
ON
    (subcat.subCategoryId = dstory.subCategoryId)
INNER JOIN
    dbo.Category cat, dbo.SubCategory subcat2
ON
    (cat.categoryId = subcat2.categoryId)
WHERE
    desk.deskName NOT IN ('trash')
AND
    cat.categoryName IN ('local news')
AND
    subcat.subCategoryName IN ('elections region results')
ORDER BY
    pt.StartDate DESC;
mhulse commented 9 years ago

This is done. Closing.

Related though: registerguard/rg#93

The above issue should be easier to do now that I have a more solid code base for getting cats/subcats.