Check CountDracula filters vs full set of 3-hour counts

GoogleCodeExporter commented 9 years ago


As Renee's email notes, there appear to be more 3-hour counts available to the 
static model validation than we have coming out of CountDracula.  For 
CountDracula, I am looking at:

   counts_generated_2012-07-18.zip

When I look at the spreadsheet in there, I find that linkavg_recent_60 has 
counts at 21 unique locations, and linkavg_recent_midweek_15 has counts at 96 
unique locations.  So we end up with 117 link count locations for the subset 
we're using (plus some additional locations where we can aggregate the movement 
counts into link counts if all outgoing movements are covered).
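The location tally above, including the rule for turning movement counts into link counts, can be sketched roughly as follows. This is a minimal illustration only: it assumes each count location is keyed by a hypothetical (a_node, b_node) link tuple and each movement by a hypothetical (incoming link, outgoing link) pair, which is not necessarily how CountDracula stores them.

```python
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[int, int]            # hypothetical (a_node, b_node) link key
Movement = Tuple[Link, Link]      # hypothetical (incoming link, outgoing link)


def unique_locations(*count_sets: Iterable[Link]) -> Set[Link]:
    """Union of link locations across several count worksheets."""
    locations: Set[Link] = set()
    for counts in count_sets:
        locations.update(counts)
    return locations


def aggregate_movements(
    movement_counts: Dict[Movement, float],
    outgoing_movements: Dict[Link, Set[Link]],
) -> Dict[Link, float]:
    """Sum movement counts into a link count, but only for incoming links
    whose every outgoing movement has a count; a partial sum would
    undercount the link volume."""
    link_counts: Dict[Link, float] = {}
    for link, outs in outgoing_movements.items():
        covered = all((link, out) in movement_counts for out in outs)
        if outs and covered:
            link_counts[link] = sum(movement_counts[(link, out)] for out in outs)
    return link_counts
```

Taking the union rather than adding the two per-worksheet totals matters: 21 + 96 = 117 only holds if the two worksheets share no locations.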

However, when I look at the comparison to the static model, which is available 
here: 

    http://dta.googlecode.com/files/pb_july11_300p_FT_Reports.zip

We appear to have 902 counts on links in the DTA network (plus a bunch on links 
not in the network, presumably throughout the Bay Area).  The question is: why 
do we have so many more counts available for the 3-hour validation?  

Possible causes:

1. The inclusion criteria for the static validation are too loose, and we 
should really be more restrictive, like we are in the DTA validation.

2. Maybe these counts aren't actually in CountDracula. 

3. Maybe we have some sort of a matching problem where they are getting lost 
along the way.

We should diagnose this first by asking whether anyone already knows the 
answer, in case it is an easy one.  If not, we should find some locations that 
show up in the 3-hour counts but not in CountDracula, trace them back to the 
source of those counts, and determine whether they should or should not be in 
CountDracula.  
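Finding the locations to trace back is essentially a set difference. The sketch below assumes, hypothetically, that both the static 3-hour counts and the CountDracula export have already been reduced to the same (a_node, b_node) keys; in practice getting them onto a common key is part of the matching question in cause 3.

```python
from typing import Dict, Iterable, List, Tuple

Link = Tuple[int, int]  # hypothetical (a_node, b_node) key shared by both sources


def compare_locations(
    static_locations: Iterable[Link],
    countdracula_locations: Iterable[Link],
) -> Dict[str, List[Link]]:
    """Split locations into static-only, CountDracula-only, and shared,
    each sorted so the lists can be worked through one location at a time."""
    static = set(static_locations)
    cd = set(countdracula_locations)
    return {
        "static_only": sorted(static - cd),       # candidates to trace back
        "countdracula_only": sorted(cd - static),
        "both": sorted(static & cd),
    }
```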

I've assigned this to Lisa for now to put it on the CountDracula radar screen, 
but I'm happy to dig into this on our end, so feel free to pass it back if you 
feel that's appropriate.  We would want to be pointed toward the source of the 
static counts so we can trace those back.

Original issue reported on code.google.com by greg.erh...@ucl.ac.uk on 30 Jul 2012 at 9:53

Blocking: #151

GoogleCodeExporter commented 9 years ago

Regarding the 3-hour counts you're referring to for the static model 
validation, which workbook/worksheet in that zip file are you talking about?  
The results in 11RoadwayValidation_pb_july11_FT_v1.xlsx?  Where is the 902 
number coming from?  And where is this worksheet from?

Just trying to track down the source of the static counts; AFAIK we didn't pull 
static counts out of CountDracula (CountDracula wasn't functional that way 
until more recently than any CHAMP calibration).  So they may or may not be in 
CountDracula.  My guess is that it's a combination of causes 1, 2, and 3.

Original comment by lisa.zorn@sfcta.org on 2 Aug 2012 at 6:34

GoogleCodeExporter commented 9 years ago

The counts he's referring to are from the 11RoadwayValidation spreadsheet, 
which I adapted from 11RoadwayValidation_With101detail_RPM9_3.2.x_cr14.xls 
to serve as the AMB validation spreadsheet.  

902 is the number of counts in that spreadsheet whose ANode and BNode match a 
link in our DTA results, so it doesn't include split links.  
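The exact-match rule described above can be sketched as a dictionary lookup on the (ANode, BNode) pair. The column names "A", "B", and "count" and the link-id mapping are hypothetical stand-ins, not the actual layout of the 11RoadwayValidation workbook or the DTA network files.

```python
from typing import Dict, List, Tuple

Link = Tuple[int, int]


def match_static_counts(
    static_rows: List[dict],
    dta_links: Dict[Link, int],
) -> Tuple[List[Tuple[int, float]], List[dict]]:
    """Match static count rows to DTA links by exact (A, B) node pair.

    static_rows: hypothetical rows with "A", "B", and "count" keys.
    dta_links:   hypothetical mapping of (a_node, b_node) -> DTA link id.
    Rows that do not match (split links, links outside the DTA network)
    are returned separately rather than silently dropped.
    """
    matched: List[Tuple[int, float]] = []
    unmatched: List[dict] = []
    for row in static_rows:
        key = (int(row["A"]), int(row["B"]))
        link_id = dta_links.get(key)
        if link_id is None:
            unmatched.append(row)
        else:
            matched.append((link_id, row["count"]))
    return matched, unmatched
```

Keeping the unmatched rows makes it easy to check how many of them are split links that a re-mapping step could recover.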

I think the main question is whether there is a matching problem we could fix 
or an exclusion criterion that could reasonably be loosened so that we will 
have more counts to use.

Original comment by alsu...@pbworld.com on 2 Aug 2012 at 6:54

GoogleCodeExporter commented 9 years ago

Original comment by elizab...@sfcta.org on 27 Aug 2012 at 4:51

GoogleCodeExporter commented 9 years ago

I sent the counts for 2000, 2005, and 2010 to Dan/Lisa (they are now in the 
counts folder under MTC).  They are by A and B node, so we need to decide 
whether to re-map them or expand CD capability to accommodate A/B nodes (hence, 
I'm rolling the ball back into Lisa's court).
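If the A/B node counts are re-mapped, one wrinkle is that a static link may have been split in the DTA network, so its (A, B) pair no longer exists as a single link. The sketch below is a hypothetical heuristic, not an existing DTA or CountDracula routine: it assumes splitting only inserts intermediate nodes that each have a single onward successor (ignoring the node just left), and gives up if the chain is ambiguous. The successors dictionary and the max_steps limit are illustrative assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[int, int]


def remap_ab_count(
    a_node: int,
    b_node: int,
    successors: Dict[int, Set[int]],
    max_steps: int = 10,
) -> Optional[List[Link]]:
    """Return the chain of DTA links replacing static link (a_node, b_node),
    or None if no unique chain through pass-through nodes reaches b_node."""
    direct = successors.get(a_node, set())
    if b_node in direct:
        return [(a_node, b_node)]
    found: List[List[Link]] = []
    for first in sorted(direct):
        chain = [(a_node, first)]
        prev, node = a_node, first
        for _ in range(max_steps):
            if node == b_node:
                break
            # Continue only through nodes with exactly one way onward
            # (excluding a U-turn back to the previous node).
            onward = successors.get(node, set()) - {prev}
            if len(onward) != 1:
                break
            nxt = next(iter(onward))
            chain.append((node, nxt))
            prev, node = node, nxt
        if node == b_node:
            found.append(chain)
    return found[0] if len(found) == 1 else None
```

The count would then be attached to each link in the returned chain; counts for which this returns None would still need manual review.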

Original comment by elizab...@sfcta.org on 10 Sep 2012 at 3:37

GoogleCodeExporter commented 9 years ago

Original comment by elizab...@sfcta.org on 26 Sep 2012 at 6:22

Now blocking: #151

GoogleCodeExporter commented 9 years ago

Ok, I have added the MTC counts to CountDracula (which was nontrivial because 
these counts, unlike our previous counts, have no date associated with them, 
only a year, AFAIK). CountDracula change is here: 
https://github.com/sfcta/CountDracula/commit/480191dd6895b2d3ae301cd22c2e5203786c5d55

On the DTA side, I have instrumented the attach counts to grab these new counts 
as well; these are 180-minute counts starting at 3:30 p.m.  This update is in 
Revision 853598fe75ad.
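Because these MTC counts carry only a year plus a fixed 180-minute period starting at 3:30 p.m. (so ending at 6:30 p.m.), a record for them needs a different shape than date-stamped counts. The dataclass below is a hypothetical sketch of such a record, not CountDracula's actual model; the field and method names are illustrative.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass(frozen=True)
class PeriodCount:
    """Hypothetical record for one MTC period count (not CountDracula's schema)."""
    a_node: int
    b_node: int
    year: int           # MTC counts carry only a year, no calendar date
    start: time         # start of the counted period
    duration_min: int   # length of the counted period in minutes
    volume: float

    def end(self) -> time:
        """Clock time at which the counted period ends."""
        # A placeholder date (Jan 1 of the count year) is used only for the
        # clock arithmetic; periods that cross midnight wrap around.
        begin = datetime.combine(date(self.year, 1, 1), self.start)
        return (begin + timedelta(minutes=self.duration_min)).time()

    def covers(self, start: time, duration_min: int) -> bool:
        """True if this count's period is exactly the requested period."""
        return self.start == start and self.duration_min == duration_min
```

For example, `PeriodCount(1, 2, 2010, time(15, 30), 180, 1234.0).end()` gives 18:30.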

One thing to note: we pull all counts and "recent" counts for this period 
(consistent with the other types of counts), but there are very counts from 
2010 compared to 2000 and 2005 (see attached PDFs for visuals).  I'm not sure 
why, since I don't have more information about the source of the count files in 
Q:\Roadway Observed Data\MTC.
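The per-year imbalance, and how it interacts with an "all" versus "recent" pull, can be checked with a simple tally. This sketch assumes hypothetical (location, year, volume) records, and the most_recent_per_location function encodes only one possible reading of "recent" (average the counts from each location's latest year); CountDracula's actual definition may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Link = Tuple[int, int]
Record = Tuple[Link, int, float]   # hypothetical (location, year, volume)


def counts_per_year(records: List[Record]) -> Dict[int, int]:
    """Number of count records in each year, e.g. to compare 2000/2005/2010."""
    return dict(sorted(Counter(year for _, year, _ in records).items()))


def most_recent_per_location(records: List[Record]) -> Dict[Link, float]:
    """One possible reading of "recent": for each location, average the
    counts from that location's latest year only."""
    latest: Dict[Link, int] = {}
    for loc, year, _ in records:
        latest[loc] = max(year, latest.get(loc, year))
    volumes: Dict[Link, List[float]] = defaultdict(list)
    for loc, year, vol in records:
        if year == latest[loc]:
            volumes[loc].append(vol)
    return {loc: sum(v) / len(v) for loc, v in volumes.items()}
```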

Since CountDracula is staged on Tehama, team members can grab counts by VNCing 
to Tehama, opening a command prompt window (or using the red one already open), 
navigating to the directory with their network files, and running the 
:importCounts block from 
http://code.google.com/p/dta/source/browse/scripts/importFullSanFranciscoNetworkDataset.bat?name=dev

Original comment by lisa.zorn@sfcta.org on 15 Oct 2012 at 11:20

Changed state: Fixed

Attachments:

GoogleCodeExporter commented 9 years ago

Doh, dropped a word:

"but there are very counts from 2010 compared to 2000 and 2005" -->

but there are very FEW counts from 2010 compared to 2000 and 2005

Original comment by lisa.zorn@sfcta.org on 15 Oct 2012 at 11:21

Check CountDracula filters vs full set of 3-hour counts #132