kcigeospatial / balt_co_ETL

This will provide a place to track internal issues around the Baltimore County NPDES NCT application ETL.
0 stars 0 forks source link

Stormwater - BMP Inspections - how are records extracted from the DEPS view? #44

Closed leeensminger closed 6 years ago

leeensminger commented 6 years ago

The ETL logic is expected to find, for each outputted BMP facility, an inspection record in the DEPS view.

This is currently failing, but I don't know the details of the source data enough to test.

What is the source table? The output BMPID is listed as "orphan". What is the key identifier in the source table?

gerrykelly commented 6 years ago

@leeensminger The way this works is that "DEPS view" means only the v_pond_inspections view as the source for SW_INSP_REPORT.
The data to populate SW_INSP_REPORT is generated from: First the view (that I created) ETL_POND_INSP_FOR_SWI_REPORT that gets the sw_track_id (from sw_tracking) based on v_pond_inspections.pond_number = sw_tracking.stru_no. So that part seems good. Then, the view ETL_POND_INSP_FOR_SWI_RPT_MAX queries out just the most recent inspection for a sw_track_id. In our sample data now that gives us the 1476 records that go into SW_INSP_REPORT. So that part should be as good as it gets.

From there the logic should work like for other inspection sources so it gets the BMPID from sw_tracking and relates it to a BMP in BMPs. It looks like I might be doing something wrong in code though so don't waste time looking into until I figure it out (tomorrow). However, once I think it is working like it should, this is the info you need to have a test case and verify or ask me to fix it right.

gerrykelly commented 6 years ago

@leeensminger I think we a good with this now, seeing what you did for the stormwater test, just assigning back to you to get it off my list unless you see an issue here

leeensminger commented 6 years ago

@gerrykelly The associations are looking OK and I'm seeing what I should for most selection sets, but it looks like almost ALL inspections being loaded to the target with a BMP_ID of "orphan". These should not be migrated to the target.

In this image, I'm getting the 21 I expect to see, and 1300+ more. image

gerrykelly commented 6 years ago

@leeensminger So does it work if I simply delete all the ones with a value of orphan? (i.e. the records that get BMP_IDs are correct and the orphans are junk?)?

leeensminger commented 6 years ago

@gerrykelly Yes that works in this case

gerrykelly commented 6 years ago

@leeensminger Should be OK now

leeensminger commented 6 years ago

Appears resolved, except for a handful of ghosted bmps described in #68. Closing.

leeensminger commented 6 years ago

It appears we lost some inspections somehow. Looking at the source (on the left) against the output (on the right) - the two selected output records on the right have associated BMP inspections, the rest do not. You can see in the table on the left there are actually inspections present for output BMPs, which were not migrated to the output. I believe what is happening is that the inspection selection logic gets the most recent inspection date for each sw_tracking record, and if that inspection date is > reporting year, no inspections for that sw_tracking are migrated.

Your description, from above, for how the query works is as follows. At what point do you filter for reporting year? _First the view (that I created) ETL_POND_INSP_FOR_SWI_REPORT that gets the sw_track_id (from sw_tracking) based on v_pond_inspections.pond_number = sw_tracking.stru_no. So that part seems good. Then, the view ETL_POND_INSP_FOR_SWI_RPT_MAX queries out just the most recent inspection for a sw_track_id. In our sample data now that gives us the 1476 records that go into SW_INSPREPORT. So that part should be as good as it gets.

image

leeensminger commented 6 years ago

What should happen, is - _Then, the view ETL_POND_INSP_FOR_SWI_RPT_MAX queries out just the most recent inspection for asw_trackid, for when the inspection date year value is less than or equal to the selected reporting year

So if an sw_track_id has inspections in 2014, 2015, 2016, and 2017, and I run the ETL for RY 2016 - currently I get zero inspections for the given sw_track_id, but the intention is to get one inspection record, with the 2016 inspection.

gerrykelly commented 6 years ago

@leeensminger I'm looking at this now. I can apply your logic if that is not what I am already doing now but one other issue may be that my reporting year in ETL_POND_INSP_FOR_SWI_RPT_MAX is not the calendar year of the inspection. If is the fiscal year of the inspection_date (by adding 6 months). So an inspection date of 6/30/2017 is RY 2017, but 7/1/2017 is RY2018. If that is not correct let me know and I'll and I'll make it calendar year.

leeensminger commented 6 years ago

@gerrykelly You're correct about the fiscal year. That may have helped get a few records. But even still, if an sw_tracking record has an inspection in FY14 and FY15, if I run ETL for reporting year 2018, it should grab the most recent inspection for that reporting year or earlier... in this example, it'd be FY15 inspection that gets migrated to target then.

gerrykelly commented 6 years ago

@leeensminger I am no long using ETL_POND_INSP_FOR_SWI_RPT_MAX. I use a dynamic query that I hope works as you explained. If you still find anything missing, please let me know what reporting year and what records - tracing back to v_pond_inspections -- are missing.

leeensminger commented 6 years ago

@gerrykelly still looks like we're missing some.

Another issue I encountered is that V_POND_INSPECTIONS gives me total record counts that change each time I do a sort, on any field. So I exported to an independent table and worked with that, with a total of 9393 records in that exported table. So I hope thats correct.

Then, I selected all inspections where INSPECTION_DATE is not null and is in FY2013 or earlier, getting 5869 records. I found related SW_TRACKING records and found 21 related records in selection set 1A (total records in selection set 1A in sw_tracking = 50). So 21 inspections are expected in the output ETL.

In the ETL output for selection set 1A, I get 2 BMPInspections. These 2 are expected to be in the output, but we're missing the other 19 records. Below are the inspections that are missing, that I'd expect to see in the ETL output:

pond_num inspection_date
10 11/28/2012
21 5/12/2009
39 4/24/2009
73 11/8/2005
105 2/7/2012
109 5/9/2014
150 5/29/2013
227 5/3/2010
1278 5/26/2006
1325 11/19/2013
1372 7/25/2006
1440 4/8/2009
1717 8/13/2008
1809 2/19/2008
2091 1/10/2011
2855 12/28/2010
3270 8/25/2011
4910 10/3/2013
4934 3/20/2013
gerrykelly commented 6 years ago

@leeensminger 9393 is the correct count. It's weird that when I run for FY 2023, I get inspections related to 21 set 1A bmps, and most of them are 2015 or earlier.
I could not find what was wrong with my existing query, so I rewrote it in a whole new way and believe it now works correctly, but for RY2013 I get 20 inspections -- if you can identify the missing one and show why it should be included I can try to figure that out later. When you run this in the GP you are going to have an append failure bc there is one record in the bmp_inspection candidates that has a null bmp_status (from the pond INSP_STATUS). In this case it would have been deleted from the final bmp_inspections table anyway, but it needs to be dealt with. If someone can actually trace the record in the view and edit it to gave a value, that's the "proper" fix, but maybe it's OK to just handle it the easy way and have my query exclude null INSP_STATUS

leeensminger commented 6 years ago

Yes having your query exclude null insp_status is great.

From: Gerry Kelly Sent: Saturday, March 3, 5:15 PM Subject: Re: [kcigeospatial/balt_co_ETL] Stormwater - BMP Inspections - how are records extracted from the DEPS view? (#44) To: kcigeospatial/balt_co_ETL Cc: Lee Ensminger, Assign

Assigned #44https://github.com/kcigeospatial/balt_co_ETL/issues/44 to @https://github.com/leeensmingerleeensmingerhttps://github.com/leeensminger.

You are receiving this because you were assigned. Reply to this email directly, view it on GitHubhttps://github.com/kcigeospatial/balt_co_ETL/issues/44#event-1502696984, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AD6ep8LUf0iE_54nffpa36nEB2sQfHOIks5taxYogaJpZM4Q8mCP.

leeensminger commented 6 years ago

I take that back... However you are currently handling it is great, no need to add the blanket "drop all nulls" part. If null insp status is allowed through to etl output and generates append failure table, that is fine and expected, and they can then edit the source as needed.

On Sat, Mar 3, 2018 at 5:14 PM -0500, "Gerry Kelly" notifications@github.com<mailto:notifications@github.com> wrote:

@leeensmingerhttps://github.com/leeensminger 9393 is the correct count. It's weird that when I run for FY 2023, I get inspections related to 21 set 1A bmps, and most of them are 2015 or earlier. I could not find what was wrong with my existing query, so I rewrote it in a whole new way and believe it now works correctly, but for RY2013 I get 20 inspections -- if you can identify the missing one and show why it should be included I can try to figure that out later. When you run this in the GP you are going to have an append failure bc there is one record in the bmp_inspection candidates that has a null bmp_status (from the pond INSP_STATUS). In this case it would have been deleted from the final bmp_inspections table anyway, but it needs to be dealt with. If someone can actually trace the record in the view and edit it to gave a value, that's the "proper" fix, but maybe it's OK to just handle it the easy way and have my query exclude null INSP_STATUS

- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/kcigeospatial/balt_co_ETL/issues/44#issuecomment-370184218, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AD6ep5c9pnLOffR2JztM1O8AzVlDZ0ycks5taxYPgaJpZM4Q8mCP.

leeensminger commented 6 years ago

Overall its looking much better. The missing record is for stru_no 2855. I can't tell why it's missing - it has a good date and inspection result. The expected output inspection would be the 2010 inspection, since this output is for RY 2013. Here are the records in v_pond_inspections for this record:

image

gerrykelly commented 6 years ago

@leeensminger That record comes in through my query and into the ETL potential bmp inspection records, but it relates to SW_TRACK_ID = '{9D7A38EC-6095-47BE-B8BC-01577F25603D}' which has an RY of 2099 (as does the related BMP) image

leeensminger commented 6 years ago

Good catch. Its bad data - they have 5 records for 2855, all of which have sw_tracking_status = Current. Only one of which has RY 2013. Marking this as closed.

image