palewire / django-calaccess-campaign-browser

A Django app to refine, review and republish campaign finance data drawn from the California Secretary of State’s CAL-ACCESS database
http://django-calaccess-campaign-browser.californiacivicdata.org
MIT License
17 stars 12 forks source link

Develop a tool to flag committees or treasurers with an unusual number of amendments #215

Open palewire opened 9 years ago

rkiddy commented 9 years ago

What I did in my version of this is add a amend_id column to the filer_filings table. From the SoS, that table has a row for each filing but does not contain the amend_id value. So I added the maximum value of the amend_id for that filing found in the other tables.

I could put together a pull-request to do this fairly easily.

It may be worth creating a filing_amendments table. I am not sure. Do we want to check on whether amendment numbers were skipped, or what kind of data is added between amendments, or perhaps more significantly, what data is removed for each amendment? If so, then it would probably be good to have a filing_amendments table just to keep a row for each found amendment.

rkiddy commented 9 years ago

Just put in the pull-request for this. FYI, after running this, here are the counts of the different high amend_id values.

At first, I was getting duplicates in the filing_id because I forgot that the amend_id roughtly equals the filing_sequence, which is in the filer_filings table. I have not checked in the code for this, but here are the counts.

mysql> select ff_amend_id, count(*) from calaccess_campaign_browser_filingamendment group by ff_amend_id order by ff_amend_id;
+-------------+----------+
| ff_amend_id | count(*) |
+-------------+----------+
|           0 |  1567961 |
|           1 |    78640 |
|           2 |    11760 |
|           3 |     2640 |
|           4 |      858 |
|           5 |      403 |
|           6 |      195 |
|           7 |      138 |
|           8 |      103 |
|           9 |       69 |
|          10 |       68 |
|          11 |       37 |
|          12 |       35 |
|          13 |       26 |
|          14 |       10 |
|          15 |       19 |
|          16 |        7 |
|          17 |       11 |
|          18 |        6 |
|          19 |        5 |
|          20 |       11 |
|          21 |        3 |
|          22 |        2 |
|          23 |        8 |
|          24 |        6 |
|          25 |        3 |
|          26 |        4 |
|          27 |        7 |
|          28 |        1 |
|          29 |        3 |
|          30 |        2 |
|          32 |        4 |
|          33 |        4 |
|          34 |        1 |
|          35 |        1 |
|          36 |        3 |
|          37 |        3 |
|          38 |        5 |
|          42 |        2 |
|          43 |        4 |
|          44 |        2 |
|          45 |        3 |
|          46 |        1 |
|          47 |        2 |
|          49 |        3 |
|          50 |        1 |
|          51 |        1 |
|          52 |        1 |
|          53 |        2 |
|          55 |        1 |
|          56 |        1 |
|          60 |        1 |
|          63 |        1 |
|          64 |        2 |
|          65 |        1 |
|          66 |        1 |
|          75 |        1 |
|          78 |        1 |
|          83 |        2 |
|          85 |        1 |
|         100 |        1 |
+-------------+----------+

mysql> select count(*) from calaccess_campaign_browser_filingamendment where ff_amend_id != other_amend_id;
+----------+
| count(*) |
+----------+
|     1828 |
+----------+