brianmwaters commented 4 years ago

In the testing instance for Vermont, we have one department that never gave us demographics data. So age, race, and gender are null for all the cops in this department. If a user tries to make a search that is keyed on one of these fields, they will get zero results. (For example, if they try to search for all female cops in one of these departments). This is a problem because it could cause users to not find officers who are indeed in OpenOversight.

To fix this, the search functionality should be changed so that, for example, when a user searches for all female cops, the application will return all female cops, plus all cops with a null gender.

redshiftzero commented 4 years ago

(I think this was the behavior previously, so we should write a regression test when this is fixed)

juholgado commented 4 years ago

Hi all, looking at the form for adding officers https://github.com/lucyparsons/OpenOversight/blob/bd2e4d522438813276de4bb26065837b8fe33045/OpenOversight/app/main/forms.py#L182 it looks like the default value for Strings such as first_name, middle_initial, suffix, etc. is the empty string. Is that how they are saved when the officer object is inserted into the database if left blank?

When I tried performing the bulk-add-officers function via flask with test data using providing the minimum attributes, I saw that the fields that were not provided were saved as None. From this, it seems that attributes for which we don't have data can be inserted into the database as None or the empty string. Is this intended? If so, should both of these cases be accounted for when implementing a fix for this bug? Thanks!

EMCain commented 4 years ago

(I'm collaborating on this issue with jholgado)

I am looking very specifically at the gender filter code (as a starting point) and I'm confused by the functionality that already exists.

https://github.com/lucyparsons/OpenOversight/blob/develop/OpenOversight/app/utils.py#L277

        if 'Not Sure' in form['gender']:
            form['gender'].append(None)
        officer_query = officer_query.filter(Officer.gender.in_(form['gender']))

Shouldn't marking the gender as "not sure" mean that all genders are shown? If someone checks both "female" and "not sure" on the gender checkboxes maybe it makes sense to show all officers regardless of gender but put the female officers first, or something. But I'm having trouble reading this as anything other than "if user says they're not sure about the gender of the officer, include officers without a gender listed in the results" --when in reality, whether or not the user was able to identify the officer's gender is a separate issue from whether or not our records contain that information. (Extrapolate this to race, age, rank, etc.)

It's possible this is worth filing a new issue over but I want to make sure I actually understand the current behavior correctly first.

redshiftzero commented 4 years ago

Nice finds y'all, thanks for working on this!

attributes for which we don't have data can be inserted into the database as None or the empty string

If the form is inserting empty strings in the database instead of None/null when we don't have a field, that's a bug. I think doing the following would make sense:

Ensure forms insert None when the field is absent.
Migrate any existing values in the officers table that contain '' for these columns (first_name, and others that are nullable) to be null.

Shouldn't marking the gender as "not sure" mean that all genders are shown? If someone checks both "female" and "not sure" on the gender checkboxes maybe it makes sense to show all officers regardless of gender but put the female officers first, or something.

Yep... this also is a problem. I agree the behavior that we want is:

If user enters "Not sure", we don't filter on that attribute.
If user enters X for an attribute Y, we should return all officers with attribute Y=X as well as attribute Y=None.

So we should definitely modify how the "Not sure" behavior is handled in this filter function, and we'll want to update the tests in test_utils.py to cover this bug.

EMCain commented 4 years ago

ok, great summary of the desired behavior, thanks.

I might remove the checkbox option for "not sure" on the filter page -- either people know the officer's gender or they don't, and "they don't" is really what "not sure" seems to indicate, so it doesn't make sense to have it as a multi-select filter option.

EMCain commented 4 years ago

also @juholgado I can help you with the data migrations for getting rid of the empty-string values from the database, talk to me when you're ready to work on that.

EMCain commented 4 years ago

@brianmwaters @redshiftzero do you know where officer listings/demographic data about cops in a department are being uploaded?

My impression is that there are two places:

1) an individual input form where volunteers can add one officer at a time 2) a bulk import feature where they upload a spreadsheet of officers

Are there any others? Also, am I correct to assume that any missing data in 2) should be handled with a reasonable guess or default, rather than throwing it back to the user to correct before uploading?

EMCain commented 4 years ago

Since this issue has several related components, we have decided to work on a partial fix PR that addresses it for gender specifically. Then, future contributors can use our work as an example for how to fix it for other demographic categories.

I’m going to go into detail about what I mean by this, but can we get a sign-off from @redshiftzero or another admin to approach the issue in this way before we spend substantial time creating a PR for it, and if so, can @juholgado and I be added as assignees to this issue? The work will be done on the fork https://github.com/juholgado/OpenOversight. Please let us know if you have any questions prior to signing off.

Overall problems being solved:

While the filtering functionality in the cop lookup features seems to depend on the police officers’ gender being set to “M”, “F”, or “Other” in the database to be useful, there seems to be no enforcement of this at any level, and in practice this value can be set to any string, meaning (for example) an officer whose gender is set to “Man” will not be returned if someone wants to filter for officers with the gender “M” or “Male”
The concept of “Not Sure” as a possible gender value conflates two unrelated problems:
- The admin or volunteer who input the officers’ demographic data did not have data on that officer’s gender
- The user who is trying to file a police complaint was unsure what the gender of the officer they are reporting is.
Officers with missing data in the gender column are not being returned in a search that filters on gender. This is the original problem described by this issue, but in my opinion it makes most sense to address all of them at once to avoid technical debt and unexpected side effects.

The specific steps needed to fix this are:

Database changes (in models and migrations)
- Add database constraint that only allows the following four values as gender for an officer: M, F, Other, NULL (i.e. ensure field is nullable and there is a check constraint)
- Set default value of column to NULL if it’s not already
- If needed, add an UPDATE query to the up migration that sets unacceptable values to Null. (This might be covered by the previous two steps).
Change the various ways officer data can be input or updated to only set to one of those values
- Drop down forms for an individual officer being added or updated should interpret “not sure” or “information missing” as NULL
- Bulk import functionality (from a spreadsheet) should generously interpret terms like “man”, “male”, “m”, as “M” (and so on for female and other/nonbinary genders), and interpret everything else as NULL
Change code in app.utils.filter_by_form
- It should not add a gender filter if “not sure” is set
- It should return officers with NULL gender regardless of what kind of gender filter is set, so e.g. if the user is looking for a male officer it should return officers with “M” or NULL as their gender
remove the "not sure" option from the "browse officers" page (users should leave this field blank if they don't know the officer's gender)
Update test data included with dev app
- Most of the cops with their gender set to “Not Sure” should probably be set to NULL.
- A few of them could be set to “Other.”
Update test suite to test both the import and filter behaviors
[ADDED 27 JULY 2020] make sure any user-facing interface displays a NULL gender as "UNKNOWN" (or some similar wording) rather than "None" -- we want to convey "we don't know what this officer's gender is," not "this officer has no gender."

abandoned-prototype commented 4 years ago

I think this plan sounds great. Thank you both for the work you already did in this to concretely identify the work that needs to be done. Definitely agree with replacing "Not Sure" with NULL, this will make things so much cleaner. Love the idea of putting in database constraints to enforce the four options as well.

If at all possible I would like us to do a similar thing as the one proposed for the bulk import to try to convert all existing values to one of the 4 allowed ones (instead of just setting them all to NULL).

For the previous question:

@brianmwaters @redshiftzero do you know where officer listings/demographic data about cops in a department are being uploaded?

My impression is that there are two places:
1. an individual input form where volunteers can add one officer at a time

2. a bulk import feature where they upload a spreadsheet of officers

Yes, that is correct, those are the two options to create new officers in the database.

Also, am I correct to assume that any missing data in 2) should be handled with a reasonable guess or default, rather than throwing it back to the user to correct before uploading?

That is a good question and I have been thinking about this for a little bit as well. I think the way you outlined it in your plan sounds fine, if it can't be interpreted at all we probably want to at least print a warning.

I will assign the two of you to the ticket and make sure that either @redshiftzero or another admin gives their approval.

EMCain commented 4 years ago

If at all possible I would like us to do a similar thing as the one proposed for the bulk import to try to convert all existing values to one of the 4 allowed ones (instead of just setting them all to NULL).

Ah, good catch, I was just thinking in terms of test data but of course it’ll have to work on existing datasets too. I’m planning on having a function normalize the gender values and it can be used in the migration as well.

brianmwaters commented 4 years ago

Just catching up on this thread. Looks like y'all are doing some good work.

This might belong in a separate issue, but I thought I'd drop it here so y'all who are working on this can be aware as it may be relevant:

In OO, if you go to "find an officer" (http://localhost:3000/find) and leave all the fields blank and/or "not sure", the search will return all of the officers in the specified department. However, on the officer listing page (the search results, or https://github.com/lucyparsons/OpenOversight/blob/develop/OpenOversight/app/templates/list_officer.html, or for example http://localhost:3000/department/1?min_age=16&max_age=85&name=&badge=&unique_internal_identifier=), there is a search menu on the left-hand side called "filter officers": 2020-07-07-211145_308x821_scrot

If the user selects "not sure" for any of the fields and clicks "submit", then the form will return only those officers for whom the actual field was "not sure."

I think this ties in w/ some of the comments above about "not sure" being redundant, and wanted to just point it out for those who are working on this issue, in case you are not aware already.

EMCain commented 4 years ago

@brianmwaters yep, we are working on addressing the problem you identified.

A question for you: @juholgado is working on adding tests, in particular, we want to create another test very similar to this one in test_utils.py:

    department = OpenOversight.app.models.Department.query.first()
    results = OpenOversight.app.utils.grab_officers(
        {'gender': ['M'], 'dept': department}
    )
    for element in results.all():
        assert element.gender in ('M', 'Not Sure')

Where does mockdata come from? I don't see it explicitly assigned anywhere. It's not the same as the sample data provided with the app, is it?

brianmwaters commented 4 years ago

I'm confused because I don't see mockdata in the code snippet you included!

EMCain commented 4 years ago

oh sorry! It is the argument to the function, I left that line out by mistake.

def test_gender_filter_select_all_male_officers(mockdata):
    department = OpenOversight.app.models.Department.query.first()
    results = OpenOversight.app.utils.grab_officers(
        {'gender': ['M'], 'dept': department}
    )
    for element in results.all():
        assert element.gender in ('M', 'Not Sure')

juholgado commented 4 years ago

We worked on some changes for filter_by_form today, focusing on how it handles the gender field. These changes alter the behavior such that

If "Not Sure" is specified in the filter form, no filters are added for that field. For example, if Not Sure and Male are checked on the filter form, we do not filter on gender.
Specifying a gender also returns officers whose gender is "Not Sure". For example, checking Male returns all Male officers and Not Sure officers

brianmwaters commented 4 years ago

@juholgado that sounds great. What do you think about also removing the "Not Sure" check box from the UI?

EMCain commented 4 years ago

@redshiftzero and @brianmwaters I am working on normalizing the various string values representing gender to 'M', 'F', 'Other', or NULL.

Do we know what values are currently being used for genders in existing instances' databases, as per @abandoned-prototype 's request to normalize the existing data?

If at all possible I would like us to do a similar thing as the one proposed for the bulk import to try to convert all existing values to one of the 4 allowed ones (instead of just setting them all to NULL).

So for example I'm expecting "M", "Male", and "Man" (and their uppercase/lowercase variants) to all be entries that exist to represent a male officer, so I'd be doing an update query like this (forgive me if the syntax isn't perfect) prior to adding a check constraint:

UPDATE TABLE officer
SET gender = 'M' 
WHERE gender::citext in ( "M", "Male", "Man" );

(similar to the solution explained here for the WHERE clause, to make it case-insensitive https://til.hashrocket.com/posts/o0ohxzb7ho-case-insensitive-in-query-in-postgres-psql)

and similar for "F" ("Female", "Woman") and "Other" ("Other", "Nonbinary")

Am I missing any gender labels that would currently exist and need to be normalized?

Note that "Not Sure" will be converted to NULL as will any unrecognized gender values.

ETA: for my own reference, the database changes are happening on this branch https://github.com/EMCain/OpenOversight/tree/711-db-changes

brianmwaters commented 4 years ago

I've never actually looked at the prod data, so I don't know.

If Jen can get me SQL access (will probably end up being helpful for VT data import as well), then I can run a query for you that would show all of the entries under the gender field. But, based on my reading of the OO source code, it should be normalized to a few canonical values. But there could be outliers.

brianmwaters commented 4 years ago

Also, @EMCain, re: mockdata, I'm honestly not super familiar w/ how tests work in OO, but here is where you want to look: https://github.com/lucyparsons/OpenOversight/blob/develop/OpenOversight/tests/conftest.py#L431

abandoned-prototype commented 4 years ago

@EMCain @brianmwaters In regards to the mockdata on the pytest fixture stuff I recommend taking a look at this documentation https://docs.pytest.org/en/stable/fixture.html So basically whenever you use the name of a function that was tagged as pytest.fixture as a parameter of your test it automatically runs that function and assigns the return value as that parameter. These fixtures might depend on others and there are also a few of built in fixtures (personally I am not a fan of this kind of magic, definitely struggled with figuring out what was going on here myself)

I also don't have SQL access, but I can take a quick look at the data via the csv downloads. Not as efficient but should give us the results we need

abandoned-prototype commented 4 years ago

Done with this. I am pretty sure I got the actual values from the database and it turns out that they are all either "M", "F", "Not Sure" or NULL. So it seems no additionally work needs to be done here, sorry for adding unnecessary complexity and thanks for asking questions on this!

For the record, if someone looks at this later working on the "race" field, that one is definitely a bigger mess with 12 different options present.

brianmwaters commented 4 years ago

Shouldn't we get rid of the "Not Sure" entries and replace them with NULL? I know a bunch of y'all on this issue have been thinking about this in detail so I leave it to you.

EMCain commented 4 years ago

On Sun, Jul 12, 2020 at 22:31 int10h notifications@github.com wrote:

Shouldn't we get rid of the "Not Sure" entries and replace them with NULL? I know a bunch of y'all on this issue have been thinking about this in detail so I leave it to you.

Yes, we are replacing anything other than “M”, “F” and “Other” with NULL.

EMCain commented 4 years ago

progress notes from today:

We worked on this branch on @juholgado 's fork https://github.com/juholgado/OpenOversight/tree/711-db-changes

(eventually it will be merged into the branch 711-filter-by-form-gender, which we will then use to open a PR to the main project)

We changed the model to constrain the gender options to ('M', 'F', 'Other', NULL)

We created a migration script that adds this constraint (as well as shortening the max length of the field to 5, as 'Other' is the longest value it can hold) as well as normalizing any data with UPDATE queries. We used this mapping (case-insensitive):

    genders = {
        "M": ('male', 'm', 'man'),
        "F": ('female', 'f', 'woman'),
        "Other": ('nonbinary', 'other'),
    }

Please let us know if there are any options we are missing here.

If anyone is more experienced with SQLAlchemy and Alembic we will probably ask for help converting some of the raw SQL we created into those libraries.

Some SQL commands to generate junk data before running the migration, to test it:

update officers set gender = 'man' where first_name = 'IVANA' and gender = 'M'; 
update officers set gender = 'MAN' where first_name = 'SEYMOUR' and gender = 'M'; 
update officers set gender = 'woman' where first_name = 'IVANA' and gender = 'F'; 
update officers set gender = 'WoMaN' where first_name = 'SEYMOUR' and gender = 'F'; 
update officers set gender = '' where first_name = 'HAYWOOD';

And a command to summarize the values in the table, to see if it worked:

select gender, count(*), gender IS NULL as isnull from officers group by gender;

EMCain commented 4 years ago

Now that gender records of "not sure" have been replaced with None, this is what displays on the officer listing:

I'm thinking this could confuse people (note that the case I'm describing is "we (the project maintainers) don't know the officer's gender," not "the officer identifies as nonbinary (etc.)") What do you think of replacing the word "None" in this context with "UNKNOWN" or something similar? @brianmwaters do you have a preference for what wording to use to convey to users that the officer's gender is not known in your records?

Note that this is purely a frontend interface question -- we are going to continue representing this concept on the backend as NULL/None.

EMCain commented 4 years ago

(Deleted a comment about how to run tests because I figured it out)

juholgado commented 4 years ago

Today, with the proposed DB changes in mind(changing 'Not Sure' entries to NULL):

Changed the checkbox in the filter form for gender to be a radio button. This is so that only one option (Male, Female, Other, Not Sure) can be selected at a time. Male, Female, and Other will return officers whose gender is set to that along with all officers whose gender is NULL. Not Sure will just not add any filtering based on gender.
In an earlier note, I changed filter_by_form to return officers whose gender is set to Not Sure in addition to the requested gender. Now, filter_by_form will return officers whose gender is set to NULL in addition to officers whose gender is set to the requested one. For example, requesting Male officers will return all Male officers along with officers whose gender is set to NULL

EMCain commented 4 years ago

I am having trouble figuring out why my code change has caused an IntegrityError in a seemingly unrelated test. (I am also having trouble with other tests but right now I'm focusing on this one file.) Can someone please help?

to summarize -- it's causing the test OpenOversight.tests.routes.test_officer_and_department.test_admin_can_upload_photos_of_dept_officers to fail. I'm not sure why since I did not touch anything related to photo uploading. I know @juholgado is also having test failures related to photo upload.

my question is, what is the connection between officer gender (probably in forms) and photo uploading endpoints that could be causing this? Maybe if I can figure out why this one test is failing, @juholgado and I can also fix the other ones.

Can someone please help us figuring out what is going on here?

much more detailed info:

Here's the point in my code where I'm currently experiencing this problem. https://github.com/lucyparsons/OpenOversight/compare/develop...juholgado:711-db-changes

commit: https://github.com/lucyparsons/OpenOversight/commit/627a7a6876edc2fd4ef7fb3e1b54c3ed240e4a31


self = <sqlalchemy.dialects.sqlite.pysqlite.SQLiteDialect_pysqlite object at 0x7f255f15d700>, cursor = <sqlite3.Cursor object at 0x7f255d7c2490>
statement = 'INSERT INTO faces (officer_id, img_id, original_image_id, face_position_x, face_position_y, face_width, face_height, user_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?)'
parameters = (6, 1, 1, None, None, None, ...), context = <sqlalchemy.dialects.sqlite.base.SQLiteExecutionContext object at 0x7f255d971100>

    def do_execute(self, cursor, statement, parameters, context=None):
>       cursor.execute(statement, parameters)
E       sqlite3.IntegrityError: UNIQUE constraint failed: faces.officer_id, faces.img_id

/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py:552: IntegrityError

The above exception was the direct cause of the following exception:

mockdata = '3842', client = <FlaskClient <Flask 'OpenOversight.app'>>, session = <sqlalchemy.orm.scoping.scoped_session object at 0x7f255e47bd00>
test_jpg_BytesIO = <_io.BytesIO object at 0x7f255d805040>

    def test_admin_can_upload_photos_of_dept_officers(mockdata, client, session, test_jpg_BytesIO):
        with current_app.test_request_context():
            login_admin(client)

            data = dict(file=(test_jpg_BytesIO, '204Cat.png'),)

            department = Department.query.filter_by(id=AC_DEPT).first()
            officer = department.officers[3]
            officer_face_count = officer.face.count()

            crop_mock = MagicMock(return_value=Image.query.first())
            upload_mock = MagicMock(return_value=Image.query.first())
            with patch('OpenOversight.app.main.views.upload_image_to_s3_and_store_in_db', upload_mock):
                with patch('OpenOversight.app.main.views.crop_image', crop_mock):
                    import pdb; pdb.set_trace()
>                   rv = client.post(
                        url_for('main.upload', department_id=department.id, officer_id=officer.id),
                        content_type='multipart/form-data',
                        data=data
                    )

also, I tried uploading a picture in my browser and got this weird behavior, a bunch of text being dumped in the form field

Is this due to not having AWS credentials set up? I don't see how that would be causing the test failures but I figure it's worth mentioning.

EMCain commented 4 years ago

tests that are failing for me:

{
  "tests/routes/test_image_tagging.py::test_user_can_add_tag": true,
  "tests/routes/test_officer_and_department.py::test_ac_can_upload_photos_of_dept_officers": true,
  "tests/routes/test_officer_and_department.py::test_admin_can_upload_photos_of_dept_officers": true,
  "tests/test_utils.py::test_csv_import_idempotence": true,
  "tests/test_utils.py::test_csv_import_new": true,
  "tests/test_utils.py::test_csv_import_update": true,
  "tests/test_utils.py::test_csv_new_assignment": true,
  "tests/test_utils.py::test_csv_new_name": true,
  "tests/test_utils.py::test_csv_new_officer": true,
  "tests/test_utils.py::test_csv_new_salary": true
}

tests that are failing for @juholgado on her branch:


tests/test_commands.py::test_csv_import_update 
tests/test_commands.py::test_csv_new_name
tests/test_commands.py::test_csv_new_officer
tests/routes/test_image_tagging.py::test_user_can_add_tag
tests/routes/test_officer_and_department.py::test_admin_can_upload_photos_of_dept_officers
tests/routes/test_officer_and_department.py::test_ac_can_upload_photos_of_dept_officers
tests/routes/test_officer_and_department.py::test_edit_officers_with_blank_uids```

abandoned-prototype commented 4 years ago

I'm thinking this could confuse people (note that the case I'm describing is "we (the project maintainers) don't know the officer's gender," not "the officer identifies as nonbinary (etc.)") What do you think of replacing the word "None" in this context with "UNKNOWN" or something similar? @brianmwaters do you have a preference for what wording to use to convey to users that the officer's gender is not known in your records?

Definitely agree that it should not display "None". I think displaying "Gender: Unknown" is fine, alternatively we can chance the template, to only show the entry if it's not None

tests that are failing for me:

In regards to the failing tests, I plan to take a look tomorrow. Definitely strange that it's different tests for the both of you

EMCain commented 4 years ago

To be clear, we are working on different branches.

I am working on https://github.com/juholgado/OpenOversight/tree/711-db-changes

Julia is working on https://github.com/juholgado/OpenOversight/tree/711-filter-by-form-gender

So it makes sense that different tests are failing. What’s odd is that it’s hard to figure out the relationship between what has been changed and what tests are failing.

On Sun, Jul 26, 2020 at 22:45 abandoned-prototype notifications@github.com wrote:

I'm thinking this could confuse people (note that the case I'm describing is "we (the project maintainers) don't know the officer's gender," not "the officer identifies as nonbinary (etc.)") What do you think of replacing the word "None" in this context with "UNKNOWN" or something similar? @brianmwaters https://github.com/brianmwaters do you have a preference for what wording to use to convey to users that the officer's gender is not known in your records?

Definitely agree that it should not display "None". I think displaying "Gender: Unknown" is fine, alternatively we can chance the template, to only show the entry if it's not None

tests that are failing for me:

In regards to the failing tests, I plan to take a look tomorrow. Definitely strange that it's different tests for the both of you

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lucyparsons/OpenOversight/issues/711#issuecomment-664132424, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACXQSXPPZKBLBSZJISIKFZLR5UH7DANCNFSM4NTRT6QQ .

abandoned-prototype commented 4 years ago

I was not able to reproduce @juholgado's failing tests, only two tests failed for me one of them was test_gender_filter_select_all_male_officers which seems to be work in progress. The other is I think an issue with my test setup (and also not included in the provided list)

I was however reproduce your failing tests @EMCain. One issue I identified is in the bulk-add-officer command, to upload a csv file. The problem is that if the officer has gender = None it will be parsed as empty string in the csv and when loading the csv the command loads the empty string as the gender field and then tries to stuff that into the database which leads to an error thanks to the constraint you implemented. There are two steps needed to fix this problem, the diff is here:

--- a/OpenOversight/app/commands.py
+++ b/OpenOversight/app/commands.py
@@ -150,7 +150,7 @@ def set_field_from_row(row, obj, attribute, allow_blank=True, fieldname=None):
     if fieldname in row and (row[fieldname] or allow_blank):
         try:
             val = datetime.strptime(row[fieldname], '%Y-%m-%d').date()
-        except ValueError:
+        except (ValueError, TypeError):
             val = row[fieldname]
         setattr(obj, attribute, val)

@@ -380,7 +380,7 @@ def bulk_add_officers(filename):
             else:
                 raise Exception('Officer {} {} missing badge number and unique identifier'.format(row['first_name'],
                                                                                                   row['last_name']))
-
+            row["gender"] = row["gender"] or None
             if officer:
                 update_officer_from_row(row, officer)
             else:

After that change 3 failures are remaining. Those are even more annoying because it literally has nothing to do with your change. There are some assignments made in the test-data and they are done completely randomly. But there are tests that only work if certain objects have not been created, but it is completely by coincidence if those exist or not. I hope this makes some sense, but bottom line is that you can feel free to ignore any test that contains csv or face or tag for now and I will either fix those tests or create an issue with instructions for someone else to fix them

EMCain commented 4 years ago

We spent a lot of time today trying to debug the tests on @juholgado's brancy. We didn't really make much progress in terms of fixing them, though we did learn some stuff about further functional changes needed for our next work session. Our plan is to keep working on this and ignore the failing tests for now, and when we make a PR to develop we can make note of what is failing and get help then. I don't really want to worry about it before then because there are other functional changes needed and I don't want to get bogged down in debugging tests

At least one of the tests is failing because the sample officers are not being created in the test setup in conftest.py, and we cannot figure out why; none of the things we changed in that seem to be causing the problem.

    def test_csv_import_update(csvfile):
        n_existing = Officer.query.count()
        assert n_existing > 0
        n_created, n_updated = bulk_add_officers([csvfile], standalone_mode=False)
        assert n_created == 0
>       assert n_updated == 0
E       assert 13 == 0
E         -13
E         +0
tests/test_commands.py:190: AssertionError

Further exploration here revealed that there weren't any officers in the DB.

Honestly, this test suite is pretty difficult for us to work with for a number of reasons, including the fact that it all seems to use a vast amount of randomly-generated data rather than creating small, deterministic datasets for each test. That's outside the scope of this issue though, which is why we are waiting to get the tests to work until we submit the PR.

If there's still problems by that time, is it possible someone better versed in this project could get on a call with one or both of us to try and figure out what's going on? Hopefully it won't come to that and the problem will be fixed by then, and I'm guessing it'll be another 2 or 3 weeks minimum before we are PR-ready.

abandoned-prototype commented 4 years ago

Thanks for the update! Totally agree with your assessment of the situation. There are a lot of not-so-great things happening in the tests and eventually we definitely need to clean up our tests. But this is certainly out of scope of your work here. So, your approach of ignoring the failing tests, especially those that you seemingly didn't touch makes a lot of sense to me, so you can both focus on implementing the changes you proposed here.

I am happy to help fix up the tests once your work is ready and I am also available to get on a call about that, just let me know

juholgado commented 4 years ago

Today, we updated the bulk_add_officers() function to normalize gender when updating an officer via a CSV. We check for some string such as Male/Female, M/F, Man/Woman and insert the new gender value into the database as M, F, Other, or None if the new value is not recognized.

One question that arose while implementing this change: How do site admins add officers via a CSV file?

We created a CSV file using Excel, but ran into errors because of the BOM, U+FEFF, that got added to the beginning of the file when saving the workbook as a CSV. Is there a format that is expected to be received when adding officers via a CSV file? How are the CSVs used in this functionality usually generated?

abandoned-prototype commented 4 years ago

Sorry @juholgado I didn't see this update earlier. For csv, there are some that @brianmwaters has on his repo that were used to add data to our production database. I recommend this one: https://github.com/brianmwaters/vt-police-tools/blob/master/vt_police_tools/data/ccso/2020-01/cleaned/roster.csv (Import with the flag --update-by-name)

For running the command, there is some info here: https://github.com/lucyparsons/OpenOversight/blob/develop/docs/bulk_upload.rst, let me know if you need more instructions in how to get this running.

In generating the csv, python can generate csv with the built-in csv package. I use the pandas package in python which is a data science package similar to R. Unfortunately, I am not familiar with Excel and how to prevent it from adding this binary character at the beginning. Did you try choosing UTF-8 for the encoding? If that doesn't make a difference you could try to use the python console and do something like open("<path/to/your/file>", "rb").read().decode("utf-8-sig") and save the resulting string as text. Or open as text and cut off the first character via text[1:]

EMCain commented 4 years ago

we've been busy and took a break from working on this but will start back up this weekend.

EMCain commented 4 years ago

I created a draft PR for this. It's not as clean or complete as I'd like but I think realistically I've been putting up wrapping this up and we are mostly done, so could we get some help figuring out what needs to be done for it to be ready? @brianmwaters @abandoned-prototype @redshiftzero Thank you!

EMCain commented 3 years ago

I fixed some final problems with the PR and I’m waiting for it to be approved and merged.

Some thoughts on the issues I want to create after this, I’m mostly writing this for my own reference since I intend to open them but I’d appreciate other people’s thoughts too.

Make unified UX decisions + functions

How do we want to represent the concept of “data missing” (ie how to translate NULL to a string in the interface)—should it vary based on the category? Should it be different when inputting data vs. viewing the existing records? Would also include things like writing model class methods and Jinja filters to display this in a uniform way

better organize choices constants

Right now there’s a bunch of constants under choices.py and I’m not very happy with how I convert the values to be compatible with different types of forms in the gender code. I want to come up with a more unified approach so it’s clear where all the form choices are coming from.

Normalize race data in DB and forms

This one is probably the closest to the gender work in terms of code changes. Extra considerations include:

when normalizing data (from existing instances’ data and from database input) we want to make sure we reflect the terms used in the existing data, and commonly found in FOIA responses, so that eg White, Caucasian, European all get normalized to White (etc for other racial groups)
the lookups when filtering for officers will probably need to be a multi-select that includes all the races selected in the results. Selecting “not sure” along with one or more racial categories is meaningless (assuming we filter similarly to how we do for gender) so we need to figure out the best UI to represent this.
possibly have some UI so that users can also filter by race using a variety of terms for the same racial group

handling age

I don’t think this requires the same kind of DB normalization since it’s just a date of birth but we want to make the UX and filtering capabilities consistent with everything else.

handling rank

I don’t know exactly how this will work since it’s a separate table from Officer but I think other than that it’ll be pretty similar to race and gender in terms of implementation.

FOIA guidelines

Finally, we can try to avoid the missing data problems if people include all the categories of info needed in their FOIA requests — I know the data we get back is often incomplete but it’s probably good practice to ask for all the demographic info we want when making requests. I have no idea how the FOIA process works so I’ll want help on this.

lucyparsons / OpenOversight

Search doesn't include results with null fields #711

Make unified UX decisions + functions

better organize choices constants

Normalize race data in DB and forms

handling age

handling rank

FOIA guidelines