theonaunheim / surgeo

Open Source Proxy Demographic module written in Python
MIT License
32 stars 16 forks source link

Implement BIFSG and FirstName models; increment version to 1.1.0 #13

Closed TheCleric closed 3 years ago

TheCleric commented 3 years ago

Will implement #12

In this PR the following have been done:

Please let me know of any questions or comments you may have. I will likely be writing up an article on my usage of this new functionality soon to identify racial disparities in a dataset I was analyzing. I'd like to link back to this wonderful library for others to replicate my results. :)

theonaunheim commented 3 years ago

Well that escalated quickly ... :)

Thanks for your efforts. Will review presently and touch base in the next couple days.

Quick administrative ask: can you confirm that you are releasing your code under the MIT license? I am OK with you keeping copyright to your code and will document accordingly in license/documentation/PyPI.

TheCleric commented 3 years ago

Yes absolutely MIT is fine. :)

theonaunheim commented 3 years ago

@TheCleric , good afternoon.

Wrapping up review. This is dynamite--thank you for adding your code and fixing mine at the same time. Question:

Within test_cli.py, you enhanced the functionality of the _compare() method by allowing it to tack keyword arguments onto the end of the subprocess args:

subprocess_commands.extend([command for kvp in [[f'--{key}', kwargs[key]] for key in kwargs.keys()] for command in kvp])

This is brilliant and super-helpful, but I can see my future self staring at this code slack-jawed trying to remember what it does. Can you confirm the dumbed-down version is functionally equivalent?

            # Convert kwarg key, value pairs to optional arguments e.g. ['--surname_column', 'custom_surname_col_name']
            argument_pairs = [
                [f'--{key}', value]
                for key, value
                in kwargs.items()
            ]
            # Add them to the subprocess arguments
            for argument_pair in argument_pairs:
                subprocess_commands.extend(argument_pair)
TheCleric commented 3 years ago

Yes, that's probably easier to understand for future devs. πŸ˜ƒ

Oh, and good job using .items() instead. Always forget about that function. πŸ˜†

theonaunheim commented 3 years ago

Evening, @TheCleric .

I'm tapping out for today, so I wanted to give you a status update.

Everything seems to be working. At your convenience would you please clone my surgeo/dev repo, install, and run the unittests on your machine? Comments on changes are also welcomed if you're so inclined.

Working plan as of right now:

  1. Do some manual user acceptance testing on GUI/CLI in case I missed anything;
  2. Make sure my addition to default arguments in the CLI module didn't break things;
  3. Understand the journal article and run BIFSG data up against the example in the journal article to make sure we're consistent;
  4. Accept your pull request;
  5. Merge my changes from my dev branch;
  6. Upload package to PyPI;
  7. Create a Github release; and,
  8. Create/upload binary installer Windows installer to Github.
TheCleric commented 3 years ago

All unit tests pass for me on your dev branch.

Only question/comment on your commits: Was the addition of files.txt with your local paths on purpose?

Otherwise looks good!

theonaunheim commented 3 years ago

Defenitely an error. Good catch, thanks.

When I get home from work I'll start to burn through the remaining items on this list.

TheCleric commented 3 years ago

One other thing I think I made a mistake on is what it does if it doesn't find a first name. As of right now it looks like it returns NaN, but it should probably revert to the ALL OTHER FIRST NAMES entry for its probabilities.

TheCleric commented 3 years ago

Looks like it could affect surnames as well. Hope to have a fix in this PR shortly.

theonaunheim commented 3 years ago

@TheCleric , do you think you will have further changes for v.1.1.0? If not I will go merge and close #12 .

If you want me to hold off that's equally fine--just wanted to make sure you weren't waiting on me.

TheCleric commented 3 years ago

No, I think this is done on my end!