luillo1 / RaceResults

A platform for managing and sharing running race results within running groups

Figure out how to get frequency of names #4

Open CarlKCarlK opened 2 years ago

CarlKCarlK commented 2 years ago

The MemberMatch feature needs, as input, the (approximate) frequency of each member's first name(s) and last name(s). Some alternatives:

CarlKCarlK commented 2 years ago

Good news: I've found three sources of free data that I think will give us what we need:

@luillo1, I know little about databases but a lot about processing CSV text files, so how about I write a utility program that merges these hundreds of files into one manageable CSV file? It will have a few hundred thousand rows and fewer than a dozen columns. In the short term, MemberMatch can use this single CSV file. In the medium term, you can move it into a database if you want.
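A minimal sketch of such a merge utility, written in Python for illustration. It assumes (hypothetically) that every input file is a CSV with a header row containing `name` and `count` columns; the layouts of the actual sources aren't described in this thread, so the column names and the function name `merge_name_counts` are placeholders.

```python
import csv
from collections import Counter
from pathlib import Path


def merge_name_counts(input_dir: str, output_path: str) -> Counter:
    """Merge many per-source CSV files into one CSV of total counts per name.

    Hypothetical input layout: each *.csv file in input_dir has a header
    row with 'name' and 'count' columns. Names are lower-cased so that
    'Smith' and 'SMITH' from different sources are combined.
    """
    totals: Counter = Counter()
    # sorted() materializes the file list before the output is written.
    for path in sorted(Path(input_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                totals[row["name"].strip().lower()] += int(row["count"])

    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "count"])
        # Write the most common names first.
        for name, count in totals.most_common():
            writer.writerow([name, count])
    return totals
```

Summing counts per name (rather than keeping one row per source) is what shrinks hundreds of files down to one row per distinct name.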

CarlKCarlK commented 2 years ago

I saw @satvu yesterday at ESR Track, and she said she had found these sources, too.

CarlKCarlK commented 2 years ago

I've created a tab-separated file that associates 250K names with their (approximate) probability.
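One plausible way to produce a file like this, sketched under the assumption that the merged counts are available as a name-to-count mapping: divide each count by the total so the probabilities sum to 1, then write one `name<TAB>probability` line per name. The header row, column names, and the function name `write_probability_tsv` are assumptions, not the format of the actual file.

```python
import csv
from typing import Dict, Mapping


def write_probability_tsv(counts: Mapping[str, int], output_path: str) -> Dict[str, float]:
    """Convert raw name counts to relative frequencies and save them as TSV."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one positive count")
    # Relative frequency: each name's share of all observed names.
    probs = {name: count / total for name, count in counts.items()}
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["name", "probability"])
        # Most probable names first; 6 significant digits keeps the file small.
        for name, p in sorted(probs.items(), key=lambda kv: -kv[1]):
            writer.writerow([name, f"{p:.6g}"])
    return probs
```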

@luillo1 & @MutatedGamer & @satvu The file is about 5 MB. Is it OK to check in for now, or is it too big?

I imagine that you'll eventually want it converted to a serialized binary C# dictionary or a database table. For now, though, having it checked in would let me update the MemberMatch functions to use this data.
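In the interim, code could read the checked-in TSV straight into an in-memory dictionary. A minimal Python sketch, assuming a header row followed by `name<TAB>probability` lines; the function names and the `floor` value for names missing from the file are hypothetical choices, not part of MemberMatch.

```python
import csv
from typing import Dict


def load_name_probabilities(path: str) -> Dict[str, float]:
    """Read a name<TAB>probability file (with a header row) into a dict."""
    probs: Dict[str, float] = {}
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            name, prob = row
            probs[name.strip().lower()] = float(prob)
    return probs


def name_probability(probs: Dict[str, float], name: str, floor: float = 1e-7) -> float:
    """Look up a name case-insensitively, falling back to a small floor.

    The floor (a hypothetical default) treats unseen or misspelled names
    as very uncommon rather than impossible.
    """
    return probs.get(name.strip().lower(), floor)
```

A C# `Dictionary<string, double>` loaded the same way would be the natural next step before moving to a serialized binary form or a database table.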