Yes! This is awesome.
There's one other characteristic that I'm interested in: are there any noticeable patterns in entities that aren't correctly identified?
I definitely sent you an outdated version of the tables where the false positives weren't correctly calculated. Sending an email now with updated tables.
I still need to add a review of the unmatched names, but I corrected the csv files and the false positive comments in the notebook. I also added a reference to one of the prompts used in the model script to tie in the auditing, but I'll leave it to you to tie in the processing repo!
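For context, this is roughly the comparison the evaluation step is making (a minimal sketch; the actual file and column names in the repo may differ):

```python
import pandas as pd

# Hypothetical file/column names -- the real notebook may use different ones.
gt = pd.read_csv("groundtruth_names.csv")      # expects a "name" column
pred = pd.read_csv("extracted_entities.csv")   # expects a "name" column

# Normalize before comparing so casing/whitespace don't count as misses.
gt_names = set(gt["name"].str.strip().str.lower())
pred_names = set(pred["name"].str.strip().str.lower())

matched = gt_names & pred_names           # found in both
unmatched = gt_names - pred_names         # groundtruth names the model missed
false_positives = pred_names - gt_names   # extracted names not in the groundtruth

print(f"matched: {len(matched)}")
print(f"unmatched (missed): {len(unmatched)}")
print(f"false positives: {len(false_positives)}")
```

The unmatched set is the part I still need to review by hand.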
I couldn't find "James Lopuis" or "Woodall" (top unmatched names) in the Exhibit doc. @ayyubibrahimi Do you have an idea of where these names appear in the PDF? I found a few of the matches so it does appear to be the correct file.
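For reference, this is roughly how I'm checking where a name shows up (a sketch assuming pypdf and a local copy of the Exhibit doc; the path is a placeholder):

```python
from pypdf import PdfReader

# Hypothetical path -- point this at the local copy of the Exhibit PDF.
reader = PdfReader("exhibit.pdf")

def find_name(name: str) -> list[int]:
    """Return 1-indexed page numbers whose extracted text contains the name."""
    hits = []
    for page_num, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if name.lower() in text.lower():
            hits.append(page_num)
    return hits

for name in ["James Lopuis", "Woodall"]:
    print(name, "->", find_name(name) or "not found")
```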
I found the top unmatched names in the other PDF and included snippets and a couple comments about those.
"Woodall" is on pg22.
"James Lopuis" is supposed to be "James Dupuis". I'll fix that typo in the groundtruth table and re-run the pipeline.
Is "james dneps" supposed to be in the exhibit doc? Also, "James Dupuis" and "James Ducos" are both mentioned as being a photographer. Is that correct? Just want to make sure that is not a typo in the original document before I mention these two.
"James Dupuis" and "James Ducos" are both crime lab photographers. "Dneps" is a type/shouldn't be in the gt table. I've just sent the new output of the evaluation step where "Dneps" has been removed.
There is a merge conflict for the overview notebook, but I did a pull before writing in it, so no meaningful changes should be lost!
I put this first draft in a separate notebook but can definitely add the cells over to the existing notebook if it looks good.