ipno-llead / US-IPNO-exonerations

Processing repo for the Innocence Project New Orleans' Louisiana Law Enforcement Accountability Database

enhancement/classifier #6

Closed by ayyubibrahimi 1 year ago

ayyubibrahimi commented 1 year ago

@tarakc02 @johnargentino @lantrinh181 I foolishly blanked (for way too long) on the fact that we're beginning with image classification, not text classification. I'll begin on the image classification task next week.

tarakc02 commented 1 year ago

sorry for my slowness right now as I'm on a family trip, but one quick note: we should probably add __pycache__ to the .gitignore...

ayyubibrahimi commented 1 year ago

No rush at all. I had time on my hands after wrapping up work with the devs on the processing repo, and I'll be busy this week with other stuff. Hope the trip is fun!

ayyubibrahimi commented 1 year ago

@tarakc02 this PR has been updated to contain the correct code for the heuristic task.

tarakc02 commented 1 year ago

thanks!

couple of quick questions/comments:

1) add __pycache__ to the gitignore so that those files aren't part of future commits
2) are we replicating work that's already being done in the thumbnailing task here?
3) do we need to be storing the full-size (300dpi) images for any reason? we need the image for the ocr, but otherwise? might not be a problem, but they do take up a lot of space and become a logistical issue to keep track of...
4) setup and execution of the heuristic part looks great
5) modeling code looks good, we'll want to hold on to this and compare to the CNN also. or... does this already perform well on its own?

ayyubibrahimi commented 1 year ago

  1. Done.
  2. There is some minor duplication. I'll refactor the process_pdf function after the thumbnail task is integrated.
  3. Not at all. This code currently outputs the full-sized images for the boilerplate model. I'll refactor this as well.
  4. Shukran.
  5. Performs well!!! (on my small sample size)