ipno-llead / US-IPNO-exonerations

Processing repo for the Innocence Project New Orleans' Louisiana Law Enforcement Accountability Database

Create file identifiers #1

Closed: tarakc02 closed this issue 1 year ago

tarakc02 commented 1 year ago

using sha1 hash. since there's so much data, this will take a few minutes to run. it would be nice to not have to rehash unchanged files if/when there are new files and we have to update the task.
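One possible way to avoid rehashing unchanged files, as a rough sketch only: keep a small JSON cache of each file's size and modification time next to its digest, and recompute the SHA-1 only when either differs. The names `update_hashes` and `cache_file`, and the cache layout, are hypothetical and not from this repo. The cache file is assumed to live outside the data directory so it does not get hashed itself.

```python
import hashlib
import json
from pathlib import Path


def sha1_of_file(path: Path) -> str:
    """Return the hex SHA-1 of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def update_hashes(root: Path, cache_file: Path) -> dict:
    """Hash every file under `root`, reusing cached digests when the
    file's size and mtime match the cached entry (hypothetical layout)."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    result = {}
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        st = p.stat()
        key = str(p.relative_to(root))
        old = cache.get(key)
        if old and old["size"] == st.st_size and old["mtime_ns"] == st.st_mtime_ns:
            # Unchanged by the size/mtime heuristic: skip the expensive rehash.
            result[key] = old
        else:
            result[key] = {
                "size": st.st_size,
                "mtime_ns": st.st_mtime_ns,
                "sha1": sha1_of_file(p),
            }
    cache_file.write_text(json.dumps(result, indent=2))
    return result
```

Size plus mtime is a heuristic, not a guarantee: a file rewritten with the same size and timestamp would be missed, so a full rehash is still the safe option when in doubt.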

ayyubibrahimi commented 1 year ago

How should we begin this step? I expect to be set up and to have figured out how to work remotely by tomorrow, now that I'm up on Eleanor with the ability to pull data.

tarakc02 commented 1 year ago

we can focus for now on the basic requirement, and ignore the note about "it would be nice ...". So basically we want to output a dataframe with three columns:

Since version 3.11, Python's hashlib includes a convenient file_digest function. On eleanor, you should be able to activate the ipno-exonerations conda environment, which has Python 3.11 set up. Alternatively, this Stack Overflow page looks like it has examples.
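For reference, a minimal sketch of what the hashing step could look like with `hashlib.file_digest`, with a chunked fallback for interpreters older than 3.11. The helper names `sha1_of_file` and `hash_tree` are made up for illustration, and the sketch stops at a mapping from relative path to digest rather than guessing at the output dataframe's columns.

```python
import hashlib
import sys
from pathlib import Path


def sha1_of_file(path: Path) -> str:
    """Return the hex SHA-1 digest of the file at `path`."""
    with open(path, "rb") as f:  # file_digest needs a binary-mode file object
        if sys.version_info >= (3, 11):
            # hashlib.file_digest reads the file in chunks internally.
            return hashlib.file_digest(f, "sha1").hexdigest()
        # Fallback for older Pythons: feed the hash in 1 MiB chunks.
        h = hashlib.sha1()
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
        return h.hexdigest()


def hash_tree(root: Path) -> dict:
    """Map each regular file under `root` (as a relative path) to its SHA-1."""
    return {
        str(p.relative_to(root)): sha1_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Calling `hash_tree(Path("some/data/dir"))` (placeholder path) returns a plain dict, which can be turned into a dataframe afterwards with whatever columns the task calls for.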