ipno-llead / US-IPNO-exonerations

Processing repo for the Innocence Project New Orleans' Louisiana Law Enforcement Accountability Database

initial code to import and process handlabeled examples and compare t… #10

Open tarakc02 opened 1 year ago

tarakc02 commented 1 year ago

…o model outputs, calcs precision and recall by parameter settings
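A minimal sketch of the precision/recall comparison described in the title, assuming extracted and hand-labeled names can be treated as sets per document; the function and sample names below are hypothetical illustrations, not the repo's actual schema.

```python
# Sketch: compare model-extracted names against hand-labeled names and
# compute per-document precision and recall. All names below are
# hypothetical placeholders.

def precision_recall(extracted, labeled):
    """Precision and recall for one document's extracted name set."""
    extracted, labeled = set(extracted), set(labeled)
    tp = len(extracted & labeled)  # names both extracted and hand-labeled
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(labeled) if labeled else 0.0
    return precision, recall

# Example: hypothetical model output vs. hand labels for one document
model_names = ["Det. Smith", "Sgt. Jones", "Lt. Brown"]
hand_labels = ["Det. Smith", "Sgt. Jones", "Officer Davis"]

p, r = precision_recall(model_names, hand_labels)
# 2 true positives out of 3 extracted -> precision 2/3
# 2 true positives out of 3 labeled   -> recall 2/3
```

In practice these per-document scores would then be grouped by parameter setting (k, chunk size, HYDE on/off) and averaged.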

tarakc02 commented 1 year ago

notes:

ayyubibrahimi commented 1 year ago

@tarakc02 Sorry! Just turned off the branch protection rule that required a review before merging.

I've produced some data for the second piece of analysis. The parameters for the tables are:

I chose three queries:

1. Identify individuals, by name, with the specific titles of officers, sergeants, lieutenants, captains, detectives, homicide officers, and crime lab personnel in the transcript. Specifically, provide the context of their mention related to key events in the case, if available.
2. List individuals, by name, directly titled as officers, sergeants, lieutenants, captains, detectives, homicide units, and crime lab personnel mentioned in the transcript. Provide the context of their mention in terms of any significant decisions they made or actions they took.
3. Locate individuals, by name, directly referred to as officers, sergeants, lieutenants, captains, detectives, homicide units, and crime lab personnel in the transcript. Explain the context of their mention in relation to their interactions with other individuals in the case.

And for each document, I ran the query 6 consecutive times, both with HYDE and without HYDE. Because I figured we should be able to infer the effect from a subset of documents, I produced these data for a smaller subset (3 police reports and 3 transcripts). Let me know if I should point you to them or if you want me to do the analysis.
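The run matrix described above can be sketched as nested loops over documents, queries, the HYDE toggle, and six consecutive trials. This is an illustration under stated assumptions; `run_query` and the document/query names are hypothetical placeholders for the repo's actual retrieval and extraction code.

```python
# Sketch of the experiment matrix: each query run 6 consecutive times,
# with and without HYDE, over each document in the subset.

documents = ["report_1", "report_2", "report_3",
             "transcript_1", "transcript_2", "transcript_3"]
queries = ["query_1", "query_2", "query_3"]

def run_query(doc, query, hyde):
    """Hypothetical placeholder for the retrieval + name-extraction call."""
    return {"doc": doc, "query": query, "hyde": hyde}

results = []
for doc in documents:
    for query in queries:
        for hyde in (True, False):
            for trial in range(6):  # 6 consecutive runs per setting
                out = run_query(doc, query, hyde)
                out["trial"] = trial
                results.append(out)

# 6 documents x 3 queries x 2 HYDE settings x 6 trials = 216 runs
```

Repeating runs like this is what lets run-to-run variability in the extracted names be measured separately from the effect of the parameters.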

ayyubibrahimi commented 1 year ago

I've uploaded data for the remaining three queries. All other parameters from above remain the same. Data can be found here.

ayyubibrahimi commented 1 year ago

I've re-run an analysis similar to the one described above with GPT-4. I chose parameters different from, but similar to, those described above because GPT-4 has a token limit 2x greater than GPT-3's. Based on the initial analysis that you did, these are presumably optimal for name extraction. The intent behind running this analysis with GPT-4 is still to determine why we're seeing such variability in which names are extracted.

The parameters for the tables are:

```
model = GPT-4
k = 15
chunk_size = 1000
chunk_overlap = 500
hyde = 1 or 0
```
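To make the `chunk_size` / `chunk_overlap` parameters concrete, here is a minimal sliding-window chunker: each chunk advances by `chunk_size - chunk_overlap` characters, so consecutive chunks share half their text at the settings above. This is an illustration of how the two parameters interact, not the repo's actual chunking code.

```python
# Sketch: sliding-window chunking with stride = chunk_size - chunk_overlap.

def chunk_text(text, chunk_size=1000, chunk_overlap=500):
    stride = chunk_size - chunk_overlap  # 500 new characters per chunk
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks

chunks = chunk_text("x" * 2500)
# windows start at 0, 500, 1000, 1500 -> 4 chunks, each overlapping
# the previous one by 500 characters
```

A large overlap like this reduces the chance that a name is split across a chunk boundary, at the cost of roughly doubling the number of chunks retrieved per document.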

I chose two queries:

And for each document, I ran the query 6 consecutive times both with HYDE and without HYDE on 3 police reports and 3 transcripts. They can be found here.