Notes from the conversation:

How do we deal with PII in transcripts:

Ask participants to promise that they won't share PII, or make sure they understand that anything they do share will appear in the transcripts?
Run named entity recognition (NER) and replace each name or location that appears in a transcript with a hash of it.
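A minimal sketch of the hashing step, assuming the NER tool has already returned character spans as `(start, end, label)` tuples. The functions `pseudonym` and `redact` and the key `SECRET_KEY` are hypothetical. It uses a keyed hash (HMAC-SHA256) rather than a plain hash: names have low entropy, so an unkeyed SHA-256 of "Alice" could be reversed by hashing a list of common names.

```python
import hashlib
import hmac

# Hypothetical project secret; it should live outside the repository.
SECRET_KEY = b"replace-with-a-project-secret"


def pseudonym(entity_text: str, label: str, key: bytes = SECRET_KEY) -> str:
    """Map an entity string to a stable token such as '[PERSON_3f2a9c1b]'."""
    # Normalize case and whitespace so 'Alice' and 'alice ' get the same token.
    normalized = " ".join(entity_text.lower().split())
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"[{label}_{digest[:8]}]"


def redact(transcript: str, entities: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, label) span with its pseudonym.

    `entities` is assumed to come from an NER model; spans must not overlap.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        pieces.append(transcript[cursor:start])
        pieces.append(pseudonym(transcript[start:end], label))
        cursor = end
    pieces.append(transcript[cursor:])
    return "".join(pieces)
```

Because the same name always maps to the same token, analyses can still track who is being referred to across utterances without exposing the name itself.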

Dealing with PII in videos:

Check whether more than one person is in the frame, and if so, blank the frame.
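A minimal sketch of the frame-blanking rule, assuming some person or face detector is available. `count_people` is a hypothetical stand-in for that detector, and frames are simplified to grayscale lists of pixel rows rather than a real video format.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[List[int]]  # simplified grayscale frame: rows of pixel values


def blank_like(frame: Frame) -> Frame:
    """Return an all-black frame with the same dimensions."""
    return [[0] * len(row) for row in frame]


def filter_frames(frames: Iterable[Frame],
                  count_people: Callable[[Frame], int],
                  max_people: int = 1) -> Iterator[Frame]:
    """Yield each frame, blanking any frame with more than `max_people` people.

    `count_people` is a hypothetical detector callback, not a real library API.
    """
    for frame in frames:
        if count_people(frame) > max_people:
            yield blank_like(frame)
        else:
            yield frame
```

Blanking rather than dropping frames keeps the frame count, and therefore the timing, aligned with the audio and the millisecond-level features.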

Data features fall at different levels:

Millisecond-level raw audio and video features (e.g., where a person's head is pointing)
Utterance-level data (transcript-level data)
Survey-level data
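One way the millisecond level could feed the utterance level is to average the raw samples that fall inside each utterance's time window. A minimal sketch, assuming head direction is recorded as a single yaw angle per timestamp and utterances carry start/end times in milliseconds; the function name and both input formats are hypothetical.

```python
from bisect import bisect_left
from statistics import mean
from typing import Dict, List, Optional, Tuple


def utterance_means(samples: List[Tuple[int, float]],
                    utterances: List[Tuple[str, int, int]]) -> Dict[str, Optional[float]]:
    """Average millisecond-level samples within each utterance's time window.

    samples: (timestamp_ms, value) pairs sorted by timestamp, e.g. head yaw in degrees.
    utterances: (utterance_id, start_ms, end_ms), with end_ms exclusive.
    Utterances with no samples map to None.
    """
    times = [t for t, _ in samples]
    result: Dict[str, Optional[float]] = {}
    for utt_id, start, end in utterances:
        # Binary search for the first sample at or after start and at or after end.
        lo = bisect_left(times, start)
        hi = bisect_left(times, end)
        window = [v for _, v in samples[lo:hi]]
        result[utt_id] = mean(window) if window else None
    return result
```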

GitHub will hold the utterance-level and survey-level data. Some of the high-frequency data is large and would either go on Git LFS or stay in S3. Most of the time we'll use the millisecond-level data to derive higher-level, lower-frequency features.
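A rough sketch of how files could be routed between the repository and large-file storage. GitHub blocks files larger than 100 MiB and warns above 50 MiB; using the 50 MiB warning level as the cutoff is an assumption, and the function names are hypothetical. The sketch does not choose between LFS and S3, since that is still open.

```python
from pathlib import Path
from typing import Dict

# Assumed cutoff: GitHub's 50 MiB warning level (it hard-blocks files over 100 MiB).
SIZE_CUTOFF_BYTES = 50 * 1024 * 1024


def storage_target(size_bytes: int, cutoff: int = SIZE_CUTOFF_BYTES) -> str:
    """Return 'github' for files small enough to commit, else 'lfs_or_s3'."""
    return "github" if size_bytes <= cutoff else "lfs_or_s3"


def plan_storage(root: Path) -> Dict[str, str]:
    """Map every file under `root` (by relative path) to a storage target."""
    return {
        str(path.relative_to(root)): storage_target(path.stat().st_size)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```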

Todo:

- [x] Get videos from our pilots moved to a dedicated S3 bucket. @JamesPHoughton @RachelAbigail
  - Need to use the metadata from the pilots to identify which folders need to move.
  - [x] Get a few videos into this bucket quickly so that we can get some analysis started.
- [ ] Document what the videos are and where they come from. @JamesPHoughton @RachelAbigail
- [x] Set up user accounts to share the S3 bucket with the group.
- [ ] Put together a rough end-to-end run of the workflow. @ChristopherLucas @dcknox
- [ ] Look at the existing survey and metadata from https://github.com/willschulz/bad-influence/tree/main/analysis @willschulz
- [ ] Watch the videos to get a qualitative sense of what is going on. @JamesPHoughton @dcknox @RachelAbigail @xehu

Conversation Sept 13 #30