hassonlab / 247-encoding

Contains python scripts for performing encoding on 247 data.

remove 676 conversation with only 5-6 words #29

Closed zkokaja closed 1 year ago

zkokaja commented 2 years ago

Remove it from the data and re-create the pickles. Which conversation is it?

hvgazula commented 2 years ago

Is there a consensus that this should be removed? @VeritasJoker what did you do in the latest run? Did you keep this conversation or hard-code it to remove?

VeritasJoker commented 2 years ago

I'm not quite sure if we are removing it right now. I can check.

VeritasJoker commented 2 years ago

There is one with 8 words right now. After trimming for 2 seconds, it has 2 words. There are also other short conversations with around 14 words (12 after trimming).

I don't know whether we still want to remove it if we stop trimming in the future (after the 798 discussion).

hvgazula commented 2 years ago

What do you mean by 'trimming for 2 seconds'? Okay, let's have this discussion later then.

zkokaja commented 1 year ago

This doesn't seem to be breaking the code. We should keep these conversations and decide at the analysis level whether to include them.

Seq2seq models may need to exclude these conversations during pickling by checking whether a conversation has at least 2 speakers.
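
The speaker check mentioned above could look roughly like the sketch below. It is not taken from the repository: the row layout (dicts with `conversation_id` and `speaker` keys) and the function name `conversations_with_min_speakers` are assumptions made for illustration, and the actual datum in the pickling code may be structured differently.

```python
from collections import defaultdict


def conversations_with_min_speakers(rows, min_speakers=2):
    """Return ids of conversations that have at least `min_speakers` distinct speakers.

    `rows` is assumed to be an iterable of dicts with the hypothetical keys
    "conversation_id" and "speaker" (one row per word).
    """
    speakers = defaultdict(set)
    for row in rows:
        speakers[row["conversation_id"]].add(row["speaker"])
    return {cid for cid, names in speakers.items() if len(names) >= min_speakers}


# Toy data: conversation 1 has two speakers, conversation 2 has only one.
rows = [
    {"conversation_id": 1, "speaker": "Speaker1", "word": "hello"},
    {"conversation_id": 1, "speaker": "Speaker2", "word": "hi"},
    {"conversation_id": 2, "speaker": "Speaker1", "word": "okay"},
]

keep = conversations_with_min_speakers(rows)
filtered = [row for row in rows if row["conversation_id"] in keep]
```

Here `filtered` retains only the words from conversation 1. Everything stays in the data and only the seq2seq path applies the filter, which matches the suggestion to decide on inclusion per analysis.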

zkokaja commented 1 year ago

Here are the conversations with the fewest words for 676:

14 NY676_618_Part4_conversation2/misc/NY676_618_Part4_conversation2_datum_trimmed.txt
26 NY676_619_Part4_conversation2/misc/NY676_619_Part4_conversation2_datum_trimmed.txt
31 NY676_618_Part4_conversation9/misc/NY676_618_Part4_conversation9_datum_trimmed.txt
47 NY676_618_Part8_conversation2/misc/NY676_618_Part8_conversation2_datum_trimmed.txt
55 NY676_620_Part2_conversation1/misc/NY676_620_Part2_conversation1_datum_trimmed.txt
74 NY676_617_Part0_conversation1/misc/NY676_617_Part0_conversation1_datum_trimmed.txt
84 NY676_617_Part1_conversation5/misc/NY676_617_Part1_conversation5_datum_trimmed.txt
86 NY676_618_Part4_conversation10/misc/NY676_618_Part4_conversation10_datum_trimmed.txt
99 NY676_618_Part8_conversation1/misc/NY676_618_Part8_conversation1_datum_trimmed.txt
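
If the include/exclude decision is made at the analysis level, a listing like the one above could be parsed and split by a word-count threshold. The sketch below is only an illustration: the function names and the threshold of 30 are hypothetical, not something agreed on in this thread, and it assumes each entry is a count followed by a path containing no whitespace.

```python
def parse_counts(listing):
    """Parse whitespace-separated "count path" pairs into (count, path) tuples."""
    tokens = listing.split()
    return [(int(count), path) for count, path in zip(tokens[0::2], tokens[1::2])]


def split_by_min_words(counts, min_words):
    """Split paths into (kept, dropped) according to a minimum word count."""
    kept = [path for count, path in counts if count >= min_words]
    dropped = [path for count, path in counts if count < min_words]
    return kept, dropped


# First three entries from the listing above.
listing = (
    "14 NY676_618_Part4_conversation2/misc/NY676_618_Part4_conversation2_datum_trimmed.txt "
    "26 NY676_619_Part4_conversation2/misc/NY676_619_Part4_conversation2_datum_trimmed.txt "
    "31 NY676_618_Part4_conversation9/misc/NY676_618_Part4_conversation9_datum_trimmed.txt"
)

counts = parse_counts(listing)
# min_words=30 is an arbitrary example value, not a project setting.
kept, dropped = split_by_min_words(counts, min_words=30)
```

With this example threshold, the 14-word and 26-word conversations land in `dropped` and the 31-word one in `kept`. The pickles themselves are left untouched.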