zkokaja closed 1 year ago
Is there a consensus that this should be removed? @VeritasJoker what did you do in the latest run? Did you keep this conversation or hard-code its removal?
I'm not quite sure if we are removing it right now. I can check.
There is one with 8 words right now. After trimming for 2 seconds, it has 2 words. There are also other short conversations with like 14 words (12 after trimming).
I don't know if we still want to remove it if in the future we are not trimming anymore (after the 798 discussion).
What do you mean by 'trimming for 2 seconds'? Okay, let's have this conversation later then.
This doesn't seem to be breaking the code, so we should keep these conversations and decide at the analysis level whether to include them or not.
seq2seq models may need to exclude these conversations during pickling by checking whether a conversation has at least 2 speakers.
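A minimal sketch of that speaker check, assuming each conversation is a list of utterance dicts with a `speaker` key. The data layout and the names `has_min_speakers` and `filter_conversations` are hypothetical, not what the pickling code actually uses:

```python
from typing import Dict, List


def has_min_speakers(utterances: List[dict], min_speakers: int = 2) -> bool:
    """Return True if the conversation has at least `min_speakers` distinct speakers.

    Assumes (hypothetically) that every utterance dict carries a "speaker" key.
    """
    speakers = {utt["speaker"] for utt in utterances}
    return len(speakers) >= min_speakers


def filter_conversations(
    conversations: Dict[str, List[dict]], min_speakers: int = 2
) -> Dict[str, List[dict]]:
    """Keep only conversations that pass the speaker check, preserving order."""
    return {
        name: utts
        for name, utts in conversations.items()
        if has_min_speakers(utts, min_speakers)
    }
```

This way the short conversations stay in the data, and only the seq2seq pickling step skips the ones with a single speaker.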
Here are the conversations with the fewest words for 676:
14 NY676_618_Part4_conversation2/misc/NY676_618_Part4_conversation2_datum_trimmed.txt
26 NY676_619_Part4_conversation2/misc/NY676_619_Part4_conversation2_datum_trimmed.txt
31 NY676_618_Part4_conversation9/misc/NY676_618_Part4_conversation9_datum_trimmed.txt
47 NY676_618_Part8_conversation2/misc/NY676_618_Part8_conversation2_datum_trimmed.txt
55 NY676_620_Part2_conversation1/misc/NY676_620_Part2_conversation1_datum_trimmed.txt
74 NY676_617_Part0_conversation1/misc/NY676_617_Part0_conversation1_datum_trimmed.txt
84 NY676_617_Part1_conversation5/misc/NY676_617_Part1_conversation5_datum_trimmed.txt
86 NY676_618_Part4_conversation10/misc/NY676_618_Part4_conversation10_datum_trimmed.txt
99 NY676_618_Part8_conversation1/misc/NY676_618_Part8_conversation1_datum_trimmed.txt
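For reference, a list like this could be produced with something along these lines. It is only a sketch: it assumes the counts are whitespace-delimited token counts (as `wc -w` would give) and that the trimmed datum files match `*_datum_trimmed.txt` under one root directory. The function name `word_counts` is made up for illustration:

```python
from pathlib import Path
from typing import List, Tuple


def word_counts(root: str, pattern: str = "*_datum_trimmed.txt") -> List[Tuple[int, str]]:
    """Return (word_count, relative_path) pairs sorted from fewest to most words.

    Counts whitespace-separated tokens; this is an assumption about how the
    numbers in the thread were obtained.
    """
    root_path = Path(root)
    counts = []
    for path in root_path.rglob(pattern):
        n_words = len(path.read_text().split())
        counts.append((n_words, path.relative_to(root_path).as_posix()))
    return sorted(counts)
```

Taking the first few entries of the result, e.g. `word_counts(root)[:9]`, gives the shortest conversations.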
Remove it from the data and re-create the pickles. Which conversation is it?