Project Update 3

caspardj23 commented 2 months ago

DONE:

Created preprocessing functions for the script data (Seb & Caspar)
Created code for climax/resolution recognition based on name count in scene (Piter & Seb)
Wrote introduction and started on research context (Caspar)

TODO:

Test our method for climax/resolution recognition
Design function to summarise until cutoff point in script
Continue writing the report

Seb-Olsen commented 2 months ago

Figure_1: Testing our method for climax recognition. Contrary to what we expected, the detected climax tends to fall either within the first 25% of the episode or right at the end. The method uses character name detection, under the assumption that the climax is the scene in which the most character names are mentioned (code below). More fine-tuning is needed before we have a fully working climax detection algorithm.


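  def climax_scene(scenes):
      """Return the scene with the most character-name mentions and its index.

      Each scene is assumed to be a list of word tokens.
      """
      # Count the tokens in each scene that look like a main character's name.
      scenes_counts = [
          sum(1 for w in scene if friendsname(w)) for scene in scenes
      ]

      # Pick the first scene with the highest count (ties go to the earliest scene).
      max_index = max(range(len(scenes_counts)), key=scenes_counts.__getitem__)
      return scenes[max_index], max_index

  # First names, nicknames and surnames of the six main characters.
  FRIENDS = (
      "Phoebe",
      "Chandler",
      "Ross",
      "Monica",
      "Rachel",
      "Joey",
      "Geller",
      "Pheebs",
      "Bing",
      "Tribbiani",
      "Buffay",
  )

  def friendsname(w):
      """Return True if token w starts with one of the names in FRIENDS."""
      # Prefix matching also catches forms such as "Ross's" or "Joey:".
      # str.startswith accepts a tuple of prefixes; every name has at least
      # four characters, so no separate length check is needed.
      return w.startswith(FRIENDS)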
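
To put a number on where the detected climax falls in an episode, the index returned by climax_scene can be turned into a relative position. This is only a sketch: the helper names `relative_position` and `position_bucket` are hypothetical and not part of the project code, and the 25% cutoff is taken from the observation above.

```python
def relative_position(index, n_scenes):
    """Position of a scene as a fraction of the episode (0.0 = first, 1.0 = last)."""
    if n_scenes <= 1:
        return 0.0
    return index / (n_scenes - 1)


def position_bucket(index, n_scenes, early=0.25):
    """Label a scene index as 'early', 'end' or 'middle' (assumed categories)."""
    if relative_position(index, n_scenes) <= early:
        return "early"
    if index == n_scenes - 1:
        return "end"
    return "middle"


# Example with a made-up episode of 12 scenes.
for idx in (2, 6, 11):
    print(idx, round(relative_position(idx, 12), 2), position_bucket(idx, 12))
```

Counting the bucket labels over all episodes would show how often the name-count method picks an early or final scene.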

Seb-Olsen commented 2 months ago

Figure_2: Updated to account for scene length. We now look for the concentration of names (name mentions relative to scene length) rather than the raw count.
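
The updated code is not shown in this thread, so the following is only a minimal sketch of what a concentration-based version could look like. It assumes that concentration means the number of name tokens divided by the total number of tokens in the scene, and that each scene is a list of word tokens. The names `name_density` and `climax_scene_by_density` are hypothetical.

```python
# Same name list as in the name-count version.
FRIENDS = (
    "Phoebe", "Chandler", "Ross", "Monica", "Rachel", "Joey",
    "Geller", "Pheebs", "Bing", "Tribbiani", "Buffay",
)


def name_density(scene):
    """Fraction of tokens in a scene that start with a main character's name."""
    if not scene:
        return 0.0  # avoid dividing by zero for empty scenes
    hits = sum(1 for w in scene if w.startswith(FRIENDS))
    return hits / len(scene)


def climax_scene_by_density(scenes):
    """Return the scene with the highest name density and its index."""
    densities = [name_density(scene) for scene in scenes]
    # max() keeps the first index on ties, matching the name-count version.
    max_index = max(range(len(densities)), key=densities.__getitem__)
    return scenes[max_index], max_index


# Made-up example: the long scene has more names in total (4 vs 3),
# but the short scene has the higher concentration (3/4 vs 4/19).
scenes = [
    ("Ross and Rachel and Monica and Joey talked for a very long time "
     "about nothing in particular at all").split(),
    "Chandler Joey Phoebe run".split(),
]
scene, idx = climax_scene_by_density(scenes)
print(idx)  # 1
```

One consequence of dividing by scene length is that very short scenes can score high with only a few name mentions, so a minimum scene length may be worth considering.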

piternicolaas / Text-Mining-Project

Project Update 3 #5