piternicolaas / Text-Mining-Project

Project Update 3 #5

Open caspardj23 opened 2 months ago

caspardj23 commented 2 months ago

DONE:

TODO:

Seb-Olsen commented 2 months ago

Figure_1: Testing our method for climax recognition. Contrary to what we expected, the climax seems to fall either within the first 25% of the episode or right at the end. The method uses character name detection, on the assumption that the climax is the scene in which the most character names are mentioned (code below). More fine-tuning is needed before we have a fully working climax detection algorithm.


  # First names, surnames and nicknames that count as a character mention.
  FRIENDS = (
      "Phoebe", "Chandler", "Ross", "Monica", "Rachel", "Joey",
      "Geller", "Pheebs", "Bing", "Tribbiani", "Buffay",
  )

  def friendsname(w):
      # Prefix match, so tokens like "Ross's" or "Monica," still count.
      # str.startswith accepts a tuple of prefixes.
      return w.startswith(FRIENDS)

  def climax_scene(scenes):
      # Assumes each scene is a list of word tokens.
      if not scenes:
          raise ValueError("scenes must not be empty")
      # Number of name mentions per scene. The old len(w) >= 4 check was
      # redundant: every name in FRIENDS has at least 4 characters.
      scenes_counts = [sum(1 for w in scene if friendsname(w)) for scene in scenes]
      # max() keeps the first scene on ties, like the original loop.
      max_index = max(range(len(scenes_counts)), key=scenes_counts.__getitem__)
      return scenes[max_index], max_index

Seb-Olsen commented 2 months ago

Figure_2: Updated to account for scene length. We now look for the concentration of names within a scene rather than the raw number of mentions.
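
The updated code was not posted, so here is only a rough sketch of what a length-normalised version could look like. It assumes "concentration" means name mentions divided by the number of tokens in the scene. `name_density`, `climax_scene_by_density` and `relative_position` are made-up names for illustration, and the toy scenes are invented, not taken from the scripts.

```python
# Hypothetical sketch, not the project's actual code.
# Assumption: each scene is a list of word tokens, and "concentration"
# means name mentions divided by scene length.
FRIENDS = (
    "Phoebe", "Chandler", "Ross", "Monica", "Rachel", "Joey",
    "Geller", "Pheebs", "Bing", "Tribbiani", "Buffay",
)


def name_density(scene):
    """Fraction of tokens in a scene that start with a character name."""
    if not scene:
        return 0.0  # empty scene: no mentions, and avoids dividing by zero
    mentions = sum(1 for w in scene if w.startswith(FRIENDS))
    return mentions / len(scene)


def climax_scene_by_density(scenes):
    """Return (scene, index) for the scene with the highest name density."""
    if not scenes:
        raise ValueError("scenes must not be empty")
    densities = [name_density(scene) for scene in scenes]
    best = max(range(len(scenes)), key=densities.__getitem__)
    return scenes[best], best


def relative_position(index, n_scenes):
    """Position of a scene in the episode, from 0.0 (first) to 1.0 (last)."""
    return index / (n_scenes - 1) if n_scenes > 1 else 0.0


# Invented toy episode: the second scene has the most mentions (4),
# but the first has the highest concentration (2 of 4 tokens).
scenes = [
    "Ross and Rachel talk".split(),
    "Monica Chandler Joey Phoebe argue about the very long dinner plans tonight".split(),
    "nobody here".split(),
]
scene, idx = climax_scene_by_density(scenes)
print(idx, relative_position(idx, len(scenes)))
```

On the toy data the raw count would pick the long second scene, while the density picks the short first one. `relative_position` is one way to check observations like "within the first 25% of the episode" across many episodes. Very short scenes can get an inflated density, so a minimum scene length might be worth trying.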