navigating-stories / orange-story-navigator

Add-on to the Orange3 data mining toolkit with text processing widgets from the project Navigating Stories
https://research-software-directory.org/projects/navigating-stories
Other

Setting analysis widget #14

Open kodymoodley opened 7 months ago

kodymoodley commented 7 months ago

Implement one feature for analysing the setting of a story:

f-hafner commented 7 months ago

We defined the following subtasks:

f-hafner commented 7 months ago

Questions to discuss

kodymoodley commented 6 months ago

We defined the following subtasks:

  • [x] start from corpus of stories
  • [x] remove stopwords
  • [x] lemmatize
  • [x] put into a dataframe together with the story ID and segment ID
  • [ ] prepare embeddings: @kodymoodley to find out which model to use
  • [ ] extract similar words in embedding space
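
The checklist above can be sketched end to end in plain Python. This is a minimal illustration, not the widget's code: the stopword set, the lemma lookup table, the `Row` record and the three-dimensional toy vectors are hypothetical stand-ins for what the spaCy model and the still-to-be-chosen embedding model would provide. `preprocess` and `most_similar` are likewise illustrative names.

```python
import math
from collections import namedtuple

# Hypothetical stand-ins: the widget would take stopwords and lemmas from the
# loaded spaCy model (token.is_stop, token.lemma_) rather than from these tables.
STOPWORDS = {"de", "het", "en", "in", "op", "naar", "hij", "zij"}
LEMMAS = {"liep": "lopen", "huizen": "huis", "bomen": "boom"}

Row = namedtuple("Row", ["story_id", "segment_id", "lemma"])


def preprocess(corpus):
    """Turn (story_id, segment_id, text) triples into one Row per kept lemma."""
    rows = []
    for story_id, segment_id, text in corpus:
        for token in text.lower().split():
            token = token.strip(".,!?;:\"'")
            if not token or token in STOPWORDS:
                continue  # drop punctuation-only tokens and stopwords
            rows.append(Row(story_id, segment_id, LEMMAS.get(token, token)))
    return rows


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=3):
    """Return the topn (other_word, score) pairs closest to `word`."""
    target = vectors[word]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy 3-d "embeddings" (hypothetical); a real model supplies far larger vectors.
VECTORS = {
    "huis": [0.9, 0.1, 0.0],
    "kamer": [0.8, 0.2, 0.1],
    "boom": [0.1, 0.9, 0.2],
    "bos": [0.0, 0.8, 0.3],
}

if __name__ == "__main__":
    corpus = [(1, 1, "De huizen in het bos."), (1, 2, "Hij liep naar de bomen")]
    for row in preprocess(corpus):
        print(row)
    print(most_similar("huis", VECTORS, topn=2))
```

Because each `Row` is a namedtuple, `pandas.DataFrame(rows)` should give a frame with `story_id`, `segment_id` and `lemma` columns. Cosine similarity is assumed here as the nearest-neighbour measure; it is the usual choice for word embeddings, but the thread has not settled on it.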

Thanks very much, @f-hafner! Having the preprocessing completed is already very helpful. The lead applicants have recently told me that they would like to pause work on the Setting widget until after the workshop, so this feature is no longer required for the April workshop. I/we could resume where you left off once the workshop is over.

kodymoodley commented 6 months ago

Questions to discuss

  • I am reusing the spaCy model loaded for other tasks. Is this OK here?

    • For instance, the "merge_noun_chunks" pipe is added to the nlp model. Then "Mijn eerste vriendje" ("my first boyfriend") becomes ["mijn een vriendje"]; without it, we get ["mijn", "een", "vriendje"].
  • refactoring

    • the structures of the tagger and the setting analyzer are now quite similar; maybe we can combine them?
    • test the function util.is_valid_token() and reuse it in tagging.py
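
To make the first question concrete, the difference between the two outputs can be reproduced without a parser. The sketch below is hypothetical: `merge_chunks` only mimics what the thread reports for the pipe (each noun chunk collapsed into one item, with chunk spans supplied by hand), and `is_valid_token` is an invented stand-in, since the actual rules in `util.is_valid_token()` are not shown here. In spaCy v3 the real pipe is added with `nlp.add_pipe("merge_noun_chunks")`, and it can be switched off for a single call with the `nlp.select_pipes(disable=["merge_noun_chunks"])` context manager, which is one way to keep sharing the loaded model.

```python
def merge_chunks(lemmas, chunks):
    """Join the lemmas inside each (start, end) chunk into a single item.

    Mimics the effect reported above for spaCy's "merge_noun_chunks" pipe;
    here the chunk spans (end exclusive) are given by hand, not by a parser.
    """
    ends = dict(chunks)  # chunk start index -> chunk end index
    merged, i = [], 0
    while i < len(lemmas):
        end = ends.get(i)
        if end is not None and end > i:
            merged.append(" ".join(lemmas[i:end]))  # whole chunk as one item
            i = end
        else:
            merged.append(lemmas[i])
            i += 1
    return merged


def is_valid_token(lemma, stopwords=frozenset({"de", "het", "en"})):
    """Hypothetical stand-in for util.is_valid_token(): non-empty,
    alphabetic words only (spaces allowed between words), no stopwords."""
    words = lemma.split()
    return (bool(words)
            and all(w.isalpha() for w in words)
            and lemma.lower() not in stopwords)


if __name__ == "__main__":
    lemmas = ["mijn", "een", "vriendje"]
    print(merge_chunks(lemmas, [(0, 3)]))  # with noun-chunk merging
    print(merge_chunks(lemmas, []))        # without it
```

A unit test for the real `util.is_valid_token()` could include a merged, space-containing item such as "mijn een vriendje", because a character check like `str.isalpha()` on the whole string would reject it.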

@f-hafner, I will revisit this comment in April/May. Right now, I suspect that merging the noun chunks will not be necessary for what we want to do.