Flavio/issue 15 - Githubissues

f-hafner commented 5 months ago

Issue

Closes #15

Description of changes

Added the drop-down menu to the GUI.
Segmented the story by sentence; created a second dataframe with one row per sentence.

Open questions

[x] The resulting dataframe is not output anywhere. One solution would be a data structure holding a set of relational dataframes/tables, which could then be used as input to downstream widgets.
[x] Where should the type of the comboBox be defined? Currently I have to convert from str to int in line 142 and line 147 of orangecontrib/storynavigation/widgets/OWSNTagger.py. Ideally we change the type when the tagger is instantiated and when the input value changes.

The segments are computed with np.array_split. Because the number of segments is now defined at a global level (for all stories), segment sizes can differ widely between stories when story length varies a lot (and thus statistical conclusions from comparing segments within a story will be more or less accurate depending on the size of the segment). One way to deal with this is to show this uncertainty to the user and document it clearly, perhaps with a hint that users should inspect the segment lengths in their stories. Another way could be to let the user define, instead of (or in addition to?) the number of segments, the minimum segment size they want.
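To make the concern concrete, here is a minimal sketch of how np.array_split distributes sentences when one global number of segments is applied to stories of very different lengths. The story names and sentence counts are made up, and `segment_sizes` and `n_segments_for_min_size` are hypothetical helpers, not code from this PR.

```python
import numpy as np


def segment_sizes(n_sentences: int, n_segments: int) -> list:
    """Sizes of the segments np.array_split produces for one story."""
    parts = np.array_split(np.arange(n_sentences), n_segments)
    return [len(p) for p in parts]


def n_segments_for_min_size(n_sentences: int, min_size: int) -> int:
    """Largest number of segments such that every segment has at
    least min_size sentences (hypothetical alternative setting)."""
    return max(1, n_sentences // min_size)


# Hypothetical stories: sentence counts are invented for illustration.
story_lengths = {"short_story": 7, "long_story": 103}
n_segments = 5  # one global value, applied to every story

sizes = {name: segment_sizes(n, n_segments) for name, n in story_lengths.items()}
for name, s in sizes.items():
    print(name, s)

# A story shorter than n_segments yields empty segments.
tiny = segment_sizes(3, n_segments)
print("tiny_story", tiny)

# Alternative: derive the number of segments from a minimum segment size.
min_size = 3
by_min_size = {
    name: segment_sizes(n, n_segments_for_min_size(n, min_size))
    for name, n in story_lengths.items()
}
```

With five segments, the short story ends up with segments of one or two sentences while the long story has about twenty per segment, so the same within-story comparison rests on very different amounts of text. Deriving the segment count from a minimum size trades a fixed number of segments for a guaranteed lower bound on segment length.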

Includes

[x] Code changes
[ ] Tests
[x] Documentation

f-hafner commented 5 months ago

Instead of a new dataframe, store the segment_id in a new column in the dataframe with the tags.
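A minimal sketch of what this could look like, assuming the tagger output has one row per token with a story identifier and a sentence identifier. The column names `storyid`, `sentence_id` and `token`, the toy data, and the helper `add_segment_id` are all assumptions for illustration, not the actual code in the tagger.

```python
import numpy as np
import pandas as pd

# Hypothetical tagger output: one row per token (column names are assumed).
tags = pd.DataFrame(
    {
        "storyid": [0, 0, 0, 0, 1, 1],
        "sentence_id": [0, 0, 1, 2, 0, 1],
        "token": ["Once", "upon", "time", "end", "Hello", "bye"],
    }
)


def add_segment_id(df: pd.DataFrame, n_segments: int) -> pd.DataFrame:
    """Return a copy of df with a segment_id column, assigned per story
    by splitting that story's sentences with np.array_split."""
    out = df.copy()
    out["segment_id"] = -1  # placeholder, overwritten for every row below
    for _, story in out.groupby("storyid"):
        sentences = np.sort(story["sentence_id"].unique())
        for seg, chunk in enumerate(np.array_split(sentences, n_segments)):
            mask = story["sentence_id"].isin(chunk).to_numpy()
            out.loc[story.index[mask], "segment_id"] = seg
    return out


result = add_segment_id(tags, n_segments=2)
print(result)
```

Because the column is written back through the original index, this sketch leaves the row order of the input untouched and keeps a single dataframe as output.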

f-hafner commented 5 months ago

The output of the tagger now differs from the output without story segmentation: the order of rows is different. @kodymoodley, if this is an issue, let me know and I can try to fix it.

kodymoodley commented 5 months ago

The output of the tagger now differs from the output without story segmentation: the order of rows is different. @kodymoodley, if this is an issue, let me know and I can try to fix it.

@f-hafner By 'output' do you mean the dataframe? And by 'differs' do you mean only the additional column indicating the story segment number? If so, then there is no issue. Just to be clear, the intention (as per our offline discussion yesterday) is still to retain a single dataframe as the output of the tagger, right?

navigating-stories / orange-story-navigator

Flavio/issue 15 #28