luispintoc / Linking-Writing

https://www.kaggle.com/competitions/linking-writing-processes-to-writing-quality/overview
Apache License 2.0

TODO #28

Open luispintoc opened 5 months ago

luispintoc commented 5 months ago

https://people.eng.unimelb.edu.au/baileyj/papers/paper249-EDM.pdf

luispintoc commented 5 months ago

Check these features: https://github.com/luispintoc/Linking-Writing/issues/22

luispintoc commented 5 months ago

LUC:

Check bursts: https://people.eng.unimelb.edu.au/baileyj/papers/paper249-EDM.pdf

Check what the pressed-keys features compute

Check process_variance

Take a look at deletions

Read the IKI papers:
https://www.researchgate.net/publication/325513604_Modeling_Basic_Writing_Processes_From_Keystroke_Logs
https://files.eric.ed.gov/fulltext/ED592674.pdf

https://github.com/luispintoc/Linking-Writing/issues/22

lucselmes commented 5 months ago

Checking bursts:

P-burst:

R-burst:

For P-bursts we apply a condition when selecting entries, but for R-bursts we select entries without any condition. As far as I can tell, this means some entries used for the R-burst features are necessarily also used for the P-burst features. Is this an issue? My understanding is that the two burst types are mutually exclusive.
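To make the overlap question concrete, here is a minimal sketch of one way to keep the two burst types mutually exclusive: walk the log once and label each run of inputs by what ended it. It assumes the competition's `activity`, `down_time` and `up_time` columns, that a P-burst ends at a pause and an R-burst ends at a revision, and a 2-second pause threshold. `label_bursts`, `PAUSE_MS` and the toy `log` are hypothetical names for illustration, not what is in our repo.

```python
import pandas as pd

PAUSE_MS = 2000  # assumed pause threshold (2 seconds), not taken from our code


def label_bursts(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per burst: its length in Input events and what ended it."""
    df = df.sort_values("down_time").reset_index(drop=True)
    bursts, length, prev_up = [], 0, None
    for row in df.itertuples(index=False):
        gap = 0 if prev_up is None else row.down_time - prev_up
        # A long pause closes the running burst as a P-burst.
        if length > 0 and gap >= PAUSE_MS:
            bursts.append({"length": length, "type": "P"})
            length = 0
        # A revision closes the running burst as an R-burst.
        if row.activity == "Remove/Cut" and length > 0:
            bursts.append({"length": length, "type": "R"})
            length = 0
        if row.activity == "Input":
            length += 1
        prev_up = row.up_time
    if length > 0:
        # Trailing inputs that were cut off by the end of the log.
        bursts.append({"length": length, "type": "end"})
    return pd.DataFrame(bursts, columns=["length", "type"])


# Toy log (made-up values): three inputs, a long pause, two inputs, a deletion, one input.
log = pd.DataFrame({
    "down_time": [0, 100, 200, 3000, 3100, 3200, 3300],
    "up_time": [50, 150, 250, 3050, 3150, 3250, 3350],
    "activity": ["Input", "Input", "Input", "Input", "Input", "Remove/Cut", "Input"],
})
bursts = label_bursts(log)
```

Because each run of inputs is closed exactly once, no entry can count towards both a P-burst and an R-burst. Whether that matches the definition in the paper is something we should confirm.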

lucselmes commented 5 months ago

Process variance

This looks at how actions are distributed over the total essay time, i.e., how much text is produced at which points in the time allotted (cramming in the last 5 minutes, writing consistently throughout, and so on). It aims to show how regular the writing process was in terms of quantity of output. The implementation seems correct.
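For reference, a minimal sketch of the idea as I understand it: split the session into equal time windows, compute the share of input events in each window, and take the standard deviation of those shares. It assumes the competition's `activity`, `down_time` and `up_time` columns. The function name, the choice of 5 windows and the use of shares rather than raw counts are my assumptions, not necessarily what our `process_variance` does.

```python
import numpy as np
import pandas as pd


def process_variance_sketch(df: pd.DataFrame, n_bins: int = 5) -> float:
    """Std of the share of Input events that fall in each of n_bins equal time windows."""
    inputs = df.loc[df["activity"] == "Input", "down_time"].to_numpy()
    if len(inputs) == 0:
        return 0.0
    # The windows span the whole session, not just the span of the inputs.
    start, end = df["down_time"].min(), df["up_time"].max()
    counts, _ = np.histogram(inputs, bins=n_bins, range=(start, end))
    shares = counts / counts.sum()
    return float(shares.std())
```

A writer who types at a steady rate gets a value near 0, while one who crams everything into the last window gets a much higher value.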

lucselmes commented 5 months ago

IKIs

The inter-keystroke interval (IKI) is the time between consecutive keystrokes while a person is typing. The current implementation of the IKI, where we specify a "down_event", actually calculates the interval between actions of the same type, e.g., how long it has been since the user last hit space. With down_event set to line-terminating punctuation, for example, it would measure how long it takes the user to write a sentence, depending on the aggregation used on the back end.

I added a new implementation that calculates the interval between every pair of consecutive events. It may be slow to run, but as I understand it, that is the definition of IKI.
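The difference between the two calculations can be sketched as follows. This assumes a single essay's log with the competition's `down_time` and `down_event` columns, and measures intervals from key press to key press (some papers measure from release to next press instead). Both function names are hypothetical and this is not the code in the repo.

```python
import pandas as pd


def iki_all_events(df: pd.DataFrame) -> pd.Series:
    """Interval between each key press and the previous one, whatever the key."""
    ordered = df.sort_values("down_time")
    return ordered["down_time"].diff().dropna()


def iki_same_key(df: pd.DataFrame, key: str) -> pd.Series:
    """Interval between successive presses of one specific key (the current behaviour)."""
    subset = df.loc[df["down_event"] == key].sort_values("down_time")
    return subset["down_time"].diff().dropna()
```

For a log with several essays, the same `diff` would need to run per essay, e.g. via `groupby("id")`, so that intervals do not span two writers.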

lucselmes commented 5 months ago

IKIs

The inter-keystroke interval (IKI) is the time between consecutive keystrokes while a person is typing. The current implementation of the IKI, where we specify a "down_event", actually calculates the interval between actions of the same type, e.g., how long it has been since the user last hit space. It then aggregates those intervals, so the average answers "how frequently does the user hit the space bar?", and so on.

I haven't actually tested the new implementation, but it should work 👯

lucselmes commented 5 months ago

IKIs

Also, the current IKI_sentence implementation ignores sentences that end in ! or ?, so I am quickly adding that too.
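A minimal sketch of that change, assuming sentence-ending keys show up in `down_event` as the literal characters `.`, `!` and `?` (worth checking against the data). `SENTENCE_END` and `iki_sentence_sketch` are hypothetical names, not the ones in the repo.

```python
import pandas as pd

SENTENCE_END = {".", "!", "?"}  # assumed down_event values for sentence terminators


def iki_sentence_sketch(df: pd.DataFrame) -> pd.Series:
    """Time between successive sentence-ending key presses in one essay's log."""
    ends = df.loc[df["down_event"].isin(SENTENCE_END)].sort_values("down_time")
    return ends["down_time"].diff().dropna()
```

Note that this still counts every period, including those in abbreviations or ellipses, as a sentence boundary.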

lucselmes commented 5 months ago

Pressed keys

  • product_to_keys(args) --> Returns the ratio between the length of the essay and the number of input or remove/cut actions performed while writing it (so roughly the average number of characters produced per input or remove/cut action?)
  • get_keys_pressed_per_second(args) --> The number of input + remove/cut actions performed over the course of the essay divided by the total writing time in seconds, i.e., how many input + remove/cut actions were performed per second.

These features don't seem problematic. I don't quite see the point of the first one, but why not keep it.
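As a reference for the discussion, the two features described above can be sketched like this. It assumes the competition's `activity`, `down_time` and `up_time` columns (times in milliseconds) and that the essay length is passed in separately. The function names and signatures here are mine and may not match the ones in the repo.

```python
import pandas as pd

KEY_ACTIVITIES = ["Input", "Remove/Cut"]  # activities counted as key presses


def product_to_keys_sketch(df: pd.DataFrame, essay_length: int) -> float:
    """Final essay length divided by the number of input + remove/cut actions."""
    n_keys = int(df["activity"].isin(KEY_ACTIVITIES).sum())
    return essay_length / n_keys if n_keys else 0.0


def keys_pressed_per_second_sketch(df: pd.DataFrame) -> float:
    """Input + remove/cut actions divided by the session duration in seconds."""
    n_keys = int(df["activity"].isin(KEY_ACTIVITIES).sum())
    duration_s = (df["up_time"].max() - df["down_time"].min()) / 1000
    return n_keys / duration_s if duration_s > 0 else 0.0
```

Read this way, the first feature is a rough efficiency measure: a value close to 1 means nearly every key press survived into the final text, and lower values mean more of the typing was later removed.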

  • Could you think of ways to expand on these ideas?

@lucselmes

lucselmes commented 5 months ago

Deletions

Potential problem: deletion_length, computed as df['text_change'].str.len(), will always equal 1, because only entries where df['text_change']=='q' are included. --> Is this fine, i.e., do we only want deletions of a single character? We need to discuss this to clarify.

Besides this, there is no issue with the logic.
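A toy example of the point above. The rows are made up, and it assumes that multi-character removals are logged in `text_change` as longer strings such as `qqq`, which we should verify on the real data. The filter on `activity` is my own suggestion for comparison, not what the current code does.

```python
import pandas as pd

# Made-up rows: one input, one single-character deletion, two longer deletions.
df = pd.DataFrame({
    "activity": ["Input", "Remove/Cut", "Remove/Cut", "Remove/Cut"],
    "text_change": ["q", "q", "qqq", "qq q"],
})

# Filter as described above: only rows whose text_change is exactly 'q'.
strict = df[(df["activity"] == "Remove/Cut") & (df["text_change"] == "q")]
strict_lengths = strict["text_change"].str.len()

# Alternative: keep every removal, so the length can vary.
all_removals = df[df["activity"] == "Remove/Cut"]
all_lengths = all_removals["text_change"].str.len()
```

With the strict filter every length is 1, so any aggregate of deletion_length (mean, max, std) carries no information beyond the count of deletions.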

@luispintoc