Open luispintoc opened 5 months ago
Check these features: https://github.com/luispintoc/Linking-Writing/issues/22
LUC:
Check bursts: https://people.eng.unimelb.edu.au/baileyj/papers/paper249-EDM.pdf
Check what pressed keys is
check process_variance
Take a look at deletions
IKI papers read https://www.researchgate.net/publication/325513604_Modeling_Basic_Writing_Processes_From_Keystroke_Logs https://files.eric.ed.gov/fulltext/ED592674.pdf
P-burst:
R-burst:
When selecting entries for P-burst we apply a condition, but we apply no condition when selecting entries for R-burst. As far as I can tell, this means some entries used to calculate R-burst features are necessarily also used to calculate P-burst features. Is this an issue? My understanding is that the two burst types are mutually exclusive.
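One way to make the overlap concern testable is to assign every event to at most one burst and check that the two sets of event ids are disjoint. The sketch below is only an illustration under assumptions, not the project's code: it assumes each event has `event_id`, `activity` ("Input" or "Remove/Cut"), `down_time` and `up_time` in milliseconds, that a P-burst is a run of Input events with pauses under 2 seconds, and that an R-burst is a run of consecutive Remove/Cut events. `split_bursts` and `PAUSE_MS` are hypothetical names.

```python
from itertools import groupby

PAUSE_MS = 2000  # assumed pause length that ends a P-burst


def split_bursts(events):
    """Return (p_bursts, r_bursts), each a list of lists of event ids."""
    p_bursts, r_bursts = [], []
    # Consecutive events with the same activity form one run.
    for activity, run in groupby(events, key=lambda e: e["activity"]):
        run = list(run)
        if activity == "Remove/Cut":
            r_bursts.append([e["event_id"] for e in run])
        elif activity == "Input":
            current = [run[0]["event_id"]]
            for prev, cur in zip(run, run[1:]):
                # A long pause closes the current P-burst and starts a new one.
                if cur["down_time"] - prev["up_time"] >= PAUSE_MS:
                    p_bursts.append(current)
                    current = []
                current.append(cur["event_id"])
            p_bursts.append(current)
    return p_bursts, r_bursts


# Made-up events purely for illustration.
events = [
    {"event_id": 1, "activity": "Input", "down_time": 0, "up_time": 100},
    {"event_id": 2, "activity": "Input", "down_time": 200, "up_time": 300},
    {"event_id": 3, "activity": "Input", "down_time": 3000, "up_time": 3100},
    {"event_id": 4, "activity": "Remove/Cut", "down_time": 3200, "up_time": 3300},
    {"event_id": 5, "activity": "Remove/Cut", "down_time": 3400, "up_time": 3500},
    {"event_id": 6, "activity": "Input", "down_time": 3600, "up_time": 3700},
]
p_bursts, r_bursts = split_bursts(events)
p_ids = {i for burst in p_bursts for i in burst}
r_ids = {i for burst in r_bursts for i in burst}
```

If the two burst types really are meant to be mutually exclusive, `p_ids.isdisjoint(r_ids)` should hold for every essay; a failure would confirm the overlap described above.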
Pressed keys: these features don't seem problematic. I don't fully see the purpose of the first one, but there's no harm in keeping it.
process_variance: this looks at how actions are distributed over the total writing time, i.e., how much is produced at which point in the time allotted for the essay (e.g., cramming in the last 5 minutes vs. writing consistently throughout). It aims to show how regular the writing process was in terms of quantity of output. The implementation seems correct.
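A minimal sketch of the idea, assuming the feature is the standard deviation of event counts across equal-width time windows. The function name, the `n_windows` parameter, and the choice of population standard deviation are assumptions made for illustration; the actual process_variance implementation may differ.

```python
from statistics import pstdev


def event_count_spread(down_times, n_windows=5):
    """Count events per equal-width time window and return (counts, std dev)."""
    start, end = min(down_times), max(down_times)
    width = (end - start) / n_windows or 1  # fall back to 1 if all times are equal
    counts = [0] * n_windows
    for t in down_times:
        # The last timestamp lands exactly on the upper edge, so clamp the index.
        idx = min(int((t - start) / width), n_windows - 1)
        counts[idx] += 1
    return counts, pstdev(counts)


# Made-up timestamps: steady writing vs. cramming at the end.
steady = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
crammed = [0, 82, 83, 84, 85, 86, 87, 88, 89, 90]
steady_counts, steady_sd = event_count_spread(steady)
crammed_counts, crammed_sd = event_count_spread(crammed)
```

A steady writer gives near-equal counts per window and a small spread, while a crammer concentrates events in the final window and gets a large one.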
Deletions: potential problem: deletion_length, computed as df['text_change'].str.len(), will always equal 1, because only entries where df['text_change']=='q' are included. --> Is this fine, given that we only want entries where the change is a single character? We need to discuss this to clarify.
Besides this, there is no issue with the logic.
IKI: the inter-keystroke interval (IKI) is the time between consecutive keystrokes while a person is typing. The current implementation, where we specify a "down_event", actually calculates the interval between actions of the same type, e.g., how long it has been since the user last hit space. Depending on the aggregation used on the back end, it would therefore measure things like how long it takes the user to write a sentence (if splitting on down_event == sentence-terminating punctuation).
Added a new implementation that calculates the interval between every pair of consecutive events. This may take a while to run, but as I understand it that is the definition of IKI.
IKIs
The inter-keystroke interval (IKI) is the time between consecutive keystrokes while a person is typing. The current implementation, where we specify a "down_event", actually calculates the interval between actions of the same type, e.g., how long it has been since the user last hit space. The results are then aggregated, so the mean, for example, answers "how frequently does the user hit the space bar?".
Added a new implementation that calculates the interval between every pair of consecutive events. This may take a while to run, but as I understand it that is the definition of IKI.
Didn't actually test the implementation, but it should work 👯
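To make the difference between the two IKI variants concrete, here is a small sketch. It assumes each event carries `down_event` and `down_time` (in milliseconds) and measures intervals from one key-down to the next; some papers measure from key-up to the next key-down instead. Both function names are hypothetical and this is not the code added in the repository.

```python
def iki_all(events):
    """Intervals between every pair of consecutive keystrokes (ms)."""
    times = [e["down_time"] for e in events]
    return [b - a for a, b in zip(times, times[1:])]


def iki_same_key(events, down_event):
    """Intervals between consecutive presses of one specific key (ms)."""
    times = [e["down_time"] for e in events if e["down_event"] == down_event]
    return [b - a for a, b in zip(times, times[1:])]


# Made-up keystrokes for illustration.
events = [
    {"down_event": "q", "down_time": 0},
    {"down_event": "q", "down_time": 150},
    {"down_event": "Space", "down_time": 300},
    {"down_event": "q", "down_time": 500},
    {"down_event": "q", "down_time": 620},
    {"down_event": "Space", "down_time": 900},
]
all_gaps = iki_all(events)
space_gaps = iki_same_key(events, "Space")
```

The first list describes typing rhythm, while the second describes how often one particular key is hit, which is closer to a word-level or sentence-level pace.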
Also, the current implementation of IKI_sentence doesn't account for sentences that end in ! or ?, so I'm quickly adding that too.
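A sketch of what treating all three terminators as sentence ends could look like. It assumes `down_event` holds the literal punctuation character and `down_time` is in milliseconds; `SENTENCE_END` and `sentence_intervals` are hypothetical names, not the existing IKI_sentence code.

```python
SENTENCE_END = {".", "!", "?"}  # assumed set of sentence-terminating keys


def sentence_intervals(events, terminators=SENTENCE_END):
    """Time between consecutive sentence-terminating keystrokes (ms)."""
    times = [e["down_time"] for e in events if e["down_event"] in terminators]
    return [b - a for a, b in zip(times, times[1:])]


# Made-up keystrokes for illustration.
events = [
    {"down_event": "q", "down_time": 0},
    {"down_event": ".", "down_time": 1000},
    {"down_event": "q", "down_time": 1200},
    {"down_event": "?", "down_time": 2500},
    {"down_event": "q", "down_time": 2700},
    {"down_event": "!", "down_time": 4000},
]
with_all = sentence_intervals(events)
period_only = sentence_intervals(events, terminators={"."})
```

With only the period counted, this example yields no intervals at all, which shows how sentences ending in ! or ? would otherwise be merged or dropped.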
Pressed keys
- product_to_keys(args) --> Returns the ratio between the length of the essay and the number of input or remove/cut actions performed while writing it (this basically amounts to the average number of characters produced per input or remove/cut action?)
- get_keys_pressed_per_second(args) --> Number of input + remove/cut actions performed over the course of the essay, divided by the total number of seconds spent on the essay, i.e., how many input + remove/cut actions were performed per second.
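The two descriptions above can be sketched as follows. The signatures are assumptions (the real functions take unspecified args), as are the `activity`, `down_time` and `up_time` field names and the millisecond unit; the function names are deliberately different from the originals to mark them as illustrative.

```python
KEY_ACTIVITIES = ("Input", "Remove/Cut")


def count_key_actions(events):
    """Number of input and remove/cut actions in an essay's event log."""
    return sum(e["activity"] in KEY_ACTIVITIES for e in events)


def product_to_keys_ratio(essay_length, events):
    """Final essay length divided by the number of input + remove/cut actions."""
    keys = count_key_actions(events)
    return essay_length / keys if keys else 0.0


def keys_per_second(events):
    """Input + remove/cut actions divided by total writing time in seconds."""
    duration_s = (max(e["up_time"] for e in events)
                  - min(e["down_time"] for e in events)) / 1000
    return count_key_actions(events) / duration_s if duration_s else 0.0


# Made-up log: four inputs, one deletion, one non-producing event.
events = [
    {"activity": "Input", "down_time": 0, "up_time": 100},
    {"activity": "Input", "down_time": 500, "up_time": 600},
    {"activity": "Remove/Cut", "down_time": 1000, "up_time": 1100},
    {"activity": "Nonproduction", "down_time": 1500, "up_time": 1600},
    {"activity": "Input", "down_time": 2000, "up_time": 2100},
    {"activity": "Input", "down_time": 2900, "up_time": 3000},
]
ratio = product_to_keys_ratio(3, events)  # 3 characters survive in the essay
rate = keys_per_second(events)
```

A ratio well below 1 suggests a lot of typing was later deleted, which is one possible reading of why the first feature is worth keeping.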
These features don't seem problematic. I don't fully see the purpose of the first one, but there's no harm in keeping it.
- Could you think of ways to expand on these ideas?
@lucselmes
Deletions
Potential problem: deletion_length, computed as df['text_change'].str.len(), will always equal 1, because only entries where df['text_change']=='q' are included. --> Is this fine, given that we only want entries where the change is a single character? We need to discuss this to clarify.
Besides this, there is no issue with the logic.
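The point about the filter can be checked with a small sketch. It assumes deletions are events whose `activity` is "Remove/Cut" and that `text_change` holds the removed characters; the helper name and the `only_single_q` flag are hypothetical, and plain lists stand in for the DataFrame.

```python
def deletion_lengths(events, only_single_q=True):
    """Lengths of removed text, optionally keeping only text_change == 'q'."""
    removed = [e["text_change"] for e in events if e["activity"] == "Remove/Cut"]
    if only_single_q:
        # Mirrors the text_change == 'q' condition: every kept string has length 1.
        removed = [t for t in removed if t == "q"]
    return [len(t) for t in removed]


# Made-up events for illustration.
events = [
    {"activity": "Remove/Cut", "text_change": "q"},
    {"activity": "Remove/Cut", "text_change": "qqq"},
    {"activity": "Input", "text_change": "q"},
    {"activity": "Remove/Cut", "text_change": "q"},
]
filtered = deletion_lengths(events)
unfiltered = deletion_lengths(events, only_single_q=False)
```

With the filter on, every length is 1, so any aggregate of deletion_length other than the count carries no information. Dropping the filter is what lets multi-character deletions show up.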
@luispintoc
https://people.eng.unimelb.edu.au/baileyj/papers/paper249-EDM.pdf
[x] Check what pressed keys is
[x] check process_variance
[ ] expand on cursor visits
[x] do aggregations in "segments visits"
[ ] expand on relative size paragraph
[x] Punctuation per intro/body/conclusion
[x] Add aggs to time features
[x] Take a look at deletions
[x] IKI papers read