The following are Riccardo's notes; the items marked with an "X" have been addressed in this feature branch. One change that is not on this list: I removed Tasa from the spell-check training corpus, which now uses only Gutenberg texts. I have updated the Notebook and comments accordingly.
This function: model_store = prepare_transcripts(...
[X] 1.1. This code "hard-codes" the example directory instead of using the parameters above (here modified to match my example)
[ ] 1.2. The same code could also provide (in case of failure) a reminder of the specifics required for the dataset (participant, tab, content).
[ ] 1.3. One could also allow for the colnames and delimiter to be specified as parameters.
[ ] 1.4. For big corpora it might be handy to be able to turn off the printed output (e.g. via a parameter).
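
For the open items 1.2 to 1.4, a minimal sketch of what a more forgiving loader could look like. This is not the current prepare_transcripts code: read_transcript and its parameters colnames, delimiter and verbose are hypothetical names, and the default format (tab-delimited, with participant and content columns) is my reading of item 1.2. It is written for Python 3 and uses only the standard library.

```python
import csv
import io

# Assumed default format (my reading of item 1.2): a tab-delimited file
# with "participant" and "content" columns.
REQUIRED_COLUMNS = ("participant", "content")


def read_transcript(handle, colnames=REQUIRED_COLUMNS, delimiter="\t", verbose=True):
    """Read one transcript and check that the expected columns are present.

    Hypothetical helper, not part of the package.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in colnames if name not in header]
    if missing:
        # 1.2: on failure, remind the user of the expected dataset format.
        raise ValueError(
            "Missing column(s) %s. Expected a file delimited by %r with "
            "columns %s; found %s." % (missing, delimiter, list(colnames), header)
        )
    rows = list(reader)
    if verbose:
        # 1.4: printed output can be switched off for big corpora.
        print("Read %d rows" % len(rows))
    return rows


# Usage with an in-memory file standing in for a transcript on disk.
good = io.StringIO("participant\tcontent\nA\thello there\nB\thi\n")
rows = read_transcript(good, verbose=False)
```

Passing colnames and delimiter explicitly (1.3) keeps the defaults working for the example data while letting other corpora use, say, comma-separated files.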
This function: [turn_real,convo_real]= calculate_alignment(..
[X] 2.1. same as 1.1.
[X] 2.2. the link to the align_concatenated_dataframe.txt file is broken, since that file is saved within the example folder and not the main one.
[ ] 2.3. I get a warning: /Users/au209589/anaconda3/envs/ipykernel_py2/lib/python2.7/site-packages/scipy/spatial/distance.py:644: RuntimeWarning: invalid value encountered in double_scalars
dist = 1.0 - uv / np.sqrt(uu * vv)
[X] 2.4. In one file, a child used only one-word sentences; once the file is cleaned, the function fails (it finds only one interlocutor) at this line:
df_B = dataframe.loc[dataframe['participant'] == dataframe['participant'].unique()[1]]
We might catch the exception there and give the user a meaningful warning. FIX: the check "if len(dataframe) > 0:" can be replaced with "if len(dataframe) > 1:" (or a higher threshold, if one thinks it should be higher).
[ ] 2.5. What about putting a parameter specifying the minimum number of turns per speaker that have to be present for the analysis to run on that file?
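
A rough sketch for 2.3 to 2.5, under stated assumptions. The helper names (eligible_speakers, has_two_interlocutors, cosine_distance_or_nan) and the min_turns parameter are hypothetical and not part of the package. The first two helpers count turns per speaker, so a file can be skipped with a clear warning before indexing unique()[1]; they accept any iterable of speaker labels, for example the 'participant' column of the dataframe. For 2.3, a plausible cause (not verified against the data) is that one of the two vectors is all zeros, so uu * vv is 0 and the division is 0/0; the third helper makes that case explicit instead of emitting a RuntimeWarning.

```python
import math
import warnings
from collections import Counter


def eligible_speakers(participants, min_turns=1):
    """Return speakers, in order of first appearance, with at least min_turns turns."""
    participants = list(participants)
    counts = Counter(participants)
    ordered = []
    for speaker in participants:
        if speaker not in ordered:
            ordered.append(speaker)
    return [speaker for speaker in ordered if counts[speaker] >= min_turns]


def has_two_interlocutors(participants, min_turns=1):
    """True if at least two speakers each have min_turns turns or more."""
    return len(eligible_speakers(participants, min_turns)) >= 2


def cosine_distance_or_nan(u, v):
    """Cosine distance that returns NaN explicitly when either vector is all zeros."""
    uv = sum(a * b for a, b in zip(u, v))
    uu = sum(a * a for a in u)
    vv = sum(b * b for b in v)
    if uu == 0 or vv == 0:
        return float("nan")
    return 1.0 - uv / math.sqrt(uu * vv)


# Usage: a cleaned file in which only one speaker is left.
cleaned = ["A", "A", "A"]
if not has_two_interlocutors(cleaned, min_turns=1):
    warnings.warn("Skipping file: fewer than two interlocutors after cleaning.")
```

Raising min_turns gives the per-speaker threshold suggested in 2.5 without changing the check itself.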
This function: [turn_surrogate,convo_surrogate] = calculate_baseline_alignment(...
[X] 3.1. same as 1.1.
[ ] 3.2. should we add the dyad/condition labels as parameters at the beginning?
[X] 3.3. cond and dyad should be mentioned more clearly in the paper as requirements for the filenames > NOW HIGHLIGHT THIS IN THE README FOR THE NOTEBOOK
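
For 3.2, one way the dyad/condition labels could be exposed as parameters, sketched under assumptions. parse_labels, dyad_label and condition_label are hypothetical names, and the filename shape used here (a label directly followed by an alphanumeric ID, e.g. "dyad3-cond1.txt") is only an illustration, not necessarily what calculate_baseline_alignment expects.

```python
import re


def parse_labels(filename, dyad_label="dyad", condition_label="cond"):
    """Extract the dyad and condition IDs that follow the given labels.

    Hypothetical helper: assumes each label is immediately followed by an
    alphanumeric ID somewhere in the filename.
    """
    dyad = re.search(re.escape(dyad_label) + r"([A-Za-z0-9]+)", filename)
    cond = re.search(re.escape(condition_label) + r"([A-Za-z0-9]+)", filename)
    if dyad is None or cond is None:
        raise ValueError(
            "Filename %r must contain both a %r and a %r label followed by an ID."
            % (filename, dyad_label, condition_label)
        )
    return dyad.group(1), cond.group(1)


# Usage with the assumed filename shape.
dyad_id, cond_id = parse_labels("dyad3-cond1.txt")
```

Failing early with the expected labels in the message also covers the filename requirement from 3.3 at run time.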