The script scripts/06-gitsum-analysis.R was generating .tsv files that could not be combined.
Reason: Some git commit messages contained newline characters, and the generated .tsv files did not quote the fields containing them. When those newline-mangled .tsv files were read back in (during rowbind.tsv), each embedded newline was treated as the end of a row, so the rows had mismatched column counts. Some rows had 29 columns, some had fewer.
Of note: readr::write_tsv() does not use the same default for the 'quote' argument as readr::write_delim() (write_tsv() uses 'none', write_delim() uses 'needed'). The default for 'quote' changed with readr v2.0.0.
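
The difference between the two settings can be sketched as follows. This is a minimal illustration, assuming readr >= 2.0.0 is installed; the data frame and its column names are made up for the example and are not the real gitsum output.

```r
library(readr)

# Hypothetical stand-in for gitsum output: the first commit message
# spans several lines.
commits <- data.frame(
  hash = c("a1", "b2"),
  message = c("Fix bug\n\nLonger description", "Single-line message"),
  stringsAsFactors = FALSE
)

unquoted <- tempfile(fileext = ".tsv")
quoted <- tempfile(fileext = ".tsv")

# Default for write_tsv(): quote = "none", so no field is quoted.
write_tsv(commits, unquoted)

# Explicit quote = "needed": fields containing a newline are wrapped
# in double quotes.
write_tsv(commits, quoted, quote = "needed")

# Count the tab-separated fields on each physical line of the unquoted
# file. Lines that come from a split commit message should report fewer
# fields than the header line.
print(count.fields(unquoted, sep = "\t", quote = ""))

# Show the raw text of the quoted file for comparison.
cat(readLines(quoted), sep = "\n")
```

Inspecting the per-line field counts is one quick way to spot a file that was written without quoting before trying to row-bind it with others.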
The entire workflow for calling {gitsum} functions has been moved into {codeAsData}. A test has been added to that package to ensure that the .tsv file that is written out can also be read back in. The only relevant function is codeAsData::run_gitsum_workflow(). That function sets 'quote' to 'needed', so the commit messages should now be stored correctly.
The snakemake pipeline now runs to completion.
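
A round-trip check of this kind might look like the sketch below. It is not the test that was added to {codeAsData}; the helper round_trip_tsv() and the sample data are hypothetical, and the sketch assumes readr >= 2.0.0 and testthat are installed.

```r
library(readr)
library(testthat)

# Hypothetical helper: write a data frame to a temporary .tsv with
# quote = "needed", then read it back with every column as character.
round_trip_tsv <- function(df) {
  path <- tempfile(fileext = ".tsv")
  on.exit(unlink(path), add = TRUE)
  write_tsv(df, path, quote = "needed")
  read_tsv(path, col_types = cols(.default = col_character()))
}

test_that("commit messages with newlines survive a TSV round trip", {
  # Made-up commit data; the first message contains embedded newlines.
  commits <- data.frame(
    hash = c("a1", "b2"),
    message = c("Fix bug\n\nLonger description", "Single-line message"),
    stringsAsFactors = FALSE
  )

  restored <- round_trip_tsv(commits)

  # Same number of rows and columns, and the messages are unchanged.
  expect_equal(dim(restored), dim(commits))
  expect_equal(restored$message, commits$message)
})
```

Comparing both the dimensions and the message text covers the two symptoms described above: extra rows from split messages and rows with too few columns.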