russHyde / code_as_data

Analysis of code in R dev packages (for a planned talk)

Move gitsum analysis to {codeAsData} #63

Closed russHyde closed 1 year ago

russHyde commented 1 year ago

The script scripts/06-gitsum-analysis.R was generating .tsv files that couldn't be combined.

Reason: Some git commit messages contained newline characters, and the generated .tsv files did not quote the fields containing them. So when those newline-mangled .tsv files were read back in (during rowbind.tsv), their rows had mismatched numbers of columns. Some rows had 29, some had fewer.
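To illustrate the failure mode, here is a minimal sketch using only base R. The two-column commit table and its column names are made up for illustration (the real files had up to 29 columns), and write.table(quote = FALSE) stands in for any writer that leaves fields unquoted.

```r
# Hypothetical commit table; the column names and values are illustrative only.
commits <- data.frame(
  hash = c("a1b2c3", "d4e5f6"),
  message = c("Fix typo", "Add feature\nLonger description"),
  stringsAsFactors = FALSE
)

tsv_path <- tempfile(fileext = ".tsv")

# quote = FALSE leaves every field unquoted, so the embedded newline is
# written raw and splits one record across two physical lines.
write.table(commits, tsv_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Count the tab-separated fields on each physical line of the file.
fields_per_line <- lengths(strsplit(readLines(tsv_path), "\t", fixed = TRUE))
print(fields_per_line)
```

The header and the first two data lines have two fields each, but the last line (the tail of the split commit message) has only one, which is the kind of column mismatch described above.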

Of note: readr::write_tsv() does not use the same default 'quote' argument as readr::write_delim() (write_tsv() uses 'none', write_delim() uses 'needed'). A change to the default 'quote' occurred with readr v2.0.0.

The entire workflow for calling {gitsum} functions has been moved into {codeAsData}. A test has been added to that package to ensure that the .tsv that is written out can also be read back in. The only relevant function is codeAsData::run_gitsum_workflow(). The 'quote' value is set in that function to 'needed', so the commit-messages should be stored correctly now.
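A round-trip check along these lines can be sketched as follows. This is not the actual test in {codeAsData}, and it does not call codeAsData::run_gitsum_workflow(); it only assumes readr >= 2.0.0 is installed and uses a made-up two-column commit table.

```r
# Hypothetical commit table; the column names and values are illustrative only.
commits <- data.frame(
  hash = c("a1b2c3", "d4e5f6"),
  message = c("Fix typo", "Add feature\nLonger description"),
  stringsAsFactors = FALSE
)

tsv_path <- tempfile(fileext = ".tsv")

# quote = "needed" asks readr to quote the fields that require it, which
# should include fields with embedded newlines (write_tsv() defaults to "none").
readr::write_tsv(commits, tsv_path, quote = "needed")

# Read every column back as character so the comparison is type-stable.
roundtrip <- readr::read_tsv(
  tsv_path,
  col_types = readr::cols(.default = readr::col_character())
)

# The file should round-trip: same number of rows, same commit messages.
stopifnot(
  nrow(roundtrip) == nrow(commits),
  identical(roundtrip$message, commits$message)
)
```

Swapping quote = "needed" for quote = "none" in this sketch should make the check fail, which is a quick way to confirm that the test is sensitive to the original bug.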

Note that the snakemake pipeline runs to completion now.