haroldthimbleby / improving-science

All data, code, and other files for the paper "Improving science that uses code"

Harold Thimbleby

harold@thimbleby.net

README generated on 07 June 2023

(README.md is generated from README.md-src by running make readme)

Basics

A PDF of the typeset paper and appendix is available at http://www.harold.thimbleby.net/paper-seb.pdf. It's also available in this repository as all.pdf (or expanded-all.pdf if you ran make expand, as discussed below).

All data is defined and stored in human-readable JSON format in the file programs/data.js. The directory models holds more: the Git repos downloaded from the surveyed papers, together with the analysis of them.

A CSV file of the data is also included in this Git repository, in the directory generated. It may be easier to read than the JSON file.

Everything works using make.

If this all seems too complicated...

Run make expand, which will generate basic LaTeX files that depend on nothing other than their documentclass files. (make expand works by recursively importing all files, so that the expanded files need nothing else to be typeset.)
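As a rough illustration of what "recursively importing all files" means, here is a minimal JavaScript sketch that inlines \input{...} commands. It is not the code make expand actually runs: the function name expandInputs, the in-memory Map of files, and the file generated/summary.tex are assumptions made for the example, and it ignores commented-out \input lines and \include.

```javascript
// Sketch only: inline \input{...} recursively so the result needs no other files.
// `files` is a hypothetical in-memory Map of file name -> contents;
// a real tool would read the files from disk instead.
function expandInputs(name, files, active = new Set()) {
  if (!files.has(name)) throw new Error(`missing file: ${name}`);
  if (active.has(name)) throw new Error(`circular \\input: ${name}`);
  active.add(name);
  const expanded = files.get(name).replace(/\\input\{([^}]+)\}/g, (match, target) => {
    // LaTeX appends .tex when the name has no extension.
    const file = /\.[A-Za-z]+$/.test(target) ? target : `${target}.tex`;
    return expandInputs(file, files, active);
  });
  active.delete(name);
  return expanded;
}

const files = new Map([
  ["paper.tex", "\\documentclass{article}\n\\input{macros}\n\\begin{document}\n\\input{generated/summary.tex}\n\\end{document}\n"],
  ["macros.tex", "\\newcommand{\\code}[1]{\\texttt{#1}}"],
  ["generated/summary.tex", "There are \\code{N} papers."],
]);
console.log(expandInputs("paper.tex", files));
```

The active set tracks the files currently being expanded, so a file that inputs itself (directly or indirectly) raises an error instead of recursing forever.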

Directory structure

The top-level working directory contains README (this file), the makefile, the two LaTeX files (paper.tex and appendix.tex) and their usual companions (.aux, .pdf, and .bib files, the common macros in macros.tex, etc), two bibliographies (one for each of the two LaTeX files), and several directories:

Overview

First, here's a quick overview of how the system works behind the scenes inside make:

To generate all data or typeset the paper, you will need a Unix system with: make, awk, bibtex, git, latex (or pdflatex etc, depending on how you configure make), node, sed, and zip (plus the usual echo, egrep, grep, rm, sh, test, etc). Mathematica is also used, but the files it generates are already in the Git repository (and will be copied to generated/*), so you do not need to run Mathematica yourself, which matters because it is not open source. If you use make expand, you will also need diff-pdf.

The file programs/data.js includes both the JSON data and a JavaScript program that checks the data, analyses it, and generates most of the various data files — the CSV file, lots of LaTeX files used in the paper, and others.
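The check-then-generate pattern described above can be sketched as follows. This is only an illustration: the record fields (id, journal, year, hasCode) and the function names are hypothetical and are not the schema or code used in programs/data.js.

```javascript
// Sketch only: the field names (id, journal, year, hasCode) are hypothetical,
// not the schema actually used in programs/data.js.
const data = [
  { id: "paper-1", journal: "Journal A", year: 2020, hasCode: true },
  { id: "paper-2", journal: "Journal B, Series 2", year: 2021, hasCode: false },
];

// Check step: collect every problem rather than stopping at the first.
function checkData(records) {
  const errors = [];
  const seen = new Set();
  for (const r of records) {
    if (seen.has(r.id)) errors.push(`duplicate id: ${r.id}`);
    seen.add(r.id);
    if (!Number.isInteger(r.year)) errors.push(`${r.id}: year must be an integer`);
    if (typeof r.hasCode !== "boolean") errors.push(`${r.id}: hasCode must be true or false`);
  }
  return errors;
}

// Generate step: quote fields that contain commas, quotes, or newlines.
function csvField(value) {
  const s = String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCSV(records, columns) {
  const rows = records.map((r) => columns.map((c) => csvField(r[c])).join(","));
  return [columns.join(","), ...rows].join("\n") + "\n";
}

const errors = checkData(data);
if (errors.length > 0) throw new Error(errors.join("; "));
console.log(toCSV(data, ["id", "journal", "year", "hasCode"]));
```

Keeping the data and the checks in one file means the generated CSV and LaTeX files are never produced from data that failed validation.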

Running node programs/data.js will give you

These generated files are all placed in the directory generated.

data.js also lists all the generated files, where they are, and their purpose. (Note that there are some other generated files, for instance those created by running run in the models directory, which analyzes all the Git repos copied from papers in the survey.)

However, it's better to use make than do things piecemeal ...

Using make

Run make (with no parameters) to find out everything that you can do.

Here are all the available options:

The structure and use of the LaTeX files

The two main LaTeX files are paper.tex and appendix.tex; appendix.tex holds the supplementary material.

The supplementary material continues pagination, section, table, citation, and figure numbering across from the main file. In addition, the main and supplementary files cross-reference each other in the normal LaTeX ways — using \label{} and \ref{}. To make this work easily, each file explicitly reads in the other's .aux file (which is used by LaTeX for handling cross-references). Apart from that, nothing special has to be done for the files to communicate.

The bibliographies are a bit more complicated. paper.tex has a bibliography file, paper.bib, in the usual way; the supplement, however, has two bibliographies of its own. One works in the standard BibTeX way; the other covers only the surveyed papers and is generated directly by data.js as a .bbl file, rather than as a BibTeX file.
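To show what generating a .bbl file directly can look like, here is a minimal sketch that writes a thebibliography environment from JSON-style records. The record fields (key, authors, title, venue, year), the sample entry, and the function names are assumptions for illustration; the actual layout produced by data.js may differ.

```javascript
// Sketch only: the record fields (key, authors, title, venue, year) are
// hypothetical, not the fields used in programs/data.js.
function escapeLatex(text) {
  // Escape the characters most likely to appear in titles; not a complete list.
  return String(text).replace(/[&%#_$]/g, (ch) => `\\${ch}`);
}

function toBbl(papers) {
  const items = papers.map(
    (p) =>
      `\\bibitem{${p.key}}\n${escapeLatex(p.authors)}.\n` +
      `\\newblock ${escapeLatex(p.title)}.\n` +
      `\\newblock \\emph{${escapeLatex(p.venue)}}, ${p.year}.\n`
  );
  // The argument of thebibliography is a sample of the widest label,
  // so use as many digits as the entry count has.
  const widest = "9".repeat(String(papers.length).length);
  return `\\begin{thebibliography}{${widest}}\n\n${items.join("\n")}\n\\end{thebibliography}\n`;
}

console.log(
  toBbl([
    { key: "smith2020", authors: "A. Smith", title: "Code & data", venue: "Journal A", year: 2020 },
  ])
);
```

Writing the .bbl directly skips the BibTeX run for these entries, so their order and formatting are decided entirely by the generating program.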

Because the supplement reads the main file's .aux, it automatically gets the bibliography details (names and numbers) used by the main file; the supplement cites some of the same references, so their numbers stay consistent. A simple counter then ensures that the citations new to the supplement carry on the same numbering sequence.
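The numbering idea can be sketched outside LaTeX as follows. In the repository the counter lives in the LaTeX files; this JavaScript version, with its hypothetical function name and sample keys, only illustrates the logic, and it assumes plain numeric \bibcite{key}{n} lines in the .aux file.

```javascript
// Sketch only: illustrates continuing citation numbers from the main file.
// The function name and sample keys are hypothetical.
function continueNumbering(mainAux, supplementKeys) {
  const numbers = new Map();
  // \bibcite{key}{n} records the number the main file gave each citation.
  for (const m of mainAux.matchAll(/\\bibcite\{([^}]+)\}\{(\d+)\}/g)) {
    numbers.set(m[1], Number(m[2]));
  }
  let counter = Math.max(0, ...numbers.values());
  for (const key of supplementKeys) {
    // Citations shared with the main file keep their number; new ones get the next.
    if (!numbers.has(key)) numbers.set(key, ++counter);
  }
  return numbers;
}

const mainAux = "\\bibcite{knuth84}{1}\n\\bibcite{lamport94}{2}\n";
const numbering = continueNumbering(mainAux, ["lamport94", "survey01", "survey02"]);
console.log([...numbering]);
```

Here lamport94 keeps the number 2 it had in the main file, while survey01 and survey02 become 3 and 4.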

The Mathematica notebook programs/check-xrefs.nb checks all the citations and cross-references, since the errors and warnings from BibTeX and LaTeX do not cope well with cross-references and citations spread over multiple files.
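For readers without Mathematica, the kind of check involved can be approximated in JavaScript. This sketch is not a port of check-xrefs.nb: the function names and sample strings are assumptions, and it handles only plain \ref{...} and \cite{...} (no optional arguments, no \pageref or natbib variants).

```javascript
// Sketch only: a rough illustration of a cross-file reference check,
// not a port of the Mathematica notebook.
function keysIn(text, command) {
  // Matches e.g. \newlabel{sec:intro}{{1}{1}}, \bibcite{knuth84}{1}, \ref{x}, \cite{a,b}.
  const pattern = new RegExp(`\\\\${command}\\{([^}]+)\\}`, "g");
  const keys = new Set();
  for (const match of text.matchAll(pattern)) {
    for (const key of match[1].split(",")) keys.add(key.trim());
  }
  return keys;
}

function undefinedKeys(texSources, auxFiles) {
  const labels = new Set();
  const cited = new Set();
  // Pool the definitions from every .aux file, since each document
  // may refer to labels and citations defined in the other.
  for (const aux of auxFiles) {
    for (const k of keysIn(aux, "newlabel")) labels.add(k);
    for (const k of keysIn(aux, "bibcite")) cited.add(k);
  }
  const problems = [];
  for (const tex of texSources) {
    for (const k of keysIn(tex, "ref")) if (!labels.has(k)) problems.push(`undefined \\ref: ${k}`);
    for (const k of keysIn(tex, "cite")) if (!cited.has(k)) problems.push(`undefined \\cite: ${k}`);
  }
  return problems;
}

const paperAux = "\\newlabel{sec:intro}{{1}{1}}\n\\bibcite{knuth84}{1}\n";
const appendixAux = "\\newlabel{tab:survey}{{2}{14}}\n\\bibcite{survey01}{2}\n";
const paperTex = "See Table~\\ref{tab:survey} and \\cite{knuth84, survey01}; also \\ref{fig:missing}.";
console.log(undefinedKeys([paperTex], [paperAux, appendixAux]));
```

With these sample strings, the only problem reported is the reference to fig:missing, because tab:survey and survey01 are found in the other file's .aux.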

Further information

For help or further information, please email harold@thimbleby.net.

Web site harold.thimbleby.net