jmboehm / Douglass.jl

Stata-like toolkit for data wrangling on Julia DataFrames

Douglass.jl


Douglass.jl is a package for manipulating DataFrames in Julia using a syntax that closely resembles Stata's.

Note: Douglass.jl is in alpha, and may contain bugs. Please do try it out and report your experience. When using it in production, please check that the output is correct.

Installation

Douglass is not a registered package. To install it, type ] at the Julia REPL prompt to enter package mode, then run:

add https://github.com/jmboehm/Douglass.jl.git
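For scripts or other non-interactive setups, the same installation can be done through the standard Pkg API instead of the package-mode prompt. A minimal sketch, assuming network access to GitHub:

```julia
using Pkg

# Install the unregistered package directly from its Git URL
# (same URL as in the `add` command above).
Pkg.add(url = "https://github.com/jmboehm/Douglass.jl.git")
```

This installs into whichever Julia environment is currently active.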

Examples

using Douglass, RDatasets, DataFrames, DataFramesMeta
df = dataset("datasets", "iris")
# set the active DataFrame
Douglass.set_active_df(:df)

# create a variable `z` that is the sum of `SepalLength` and `SepalWidth`, for each row
d"gen :z = :SepalLength + :SepalWidth"
# replace `z` by the row index for the first 10 observations
d"replace :z = _n if _n <= 10"
# drop a variable
d"drop :z"
# construct the within-group sum for a subset of the observations
d"bysort :Species : egen :z = sum(:SepalLength) if :SepalWidth .> 3.0"
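For readers unfamiliar with Stata, it may help to see roughly what these four commands correspond to in plain DataFrames.jl. The sketch below is not how Douglass implements them; it uses a small made-up table instead of iris so that it runs without RDatasets. It also assumes Stata's convention that rows failing the `if` condition get a missing value in the `egen` result. Whether Douglass matches that convention in every detail is not confirmed here.

```julia
using DataFrames

# Small stand-in table (hypothetical values; the example above uses iris)
df = DataFrame(
    Species     = ["setosa", "setosa", "virginica", "virginica"],
    SepalLength = [5.1, 4.9, 6.3, 5.8],
    SepalWidth  = [3.5, 3.0, 3.3, 2.7],
)

# gen :z = :SepalLength + :SepalWidth
df.z = df.SepalLength .+ df.SepalWidth

# replace :z = _n if _n <= 2   (2 rather than 10, since this table has 4 rows)
n = 1:nrow(df)
df.z[n .<= 2] .= n[n .<= 2]

# drop :z
select!(df, Not(:z))

# bysort :Species : egen :z = sum(:SepalLength) if :SepalWidth .> 3.0
# Within each group, sum SepalLength over the rows that satisfy the condition;
# rows that fail the condition are set to missing (assumed Stata-style semantics).
transform!(
    groupby(df, :Species),
    [:SepalLength, :SepalWidth] =>
        ((len, wid) -> ifelse.(wid .> 3.0, sum(len[wid .> 3.0]), missing)) => :z,
)
```

The comparison shows the main appeal of the Stata-like syntax: the `if` qualifier and the `bysort` prefix each replace an explicit mask or a `groupby`/`transform!` call.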

Commands implemented

See the commands documentation page for details on the syntax of these commands.

REPL mode

Press the backtick (`) to switch between the normal Julia REPL and the Douglass REPL mode:

[Screenshot: Douglass REPL mode]

Multiline and operations on a particular DataFrame

Douglass supports multiline input on the active DataFrame:

d"""
gen :x = 5
gen :y = 6
"""

The @douglass macro allows subsequent operations to be performed on one particular DataFrame:

using RDatasets
iris = dataset("datasets", "iris")
Douglass.@douglass iris """
gen :x = :SepalWidth + :PetalWidth
gen :y = 42
"""

Benchmarks

[Figure: benchmark results]

These benchmarks were run on a synthetic dataset with 1 million observations, on my MacBook Pro (Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz, Julia 1.9.0, Stata/MP 17.0).

Notes

Bug reports

Please file bug reports as issues.

Roadmap / To-dos

If you find the package useful or the idea promising, please consider giving it a star (at the top of the page).

Related Packages

Misc

Douglass.jl is named in honour of the economic historian Douglass North.