sfirke / janitor

simple tools for data cleaning in R
http://sfirke.github.io/janitor/

Lapply not finding second column in two-way tabyl #431

Closed StenDieden closed 2 years ago

StenDieden commented 3 years ago

Greetings! Janitor is an excellent package, especially for a newbie contingency table freak coming out of Stata. I've come across what appears to be a peculiarity. tabyl(df, varname1) works perfectly in my lapply code, but tabyl(df, varname1, varname2) in otherwise identical code crashes.

I'm pulling out, one at a time, 10 identified columns from a (larger) df, intending to cross-tabulate each with an 11th column from the same df. I intend to bind these cross-tabulations into a grander table (and later transpose in Excel). Hence, tabyl inside an lapply FUN is a convenient way to build the smaller dfs to bind.

Now, as a useful digression, let's note that the command tabyl(df, 7) works great for the 7th column of my df, but tabyl(df, 7, 8) returns an error message, since it looks for an eighth column in the three-column output of tabyl(df, 7). To circumvent the position issue, I run the lapply over the actual column names, which all start with "B4_0". It works like a dream for one-way tables with:

    z <- list(colnames(df(grep("B4_", colnames(df)))))
    lapply(seq_along(z), function(i){
      df_i <- tabyl(dfl, i)
    } ) -> list_t4
    names(list_t4) <- str_glue("df4_0{z}")
    list2env(list_t4, .GlobalEnv)

However, if I change the tabyl command into a two-way command with the column called B3 (with certainty the sixth column in the same df), with:

    df_i <- tabyl(df, i, B3)

I get "* Column i is not found."

Using a slightly different syntax

    df_i <- tabyl(dfcel,i,as.name(B3))

I get "Error: object 'B3' not found."

Chances are high that I'm not subsetting the second variable/column correctly in the tabyl command, but the correct way is not super obvious.

I'd be very thankful for some guidance. Thanks very much for your time thus far.

All the best, Sten

PS I had a great time visiting the ISR and UMich for a few months during the winter of 1997. As a Swede deprived of ice hockey in South Africa, it was a pleasure to watch Bubba Berenzweig dominate college hockey back then, but unfortunately his NHL career never took off.

sfirke commented 2 years ago

Hi Sten! Thanks for this nice note and sorry for the lack of response. I enjoy the aspect of open source software where you interact with people from around the world. Cool that you got to watch UMich hockey back in '97. I was a big college hockey fan as a Cornell undergrad and will share the fun fact that the currently-enthusiastic Michigan hockey cheering section got its inspiration and cheers from Cornell in a fateful 1991 meeting where the Wolverines got out-cheered: https://www.michigandaily.com/uncategorized/student-section/

Last year I got to watch Cornell whip Michigan State in East Lansing, my last college hockey game before COVID.

To your actual question, this might help: https://stackoverflow.com/questions/65835411/tabulate-one-variable-in-a-data-frame-against-all-others-using-janitortabyl

I never use lapply, having converted entirely to purrr::map, but here someone found a clever way to use lapply with janitor:::tabyl_2way: https://stackoverflow.com/questions/54377189/tidyverse-cross-tables-of-one-variable-with-all-other-variables-in-data-frame.
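
For anyone landing here later, this is a minimal sketch of one way to loop a two-way tabyl over column names with plain lapply. It assumes that tabyl() captures its column arguments unevaluated, so a column name held in a string has to be turned into a symbol with rlang::sym() and unquoted with !!. The data frame and its values are made up for illustration; only the column names B4_... and B3 come from the question.

```r
library(janitor)

# Hypothetical stand-in for the data frame in the question:
# two "B4_" columns to tabulate and one B3 column to cross them with.
df <- data.frame(
  B4_01 = c("a", "b", "a", "b"),
  B4_02 = c("x", "x", "y", "y"),
  B3    = c("m", "f", "m", "m")
)

# Iterate over column names (character strings), not positions.
b4_cols <- grep("^B4_", colnames(df), value = TRUE)

# rlang::sym() turns each string into a symbol and !! unquotes it,
# so tabyl() sees a bare column name rather than the loop variable.
list_t4 <- lapply(b4_cols, function(nm) {
  tabyl(df, !!rlang::sym(nm), B3)
})
names(list_t4) <- b4_cols
```

Each element of list_t4 is then a two-way tabyl of one B4_ column against B3, named after the column it tabulates. The same anonymous function should work unchanged inside purrr::map.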

If anyone has future questions like this, feel free to use the Discussions feature of this repo.