rbind / njtierney.com

Nick Tierney's website
http://www.njtierney.com/

How to Get Good with R? | Credibly Curious #71

Open utterances-bot opened 11 months ago

utterances-bot commented 11 months ago

How to Get Good with R? | Credibly Curious

https://www.njtierney.com/post/2023/11/10/how-to-get-good-with-r/

kaueaguillera commented 11 months ago

Thank you for this :) I am a beginner in R, and it has helped me a lot!

Regards from 🇧🇷

njtierney commented 11 months ago

Glad to hear it! Let me know if there's anything you'd like to hear more about or what could be clearer :)

mpaulacaldas commented 11 months ago

I really liked this post! I think it's so important that you mention debugging. It is definitely one of those skills that is not often picked up on the go by self-taught R users (though more learning materials mention it now).

I am really looking forward to part two! Beyond typing speed, I've found that not being able to touch type can be a real barrier (e.g. people miss auto-complete prompts because they're looking at their keyboard, so they are more likely to have typos, etc.). I was really surprised when I moved to France to realise just how many people were not taught how to touch type at school, even in younger generations...

njtierney commented 11 months ago

Thanks for the kind words! Great point about typing speed - I think it must first come from accuracy, so touch typing is necessary, in order to develop useful speed.

I'm currently chunking the second part of the blog post into smaller pieces, as it turned into a pretty big post and I was worried I'd never finish it all. So there should be one about typing and keyboard shortcuts soon enough :)

dhduncan commented 11 months ago

Thanks for sharing, Nick.

I'm a long-time user who has rarely had the sense of having fledged beyond the resources of Stack Overflow to get me around roadblocks and sticky puddles. Most problems posted on that and other platforms will typically attract solutions in base, tidyverse, and maybe data.table, and I for one have never really committed to one style. I switch between base and tidyverse style solutions, and (getting to my question now) do you think that, as an extension of curating a consistent style to try and help reflex understanding of your own work, one of the keys to success might be "joining a gang"?

njtierney commented 11 months ago

Thanks, @dhduncan !

There are a lot of really good solutions on Stack Overflow and co. For me, sometimes the best solution is in base, and sometimes it's in data.table or tidyverse.

do you think that as an extension of curating a consistent style to try and help reflex understanding of your own work, that one of the keys to success might be "joining a gang"?

Yes, I do! But there are caveats. data.table is hands down the fastest way to do a lot of data munging tasks and more in R. It's faster than Python, it's just, like, really good. Personally I prefer to use the tidyverse, as I find that for what I'm doing, I don't need to worry about the memory/time problems that data.table would solve. I personally find the data.table syntax too brief, and as a result harder to understand.

There's a balance with joining a team. You don't have to use only tidyverse or base, but using data.table in the middle of some tidyverse code might cause some friction. So I would say, pick tidyverse or data.table. Although you can do both with dtplyr - https://dtplyr.tidyverse.org/ - which lets you write dplyr code while using data.table as a backend for speed.
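To make the dtplyr idea concrete, here is a minimal sketch, assuming the dplyr, dtplyr, and data.table packages are installed. The data frame `df` and its columns `g` and `x` are made up for illustration; they are not from the blog post.

```r
library(dplyr)
library(dtplyr)

# Hypothetical example data: two groups of five rows each
df <- data.frame(
  g = rep(c("a", "b"), each = 5),
  x = 1:10
)

# Wrap the data in a lazy data.table; nothing is computed yet
lazy <- lazy_dt(df)

# Write ordinary dplyr code against the lazy object
query <- lazy %>%
  group_by(g) %>%
  summarise(mean_x = mean(x))

# Print the data.table code that dtplyr generated from the dplyr verbs
show_query(query)

# Converting to a tibble is what actually runs the data.table code
result <- as_tibble(query)
```

The translation step adds a little overhead and not every dplyr feature is translated, so this is a way to try data.table speed from dplyr-style code rather than a guaranteed drop-in replacement.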

I wouldn't say that base solutions are mutually exclusive to either of these. But I think as you get more experience with these packages you will see places where staying in one group keeps the document in a consistent style.

Anyway, that's a long-winded way of saying yes, I think it's useful to stick with a consistent set of packages. In general I would avoid mixing data.table code with dplyr/tidyverse code. They have different semantics, as shown below: with tidyverse you assign the result of mutate() to a variable to keep the new columns, base lets you add columns to the existing data frame with $, and data.table's := adds the columns in place, without needing to create a new data frame. But I think base code can be mixed in to some extent.

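Base

# assumed starting point: any data frame with 10 rows
data <- data.frame(id = 1:10)
data$x <- 1:10
data$y <- runif(10)

Tidyverse

library(dplyr)
# mutate() returns a new data frame, so assign the result
data_x_vars <- data %>%
  mutate(
    x = 1:10,
    y = runif(10)
  )

data.table

library(data.table)
dt <- as.data.table(data)
# := adds or updates the columns in place, no assignment needed
dt[, x := 1:10]
dt[, y := runif(10)]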
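The "writes the data without needing to create a new data frame" point can be seen more directly in a small sketch like the one below, assuming the data.table package is installed. The object names (`df`, `df2`, `dt`, `dt2`, `dt3`) are made up for illustration. It contrasts base R's copy-on-modify behaviour with data.table's modify-by-reference behaviour, which is one reason mixing the two styles in one script can surprise you.

```r
library(data.table)

# Base R: modifying an assigned copy leaves the original untouched
df <- data.frame(id = 1:3)
df2 <- df
df2$x <- 10
base_changed_original <- "x" %in% names(df)

# data.table: plain assignment does not copy, and := modifies in place,
# so the new column also shows up in the original object
dt <- data.table(id = 1:3)
dt2 <- dt
dt2[, x := 10]
dt_changed_original <- "x" %in% names(dt)

# copy() makes an independent data.table when you do want a separate object
dt3 <- copy(dt)
dt3[, y := 20]
copy_changed_original <- "y" %in% names(dt)
```

Here `base_changed_original` and `copy_changed_original` should come out `FALSE`, while `dt_changed_original` should come out `TRUE`.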