hadley / r4ds

R for data science: a book
http://r4ds.hadley.nz

I guess there is a problem with rolling joins bit #1642

Closed: kayagorur closed this issue 4 months ago

kayagorur commented 5 months ago

Hello, I am reading the second edition of R4DS and learning a great deal from it. I am grateful that you made it available online, since I cannot afford to buy it, at least for now.

In Section 19.5.3, Rolling joins, the dots in Figure 19.16 do not match the condition "closest ( key <= key )". They are correct for the opposite condition, "closest ( key >= key )".

I am new to this and of course I might be wrong, but I would appreciate it if you could check it once more. Your book is such a valuable resource for people outside this field who are trying to learn data science by themselves, so I wanted to share an issue that confused me.

P.S. I have created a GitHub account just to write this message to you :)

With respect

[Attached image: Rolling_join_issue]

kayagorur commented 5 months ago

One more thing you may want to consider while you are at it. The same goes for the birthday parties example:

And for each employee we want to find the first party date that comes after (or on) their birthday. We can express that with a rolling join:

I guess that to find the party that comes after (or on) their birthday, the condition needs to be "closest( birthday <= party)".
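
To make the difference between the two conditions concrete, here is a minimal sketch, assuming dplyr 1.1.0 or later (where `join_by()` and `closest()` are available). The `employees` and `parties` tables are made-up data for illustration, not the tables used in the book.

```r
library(dplyr)  # join_by() and closest() require dplyr >= 1.1.0

# Hypothetical data, invented for this illustration
employees <- tibble(
  name = c("A", "B", "C"),
  birthday = as.Date(c("2022-01-05", "2022-02-20", "2022-03-15"))
)
parties <- tibble(
  party = as.Date(c("2022-01-10", "2022-03-01", "2022-03-15"))
)

# closest(birthday >= party): the LAST party on or before each birthday.
# A has no earlier party (NA), B gets 2022-01-10, C gets 2022-03-15.
before <- employees |>
  left_join(parties, join_by(closest(birthday >= party)))

# closest(birthday <= party): the FIRST party on or after each birthday.
# A gets 2022-01-10, B gets 2022-03-01, C gets 2022-03-15.
after <- employees |>
  left_join(parties, join_by(closest(birthday <= party)))

print(before)
print(after)
```

With this data, only the `<=` version returns a party that falls on or after every birthday, which is what the quoted sentence describes.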

I would really like to hear your response, since understanding this is very important for the analysis I am trying to do. I am trying to pick out the lab values that are closest to the pre-determined control visit dates of my patients, who of course never show up on schedule and regularly miss their appointments by a week or so.
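
For context, `closest()` only looks in one direction (on or before, or on or after), so picking the nearest lab date on either side of a visit needs a different approach. The following is only a rough sketch of one possibility, assuming dplyr 1.1.0 or later; the `visits` and `labs` tables and their column names are hypothetical, not my real data.

```r
library(dplyr)  # relationship = "many-to-many" requires dplyr >= 1.1.0

# Hypothetical scheduled visits and actual lab measurements
visits <- tibble(
  patient = c(1, 1),
  visit = as.Date(c("2023-01-01", "2023-02-01"))
)
labs <- tibble(
  patient = c(1, 1, 1),
  lab_date = as.Date(c("2022-12-28", "2023-01-09", "2023-02-03")),
  value = c(5.1, 4.8, 5.4)
)

nearest <- visits |>
  # Pair every visit with every lab result of the same patient
  inner_join(labs, by = "patient", relationship = "many-to-many") |>
  # Absolute distance in days between the lab date and the scheduled visit
  mutate(gap = abs(as.numeric(lab_date - visit))) |>
  group_by(patient, visit) |>
  # Keep the single lab result with the smallest gap for each visit
  slice_min(gap, n = 1, with_ties = FALSE) |>
  ungroup()

print(nearest)
```

This pairs all visits with all lab results per patient before filtering, so it may be slow on large tables. A rolling join is cheaper when only one direction matters.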

Thank you in advance for your response

florisvdh commented 4 months ago

> And one more thing that you may want to consider while you are at it. The same goes for the birthday parties example:
>
> And for each employee we want to find the first party date that comes after (or on) their birthday. We can express that with a rolling join:
>
> I guess to find the party that comes after (or on) their birthday the equation needs to be "closest( birthday <= party)"
>
> I really like to hear your response since understanding this is very important for the analysis I am trying to make. I am trying to filter the lab values that are closest to pre-determined control visit dates of my patients who of course never show up on schedule and miss their appointments regularly by a week or so.
>
> Thank you in advance for your response

(Posted by @kayagorur above)

This specific post is a duplicate of #1610.

florisvdh commented 4 months ago

@kayagorur you're right about the error in Figure 19.16. That part of this issue is already a duplicate of #1470, so I suggest closing this one.

kayagorur commented 4 months ago

Thank you for the response. This is very helpful. It solved my problem and corrected my understanding of the concept. I am closing the issue then. :)