xuyiqing / tjbal

Trajectory Balancing
MIT License
11 stars 9 forks source link

tjbal does not correctly handle categorical variables #13

Open akravetz opened 2 weeks ago

akravetz commented 2 weeks ago

Categorical variables must be scaled by sqrt(0.5) per the docs in kbal. This fix would be as simple as passing in mixed_data=TRUE and cat_columns=... for categorical variables. This could easily be a pass through parameter. I'm happy to put up a PR if you'll review

chadhazlett commented 2 weeks ago

Thanks! We are about to update kbal and put it on CRAN. But at that point, yes, you are exactly right. I'll leave Yiqing to handle the tjbal side and your kind offer. Best, Chad

On Mon, Jul 8, 2024 at 9:18 PM Alex Kravetz @.***> wrote:

Categorical variables must be scaled by sqrt(0.5) per the docs in kbal. This fix would be as simple as passing in mixed_data=TRUE and cat_columns=... for categorical variables. This could easily be a pass through parameter. I'm happy to put up a PR if you'll review

— Reply to this email directly, view it on GitHub https://github.com/xuyiqing/tjbal/issues/13, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKUFSAI7YOVNLAPGOWUQGDZLNQBTAVCNFSM6AAAAABKSA5VQSVHI2DSMVQWIX3LMV43ASLTON2WKOZSGM4TOMBXGUZTENA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

akravetz commented 2 weeks ago

awesome, thanks Chad! I'm also working on some time improvements to kbal (thank you for all the hard work here btw!). Two main threads here:

  1. Basically the eb() function stopping criteria is very generous and it seems like there could be some improvements here (ex: stop if dist has not improved in N iterations, or stop if the last N iterations have not converged)
  2. using the gpuR library as a backend to improve the speed of the eb update loop.

would love to pick your brain live on either of these if you've got a couple minutes

chadhazlett commented 2 weeks ago

That's great, thank you! I'll be curious how you came to take an interest in this. I'm cc'ing Borna, who has been working on the kbal update. I'd be happy to find a time to talk, perhaps next week monday or tuesday? One note while I'm thinking of it: as you'll have noticed, the ebalance function is a modification of the original ebal function from Jens Hainmueller, with a considerably tighter convergence criterion (as I recall). But if speed improvements are likely from the ideas you suggest, I think that's worth trying. Thanks!

On Mon, Jul 8, 2024 at 10:27 PM Alex Kravetz @.***> wrote:

awesome, thanks Chad! I'm also working on some time improvements to kbal (thank you for all the hard work here btw!). Two main threads here:

  1. Basically the eb() function stopping criteria is very generous and it seems like there could be some improvements here (ex: stop if dist has not improved in N iterations, or stop if the last N iterations have not converged)
  2. using the gpuR library as a backend to improve the speed of the eb update loop.

would love to pick your brain live on either of these if you've got a couple minutes

— Reply to this email directly, view it on GitHub https://github.com/xuyiqing/tjbal/issues/13#issuecomment-2216583240, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKUFSAS63KODDJJJN66P2TZLNYDPAVCNFSM6AAAAABKSA5VQSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJWGU4DGMRUGA . You are receiving this because you commented.Message ID: @.***>

akravetz commented 2 weeks ago

happy to find some time on monday. Do you want to shoot me an email and we can go from there? alex@first hand cares period com (no spaces, etc, gotta protect myself from spam bots 😃 )

chadhazlett commented 2 weeks ago

done, tell me if you don't see my email soon. best, chad

On Mon, Jul 8, 2024 at 10:48 PM Alex Kravetz @.***> wrote:

happy to find some time on monday. Do you want to shoot me an email and we can go from there? @.*** hand cares period com (no spaces, etc, gotta protect myself from spam bots 😃 )

— Reply to this email directly, view it on GitHub https://github.com/xuyiqing/tjbal/issues/13#issuecomment-2216629106, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKUFSBXWR3NXJWEH5Z2I43ZLN2S5AVCNFSM6AAAAABKSA5VQSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJWGYZDSMJQGY . You are receiving this because you commented.Message ID: @.***>

xuyiqing commented 2 weeks ago

I'll get to it later this month (after a major conference of our field). Thanks!

On Mon, Jul 8, 2024 at 11:07 PM Chad Hazlett @.***> wrote:

done, tell me if you don't see my email soon. best, chad

On Mon, Jul 8, 2024 at 10:48 PM Alex Kravetz @.***> wrote:

happy to find some time on monday. Do you want to shoot me an email and we can go from there? @.*** hand cares period com (no spaces, etc, gotta protect myself from spam bots 😃 )

— Reply to this email directly, view it on GitHub https://github.com/xuyiqing/tjbal/issues/13#issuecomment-2216629106, or unsubscribe < https://github.com/notifications/unsubscribe-auth/ABKUFSBXWR3NXJWEH5Z2I43ZLN2S5AVCNFSM6AAAAABKSA5VQSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJWGYZDSMJQGY>

. You are receiving this because you commented.Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/xuyiqing/tjbal/issues/13#issuecomment-2216650501, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2PKGB27SG7GGU556WAJKTZLN4YPAVCNFSM6AAAAABKSA5VQSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJWGY2TANJQGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- Yiqing Xu

Assistant Professor Department of Political Science Stanford University https://yiqingxu.org/