vdblab / FLORAL

Fit LOg-RAtio Lasso regression for compositional covariates
https://vdblab.github.io/FLORAL/
GNU General Public License v3.0

Longitudinal: runs step 2 CV twice (plus bonus runs?) #23

Open funnell opened 2 months ago

funnell commented 2 months ago

Hello,

I'm running FLORAL (the version on CRAN) with longitudinal=TRUE and family="binomial" like so:

fit_0_6 <- FLORAL(
  x = as.matrix(X_0_6), y = y_0_6,
  longitudinal = TRUE,
  id = as.integer(factor(metadata_0_6$mouse_id)),
  tobs = metadata_0_6$day,
  family = "binomial",
  ncv = 5,
  progress = TRUE
)

and getting this output:

Using elastic net with a=1.Algorithm running for full dataset:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Algorithm running for cv dataset 1 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Algorithm running for cv dataset 2 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Algorithm running for cv dataset 3 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Algorithm running for cv dataset 4 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Algorithm running for cv dataset 5 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Step2: Algorithm running for cv dataset 1 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Step2: Algorithm running for cv dataset 2 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Step2: Algorithm running for cv dataset 3 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Step2: Algorithm running for cv dataset 4 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Step2: Algorithm running for cv dataset 5 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Step2: Algorithm running for cv dataset 1 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Step2: Algorithm running for cv dataset 2 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Step2: Algorithm running for cv dataset 3 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Step2: Algorithm running for cv dataset 4 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Step2: Algorithm running for cv dataset 5 out of 5:
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|

I'd like to point out two things:

  1. Step 2 CV seems to be running twice (I didn't just paste it twice 😄)
  2. Between step 1 and step 2 (and again before the second step 2) there is an extra, unlabeled progress bar. When I watch it run, it takes about as long as each of the other progress bars.

I'm not sure if this is expected behavior, but if so, maybe the progress bar labels could be updated.

tengfei-emory commented 2 months ago

That is intended behavior. I need to modify the step 2 algorithm to mute some of the runs.

One main difference in the GEE models is that there is no existing popular, fast algorithm like glmnet available for the step 2 runs. The current working solution is to use FLORAL's own algorithm without imposing the zero-sum constraint, which is why the additional progress bars appear.
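
For readers unfamiliar with the zero-sum constraint mentioned above, here is a minimal base-R sketch of what it does in a log-ratio model. This is not FLORAL's code: the matrix `counts`, the coefficient vectors, and the helper `lp` are all hypothetical and exist only for illustration. When the coefficients on log-transformed covariates sum to zero, the linear predictor is the same whether raw counts or relative abundances are used; without the constraint, it shifts with each sample's total.

```r
# Illustration only (hypothetical data and names, not FLORAL internals).
# Three samples, four taxa, with different sequencing depths per sample.
counts <- matrix(c(10, 20, 30, 40,
                    5,  5, 50, 40,
                    2, 18, 60, 120),
                 nrow = 3, byrow = TRUE)
comp <- counts / rowSums(counts)       # relative abundances, rows sum to 1

beta_zero_sum <- c(1, -1, 0.5, -0.5)   # coefficients sum to 0
beta_free     <- c(1, -1, 0.5,  0.5)   # coefficients sum to 1

# Linear predictor of a model on log-transformed covariates
lp <- function(x, beta) drop(log(x) %*% beta)

# Constrained: identical for raw counts and relative abundances
same_constrained <- isTRUE(all.equal(lp(counts, beta_zero_sum),
                                     lp(comp,   beta_zero_sum)))

# Unconstrained: differs by sum(beta) * log(rowSums(counts))
same_free <- isTRUE(all.equal(lp(counts, beta_free),
                              lp(comp,   beta_free)))

print(c(constrained = same_constrained, unconstrained = same_free))
```

The first comparison holds and the second does not, because normalizing each row subtracts `log(rowSums(counts))` from every log-covariate, and that term cancels only when the coefficients sum to zero.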

tengfei-emory commented 3 weeks ago

The latest update from the dev-GEE branch should address this issue.