Hey @mattpoggi,
I tried to find the initial learning rate for the "snap" training mode, but I could not. A couple of questions related to Snap:
1) The original Snapshot Ensembles authors use initial learning rates of 0.2 and 0.1. Did you also use 0.2 as the initial learning rate?
2) You use C = 20, meaning one snapshot every epoch. Do you then keep the last N snapshots or N random ones (N being 8 according to your paper)? Also, T is the total number of training iterations, so for the Eigen Zhou split it would be (39810 // batch_size) * num_epochs?
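For reference, here is how I understand the cyclic cosine schedule from the Snapshot Ensembles paper, as a minimal Python sketch. The values batch_size = 12 and num_epochs = 20 are my own assumptions (not taken from your repo), and `snapshot_lr` is just a helper name I made up for illustration:

```python
import math
import random

def snapshot_lr(t, total_iters, num_cycles, lr0):
    """Cyclic cosine annealing as described in Snapshot Ensembles.

    t is the 1-indexed iteration. The rate restarts at lr0 every
    ceil(total_iters / num_cycles) iterations and decays towards 0.
    """
    cycle_len = math.ceil(total_iters / num_cycles)
    return lr0 / 2 * (math.cos(math.pi * ((t - 1) % cycle_len) / cycle_len) + 1)

# Assumed values: batch_size and num_epochs are hypothetical here.
num_train = 39810   # Eigen Zhou training split size
batch_size = 12
num_epochs = 20
C = 20              # number of cycles, i.e. number of snapshots
N = 8               # snapshots kept in the ensemble
lr0 = 0.2           # initial learning rate being asked about

iters_per_epoch = num_train // batch_size
T = iters_per_epoch * num_epochs
cycle_len = math.ceil(T / C)

# One snapshot is taken at the end of each cycle (clamped to T).
snapshot_iters = [min(k * cycle_len, T) for k in range(1, C + 1)]
last_n = snapshot_iters[-N:]                 # option A: last N snapshots
random_n = random.sample(snapshot_iters, N)  # option B: N random snapshots

print(iters_per_epoch, T, cycle_len)
```

With these assumed numbers, cycle_len equals iters_per_epoch, so C = 20 over 20 epochs gives exactly one snapshot per epoch, and "last N" would be the snapshots from epochs 13 to 20.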
Thanks in advance 👍