hanglearning / VBFL


Two questions about running the VBFL code #12

Open Hongchenglong opened 1 year ago

Hongchenglong commented 1 year ago

I have Colab Pro and Colab Pro+ accounts but they don't run for very long. They always disconnect after about 20 communications.

I have two questions about running the VBFL code. Could you answer them?

Firstly, I would like to know how I can keep Colab running for a long time?

Secondly, could you tell me if you set both high RAM and a V100 GPU?

hanglearning commented 1 year ago

Hi @Hongchenglong,

Sorry for the inconvenience.

First, as far as I can remember, I used Colab Pro, since Pro+ didn't exist at the time. Back then my Colab Pro could run at least 100 rounds with the provided sample running arguments on a V100. When you ran VBFL, did you keep Colab running in an active browser tab? I remember keeping Colab in an open, active tab and reaching 100 rounds without an issue. I should have implemented a function to resume execution after an abort.
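Such a resume function could look roughly like this. This is a minimal stdlib-only sketch, not the actual VBFL code: `run_one_round` and the checkpoint path are hypothetical stand-ins, and the real state would include the models and the blockchain, not just a round counter.

```python
import os
import pickle
import tempfile

# Hypothetical checkpoint location (the real code would pick a log/ckpt dir).
CKPT = os.path.join(tempfile.gettempdir(), "vbfl_ckpt_demo.pkl")

def run_one_round(state):
    """Stand-in for one VBFL communication round (hypothetical)."""
    state["round"] += 1
    return state

def run(total_rounds):
    # Resume from the last checkpoint if a previous session was disconnected.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"round": 0}
    while state["round"] < total_rounds:
        state = run_one_round(state)
        # Persist after every round so an abort loses at most one round.
        with open(CKPT, "wb") as f:
            pickle.dump(state, f)
    return state

# Simulate a disconnect: the first session stops at round 3,
# the "reconnected" session resumes from the checkpoint.
if os.path.exists(CKPT):
    os.remove(CKPT)
run(3)
final = run(5)
assert final["round"] == 5
os.remove(CKPT)
```

With this pattern, a Colab disconnect costs at most one round of work instead of the whole run.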

Second, I used a V100 GPU, but without high RAM. When a GPU is used, I believe the current code stores the neural nets in GPU RAM, so even if you select high RAM it will not be utilized. The V100's RAM is enough to handle 100 rounds; however, it may be used up after roughly 110 rounds, aborting the execution. Please see the Known Issue section in README.md. One way to overcome this is to select a CPU runtime on Colab Pro+ and, at the same time, select high RAM.
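The underlying idea for avoiding that RAM exhaustion is to drop each round's nets once the round is finalized. A stdlib-only sketch of the pattern, with `DummyNet` and `run_rounds` as hypothetical stand-ins (in the real PyTorch code you would additionally move the nets off the GPU with `net.cpu()` and call `torch.cuda.empty_cache()`):

```python
import gc
import weakref

class DummyNet:
    """Stand-in for a per-device neural net (hypothetical)."""
    def __init__(self, round_num):
        self.round_num = round_num
        self.weights = [0.0] * 1000  # placeholder for real parameters

def run_rounds(num_rounds):
    freed = []
    for r in range(num_rounds):
        net = DummyNet(r)
        ref = weakref.ref(net)  # lets us observe whether the net was freed
        # ... local training / mining for this round would happen here ...
        # Drop the reference once the round's block is finalized so the
        # memory (GPU RAM in the real code) can be reclaimed.
        del net
        gc.collect()
        freed.append(ref() is None)
    return freed

# Every round's net is released before the next round starts,
# so memory use stays flat instead of growing with the round count.
assert all(run_rounds(5))
```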

Please let me know if I have answered your questions.

Hongchenglong commented 1 year ago

Thank you for your very detailed reply.

First, I now understand why the program always aborts early: I set the flag `-ha 14,5,1`, which leads to the message `no valid block has been generated this round`, and that in turn triggers an `io.UnsupportedOperation: not readable` error that aborts the program.
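For anyone hitting the same traceback: `io.UnsupportedOperation: not readable` is what Python raises when `.read()` is called on a file handle that was opened write-only. A small self-contained reproduction (the file path here is hypothetical, not a VBFL path):

```python
import io
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "vbfl_block_demo.txt")

# A handle opened with mode "w" is write-only: reading from it raises.
with open(path, "w") as f:
    f.write("placeholder")
    try:
        f.read()
        raised = False
    except io.UnsupportedOperation:
        raised = True
assert raised

# Opening with "w+" (or reopening in "r") makes the handle readable.
with open(path, "w+") as f:
    f.write("block data")
    f.seek(0)
    assert f.read() == "block data"

os.remove(path)
```

So the fix is either to open the log/block file in a readable mode or to skip the read entirely in the round where no valid block was generated.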

Second, I tested both runtimes: a round that takes 10 minutes on the CPU takes only 3 minutes on a V100 GPU, while high RAM costs only a few extra compute units. Therefore, I will run the program with a V100 GPU and high RAM.

hanglearning commented 1 year ago

Thank you for your feedback!

First, got it. I should've tested the situation where there's only one miner. After applying the patch, the bug is fixed, right?

Second, yes, using the CPU is much slower than the GPU. I am a bit surprised Colab charges more for high RAM than for a GPU. I thought renting a GPU was more expensive 😄

Hongchenglong commented 1 year ago

First, yes, the bug has been fixed.

Second, sorry for my poor explanation. I meant that high RAM is cheaper and the GPU is more expensive.

hanglearning commented 1 year ago

Oh shoot, my bad, I misunderstood your message. I thought the longer runtime would charge more. Turns out the CPU-with-high-RAM runtime only needs a few compute units, so it should be cheaper. 😄

Safe to close this issue?