skywolf829 / GSTK

Gaussian Splatting toolkit application. One stop shop for preprocessing your dataset, training your model with human-in-the-loop training, and editing saved GSplat PLY files.
MIT License

Training speed slowed from threads #20

Closed skywolf829 closed 7 months ago

skywolf829 commented 7 months ago

Observed

Training speed in train.py for the mic test dataset peaks at around 220 updates per second. Meanwhile, training through the frontend app only reaches around 120 FPS.

I've noticed that when some threads are stopped, performance increases.

Desired

The communication thread should not slow down backend training. If the renderer is completely disabled, backend training should run at full speed.

skywolf829 commented 7 months ago

Multiprocessing doesn't seem like a solution here because too much data is shared. We may need to use a different communication backend if multiprocessing.connection turns out to be the main source of the slowdown.

skywolf829 commented 7 months ago

Not exactly closed, but I reworked the backend to use fewer threads so we have more direct control over the main thread and computation. Threads are now only created for the initial socket setup (since it's blocking and prevents Ctrl+C from working) and for dataset loading (since it's I/O bound).
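As a rough illustration of the socket-setup thread, here is a minimal sketch that assumes the backend listens with multiprocessing.connection.Listener. The helper name accept_in_background and the holder dict are hypothetical and not taken from the GSTK code.

```python
import threading
from multiprocessing.connection import Listener


def accept_in_background(listener, holder):
    """Run the blocking accept() on a daemon thread (hypothetical helper).

    The accepted connection is stored in holder["conn"], so the main
    thread stays free and can still react to Ctrl+C while waiting.
    """
    def _accept():
        holder["conn"] = listener.accept()  # blocks until a client connects

    thread = threading.Thread(target=_accept, daemon=True)
    thread.start()
    return thread
```

The main thread can then check whether "conn" is present in holder before it starts reading messages, instead of blocking on accept() itself.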

Now we have one main loop that will:

  1. read and process all messages available from the client
  2. render the model from the render camera's viewpoint
  3. perform a training step

Some steps may be skipped if the model isn't initialized, training isn't turned on, etc.
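The loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the Backend class, its flags, and the dict-based message format are hypothetical placeholders, and the render and train steps are replaced by counters. The only real API used is the poll() and recv() pair on a multiprocessing.connection Connection.

```python
from multiprocessing import Pipe


class Backend:
    """Hypothetical stand-in for the backend state; not the real GSTK class."""

    def __init__(self, conn):
        self.conn = conn            # a multiprocessing.connection.Connection
        self.model_ready = False    # set once a model has been initialized
        self.training = False       # toggled by the client
        self.rendering = True       # toggled by the client
        self.steps = 0              # placeholder for completed train steps
        self.frames = 0             # placeholder for rendered frames

    def handle(self, msg):
        # Assumed message format: a dict of flags to update.
        for key in ("model_ready", "training", "rendering"):
            if key in msg:
                setattr(self, key, bool(msg[key]))

    def tick(self):
        # 1. Drain every message that is already waiting, without blocking.
        while self.conn.poll():
            self.handle(self.conn.recv())
        # 2. Render from the render camera's viewpoint (placeholder).
        if self.model_ready and self.rendering:
            self.frames += 1
        # 3. Perform one training step (placeholder).
        if self.model_ready and self.training:
            self.steps += 1
```

Calling tick() repeatedly from the main thread gives one message pass, one render, and one train step per iteration. Because poll() returns immediately when nothing is queued, an idle client does not stall training.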

With this implementation, our frontend reaches approximately the same framerates as the official SIBR implementation during training on the mic dataset: 100-120 FPS for training, with rendering running at the same rate. Disabling either rendering or training speeds up the other.

TODO