muellerzr / fastinference

A collection of inference modules for fastai2
https://muellerzr.github.io/fastinference
Apache License 2.0

New fastai inference API #32

Closed · tcapelle closed this issue 3 years ago

tcapelle commented 3 years ago

Hey Zach, let's work here on prototyping the inference API. What I would like to have (my letter to Santa):

  • Streamlined TorchScript support on all fastai models: simple models should be compatible with jit.trace, and more complex ones with jit.script (a sketch follows this list). The folks at Facebook may be able to help here; they are super interested in this right now.
  • ONNX: export for all models. Image encoders should work out of the box; some layers are still missing for U-Nets (PixelShuffle). Tabular should work as well. Without being an expert, I would expect TorchScript to replace the ONNX pipeline in the future, one less layer.
  • TRTorch: we should probably start discussing with them, since the TensorRT framework is super fast for GPU inference. This could come later, once we have ONNX exports. I have a contact at NVIDIA who could help us export to TensorRT.
  • DeepStream? Stas Bekman is a guru on this topic; we could ask him what he thinks about it.
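A minimal sketch of those two TorchScript routes, just to make the first bullet concrete. Nothing below is fastai or fastinference API; a plain torchvision resnet18 stands in for a Learner's learn.model:

import torch
from torchvision.models import resnet18

model = resnet18().eval()

# jit.trace records the ops executed on one example input; enough for models
# without data-dependent control flow (most image encoders).
example = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save("resnet18_traced.pt")

# jit.script compiles the Python source instead, so input-dependent if/for
# branches survive; this is the route for more complex forward() methods.
scripted = torch.jit.script(model)
scripted.save("resnet18_scripted.pt")

Either .pt file can then be loaded with torch.jit.load, including from C++ via libtorch, without the fastai training code.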

We should have tests that periodically verify that this functionality is not broken and that performance is maintained. This is something fastai does not have right now and really needs; for example, fastai's unet is slower than it used to be, which I noticed the other day.
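One possible shape for such a check, purely as a sketch (the baseline number and test name are made up, and resnet18 stands in for the model under test): a pytest case that fails when inference gets meaningfully slower than a recorded baseline.

import time
import torch
from torchvision.models import resnet18

BASELINE_SECONDS = 0.5  # hypothetical value, measured once on the CI machine

def test_inference_speed_has_not_regressed():
    model = resnet18().eval()
    x = torch.randn(8, 3, 224, 224)
    with torch.no_grad():
        model(x)  # warm-up pass so one-time setup cost is not measured
        start = time.perf_counter()
        model(x)
        elapsed = time.perf_counter() - start
    assert elapsed < BASELINE_SECONDS * 1.5  # only fail on a clear slowdown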

Another cool thing would be to serve the model directly from fastai, with something like TorchServe underneath. For example,

learn.serve(port=5151)

and get a service running that serves inference over HTTP.
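As a rough sketch of what such a learn.serve helper could boil down to (hypothetical API; Flask, the /predict route, and the assumption that learn.predict accepts the raw uploaded bytes are all mine, not fastai's):

from flask import Flask, jsonify, request

def serve(learn, port=5151):
    app = Flask("fastinference-serve")

    @app.route("/predict", methods=["POST"])
    def predict():
        item = request.files["file"].read()          # raw bytes of the uploaded item
        pred, pred_idx, probs = learn.predict(item)  # fastai's standard single-item predict
        return jsonify({"prediction": str(pred), "probability": float(probs[pred_idx])})

    app.run(host="0.0.0.0", port=port)

It could then be hit with something like curl -F "file=@some_image.jpg" http://localhost:5151/predict; TorchServe would be the production-grade version of the same idea.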

muellerzr commented 3 years ago

Hey Thomas,

Sure, but I’d need to make a new branch.

fastinference already does ONNX completely, so point 2 of the wish list is already done. I think jit, ONNX, and TorchScript are what we should support, as that’s already quite a lot. Maybe TensorRT too; I’ve worked with it before, so it’s not the most difficult to add.
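For context, and not fastinference's actual API: the core of an ONNX export is a single torch.onnx.export call, with onnxruntime on the other side to run the result. A torchvision resnet18 stands in for learn.model here.

import torch
from torchvision.models import resnet18

model = resnet18().eval()
dummy = torch.randn(1, 3, 224, 224)

# Export with a dynamic batch dimension so inference batch size can vary
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=12,
)

# Run it back through onnxruntime to sanity-check the export
import onnxruntime as ort
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
outputs = session.run(None, {"input": dummy.numpy()})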

The weekends are when I can put a bit of effort into this, though I can make a new branch later today.


muellerzr commented 3 years ago

One aspect I think we should also include is exporting the DataLoaders, similar to what learn.export does (see the sketch below). It’s not too hard to do, and it helps people who want the speed of ONNX, etc., but still want the fastai pipeline for data preprocessing. Experts can of course build their own preprocessing pipeline if they know what they’re doing, but many do not.
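A rough sketch of that idea, not a finished API. It leans on fastai's DataLoaders.new_empty() and test_dl(), the same pieces learn.export already relies on:

import torch

def export_dls(learn, path="dls.pth"):
    # new_empty() keeps the transforms but drops the training data
    torch.save(learn.dls.new_empty(), path)

def preprocess(path, items):
    # Rebuild the exact training-time pipeline and run raw items through it
    dls = torch.load(path)
    return dls.test_dl(items)

The batches coming out of test_dl can then be fed straight to the exported ONNX or TorchScript model.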


tcapelle commented 3 years ago

You don't sleep!