os-climate / aicoe-osc-demo

This repository is the central location for the demos the ET data science team is developing within the OS-Climate project. This demo shows how to use the tools provided by Open Data Hub (ODH) running on the Operate First cluster to perform ETL, create training and inference pipelines.
Apache License 2.0

Investigate using NeuralMagic as post-training step [Tracker] #91

Open erikerlandson opened 2 years ago

erikerlandson commented 2 years ago

cc @ChristianMeyndt

NeuralMagic is basically a tool for analyzing a neural net model and identifying a modified sparse topology that is much smaller and faster. It operates as a second training phase: one trains a model, runs a tool to analyze it, and then does a second training run to fine-tune the new sparse architecture.

NeuralMagic is capable of making sparse versions of a model that are 10-100 times smaller and faster. Actual results of course depend on the specifics of the problem domain.
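
To make "sparse version of a model" concrete, here is a minimal sketch of magnitude pruning on a toy weight matrix. This is a generic textbook technique used purely for illustration, not Neural Magic's actual algorithm; the function names `magnitude_prune` and `sparsity_of` are hypothetical, and a real workflow would also retrain the model after pruning.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to 0.

    `weights` is a list of rows (a toy stand-in for one layer's weight matrix),
    `sparsity` is the fraction of entries to zero out, between 0 and 1.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")

    # Rank every entry by absolute value, remembering where it lives.
    cells = sorted(
        (abs(w), i, j)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    )
    n_prune = int(len(cells) * sparsity)

    pruned = [list(row) for row in weights]  # do not modify the input
    for _, i, j in cells[:n_prune]:
        pruned[i][j] = 0.0
    return pruned


def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0)
    return zeros / total if total else 0.0


layer = [[0.9, -0.05, 0.3], [0.01, -0.7, 0.02]]
sparse_layer = magnitude_prune(layer, 0.5)
print(sparse_layer, sparsity_of(sparse_layer))
```

The zeros only translate into a smaller and faster model when the storage format and inference engine can skip them, which is why the sparse model is paired with a dedicated runtime.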

Once we have the training pipeline fully ported, it should be relatively easy to add a neural-magic stage to generate a sparse version of the model.

In Neural Magic, the typical steps are:
1. Train a model.
2. Convert it to ONNX.
3. Use their Sparsify tool, which analyzes the model and shows the performance improvement you can get with pruning (so far only available in NM).
4. Get a sparsification recipe (a YAML file for their optimizer).
5. Train the model again with their optimizer (SparseML).
6. Convert the result to ONNX.
7. Deploy with their DeepSparse inference engine, which at the moment works only on CPUs with AVX2 or AVX-512.
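
Since the last step depends on AVX2 or AVX-512 being available, it may be worth checking the target nodes before adding a deployment stage. The following is a rough sketch that assumes a Linux x86 host exposing `/proc/cpuinfo`; the helper names `cpu_flags` and `avx_support` are hypothetical, and this is not a DeepSparse API.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def avx_support(cpuinfo_text):
    """Report whether AVX2 and any AVX-512 extension appear in the flags."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "avx2": "avx2" in flags,
        # AVX-512 shows up as several flags (avx512f, avx512bw, ...).
        "avx512": any(flag.startswith("avx512") for flag in flags),
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(avx_support(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo here; check CPU features another way.")
```

Parsing is kept separate from file access so the same check can be run on text collected from each cluster node.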

references

pacospace commented 2 years ago

Thanks @erikerlandson!

Neural Magic can also fine-tune models. We could start working on a new Elyra pipeline that:

cc @markurtz (welcome): what resources would you suggest for fine-tuning that model?

ChristianMeyndt commented 2 years ago

Hi @erikerlandson and @pacospace, making the models smaller will help a lot for sure, and Neural Magic sounds very promising! If you need any further information on the current model training solution, feel free to reach out to me. I'm curious to see the outcome of this fine-tuning! Thanks

erikerlandson commented 2 years ago

Thanks Christian!

The main issue I'm aware of is understanding exactly what dataset you trained your model with, because Francesco would want to provide NM with that same data to do its version of training.


ChristianMeyndt commented 2 years ago

We need the KPI mapping file (https://github.com/os-climate/corporate_data_pipeline/tree/main/data_input/ESG/kpi_mapping) and the annotations file (https://github.com/os-climate/corporate_data_pipeline/tree/main/data_input/ESG/annotations) as input. And then of course we also need all the PDFs that are mentioned in the annotations file. We probably already have these PDFs among the 40,000 reports on S3, but I'm not sure if you can easily find them by file name. Otherwise, we have these 300-400 reports on our side and could also upload them somewhere.
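
One way to find out which of the annotated PDFs are already on S3 is to compare file names. The sketch below assumes the annotations are available as CSV text with a column holding the PDF file name; the column name `source_file`, the function names, and the case-insensitive matching are all assumptions for illustration. The list of S3 object keys is passed in as plain strings, so no S3 client is involved here.

```python
import csv
import io
from pathlib import PurePosixPath


def referenced_pdfs(annotations_csv, column="source_file"):
    """Unique PDF file names mentioned in the annotations CSV text.

    `column` is a hypothetical name; adjust it to the real annotations file.
    """
    reader = csv.DictReader(io.StringIO(annotations_csv))
    names = set()
    for row in reader:
        name = (row.get(column) or "").strip()
        if name:
            names.add(name)
    return names


def split_by_availability(wanted, object_keys):
    """Split `wanted` file names into (found, missing) given S3 object keys.

    Only the last path component of each key is compared, ignoring case.
    """
    available = {PurePosixPath(key).name.lower() for key in object_keys}
    found = {name for name in wanted if name.lower() in available}
    return found, wanted - found


annotations = "source_file,kpi_id\nReportA.pdf,1\nReportB.pdf,2\nReportA.pdf,3\n"
keys = ["reports/2020/reporta.pdf", "reports/other.pdf"]
found, missing = split_by_availability(referenced_pdfs(annotations), keys)
print("found:", sorted(found), "missing:", sorted(missing))
```

Anything left in `missing` would be a candidate for uploading from the 300-400 reports mentioned above.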

ChristianMeyndt commented 2 years ago

FYI @HeatherAck @JeremyGohBNP @LeaADeleris @andraNew @OferHarari @idemir-ids @DaBeIDS @mriefer This is the issue for NeuralMagic we talked about on Monday. It sounds really promising.

pacospace commented 2 years ago

Thanks @ChristianMeyndt!! I will check and let you know in case I have any trouble!

pacospace commented 2 years ago

Overview of the work: https://docs.google.com/presentation/d/1BvrbKUaqxs9CSKrwUYVgxCpvqBZ3GXDWsTYPLSTffHs/edit#slide=id.g1125f65fa8b_0_0