stacksgov / grants-program

Welcome to the Stacks Foundation Grant Program. Community members interested in submitting a grant proposal may do so by opening an issue in this repository and filling out the grant application.

Oracle for Machine Learning Models #171

Closed cecilpang closed 1 year ago

cecilpang commented 3 years ago

Project proposal from Open Art Source

Background

It is very common today for apps to have machine learning components. In fact, most newly developed apps have some form of artificial intelligence built in. The same cannot be said for Dapps, mainly because machine learning today is mostly centralized. There is no natural and easy way to include a machine learning model as part of a smart contract, at least not in a decentralized way. There are data feed Oracles that use machine learning to compose their data sets, but that is not the same as having a machine learning model as part of the smart contract itself. What is lacking is a standard way to deploy machine learning models together with smart contracts.

Dapp developers work around this problem by building their own proprietary infrastructure to serve their ML models off chain. In essence, this introduces a centralized component to the smart contract, which makes the Dapp not entirely decentralized. Another issue is that building a ML infrastructure requires expertise in MLOps, which is a high hurdle for Dapp developers. Note that large corporations have dedicated teams of MLOps engineers, and the cost of MLOps often exceeds the cost of building the Dapps.

Project Overview

This project will design and build a fully configurable Oracle node architecture to serve a wide variety of machine learning models for Stacks smart contracts. We call this the ML Oracle. ML Oracle will be an extension to the Chainlink Oracle architecture and will be compatible with smart contracts built to access Chainlink Oracles.


ML Oracle will be useful to smart contract developers who want to incorporate machine learning models into their smart contracts. The following are example use cases.

Imagine a smart contract that distributes funds based on an algorithm, and the algorithm includes a ML model that forecasts the sales of a product in the coming week. The smart contract makes an inference call to the model on ML Oracle, via a Chainlink Oracle contract, to obtain the forecast.

Another use case is NFTs for generative art. At present, NFTs record the final generated images. However, there are generative artworks in which the generative ML model is an integral part of the artwork, and interacting with the model to generate new images is part of the viewing experience. A well known ML model type used in generative art is the GAN. With ML Oracle, the ML model becomes part of the NFT, which makes an interactive viewing experience possible.

Scope

  1. ML Oracle Contract on the Stacks Blockchain
     • Assuming this can piggyback on another Stacks Foundation supported project that makes Chainlink native to Stacks blockchain.
  2. Chainlink external adapters. Will explore and implement one or both of
     • standard api model
     • original signed data model
  3. ML Oracle toolsets to help
     • build a ML Oracle node
     • configure and deploy ML models to ML Oracle.
     • support Pytorch and Tensorflow deep learning models
     • support non deep learning models built using Scikit-Learn
  4. Example smart contract to use a forecasting model, and a Dapp to use the smart contract.
  5. Example NFT of AI generative art.
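
To make scope item 2 more concrete, here is a minimal sketch of the request handling an external adapter might perform. It is an illustration under stated assumptions, not the project's design: the `id`, `data`, `jobRunID`, `result` and `statusCode` fields follow the request and response shape commonly used by Chainlink external adapters, while the `model` and `features` parameters, the `handle_request` function and the `mean_forecast` stand-in model are all hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical type: a "model" is any callable mapping input features to a number.
Model = Callable[[List[float]], float]


def handle_request(payload: Dict[str, Any], models: Dict[str, Model]) -> Dict[str, Any]:
    """Handle one adapter request and return an adapter-style response dict."""
    job_run_id = payload.get("id", "1")
    data = payload.get("data", {})
    name = data.get("model")  # hypothetical parameter naming the model to call

    if name not in models:
        # Report the failure in the response instead of raising.
        return {
            "jobRunID": job_run_id,
            "status": "errored",
            "error": f"unknown model: {name!r}",
            "statusCode": 400,
        }

    result = models[name](data.get("features", []))
    return {
        "jobRunID": job_run_id,
        "data": {"result": result},
        "result": result,
        "statusCode": 200,
    }


def mean_forecast(features: List[float]) -> float:
    """Stand-in for a real forecasting model: the mean of past weekly sales."""
    return sum(features) / len(features) if features else 0.0


if __name__ == "__main__":
    request = {"id": "42", "data": {"model": "sales_forecast", "features": [10, 20, 30]}}
    print(handle_request(request, {"sales_forecast": mean_forecast}))
```

In a real node this function would sit behind an HTTP endpoint that the Chainlink node calls; the web server is left out here to keep the sketch self-contained.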

Budget and Milestones

The project will take 4 to 6 person-months to complete. Total Grant Request: $100k

M1: a bare-bone, end to end system of a smart contract invoking a ML model via Chainlink. $30k.
M2: a sample smart contract, ML model and Dapp to showcase the bare-bone system. $5k
M3: add to the bare-bone system capability to configure and serve Pytorch models. $20k
M4: a sample NFT with a GAN model, and a Dapp to view the NFT. $5k
M5: repeat M3 for Tensorflow. $10k
M6: repeat M3 for, non deep learning, Scikit-Learn models. $10k
M7: develop scripts to automatically build and run all the components of a ML Oracle node. $10k
M8: complete documentation. $10k
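
As a quick arithmetic check, the milestone amounts listed above can be summed against the stated total. The dictionary name is only illustrative; the figures are the ones in the proposal.

```python
# Milestone amounts in USD, copied from the budget above.
MILESTONE_BUDGET_USD = {
    "M1": 30_000,
    "M2": 5_000,
    "M3": 20_000,
    "M4": 5_000,
    "M5": 10_000,
    "M6": 10_000,
    "M7": 10_000,
    "M8": 10_000,
}

total = sum(MILESTONE_BUDGET_USD.values())

if __name__ == "__main__":
    # The sum should match the total grant request of $100k.
    print(f"Total: ${total:,}")
```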

Team

The team will consist of two people with extensive experience in AI, MLOps engineering and Blockchain software engineering.
https://www.linkedin.com/in/cecil-pang-17b0a95/
https://www.linkedin.com/in/gary-ng-8526668b/

Risks

The main risk is the dependency on when Chainlink will be native to Stacks. If it turns out that this project cannot piggyback on the Stacks Chainlink Oracle project, the cost of scope 1 will be much higher.

Future Work

At the completion of this project, a new project will be proposed to cover the following enhancements and other ideas learned during this project.

• the capability to drop in and configure pre and post processing to the ML models
• add configurable security at model and node levels.
• add necessary features to support a community ML Oracle node that serves models from different Dapp developers.
• develop configurable scripts to containerize different components of ML Oracle.

stx-grant-bot[bot] commented 3 years ago

Thanks for submitting a grant proposal. Our team will review your submission and get back to you.

jennymith commented 3 years ago

We’re excited about this proposal and are generally leaning toward approval. Before we do that, could you please submit a quick video, presentation, or panel of user stories that walk us through what a developer would be able to do with this tool at the end of each milestone? We appreciate that machine learning models are very complex so we’d just like to gain a more detailed understanding of this tool’s use cases/applications.

RaffiSapire commented 3 years ago

@cecilpang Hi there! Wanted to make sure you saw Jenny's comment above. Our next committee meeting is tomorrow.

cecilpang commented 3 years ago

Additional info on the milestones, per @jennymith's request. And thank you @RaffiSapire for the reminder.

M1: a bare-bone, end to end system of a smart contract invoking a ML model via Chainlink
M2: a sample smart contract, ML model and Dapp to showcase the bare-bone system.

At the end of M1 and M2, a developer will be able to build and run a ML Oracle instance on a Linux server and deploy a Machine Learning model to the server. A Stacks smart contract can then be created to run inference on the model. The Stacks/Chainlink integration protocol will be used end to end. The code and step by step instructions to build and run the server will be available in a GitHub repo. There will be scripts to aid the process, but some manual steps will still be involved. At this early stage, deploying a model will mean copying some Python files to a designated directory. There will be some flexibility to deploy multiple models and call them by name or unique id.
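
The "copy Python files to a designated directory" step could look roughly like the sketch below. It is only an illustration under assumptions: the convention that each model file defines a `predict(inputs)` function, and the names `load_models` and `run_inference`, are hypothetical and not taken from the project.

```python
import importlib.util
from pathlib import Path
from typing import Any, Callable, Dict


def load_models(model_dir: str) -> Dict[str, Callable[[Any], Any]]:
    """Load every *.py file in model_dir; the file stem becomes the model name."""
    models: Dict[str, Callable[[Any], Any]] = {}
    for path in sorted(Path(model_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Assumed convention: a model file exposes a predict(inputs) function.
        # Files without one are skipped.
        if hasattr(module, "predict"):
            models[path.stem] = module.predict
    return models


def run_inference(models: Dict[str, Callable[[Any], Any]], name: str, inputs: Any) -> Any:
    """Call a deployed model by name."""
    if name not in models:
        raise KeyError(f"no model named {name!r} is deployed")
    return models[name](inputs)
```

With a file `forecast.py` in the directory, `run_inference(load_models("models"), "forecast", inputs)` would call that file's `predict` function. Executing arbitrary Python files is only reasonable on a node whose operator controls the directory, which matches the single-developer setup described for this stage.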

An overall architecture design will be produced. Subsequent milestones will be built based on this architecture.

M3: add to the bare-bone system capability to configure and serve Pytorch models.

At the end of M3, there will be a new server module in ML Oracle specifically designed to serve Pytorch models. Developers will be able to deploy a trained Pytorch deep learning model together with its configuration metadata. Smart contracts will be able to run inference on the model. There will be step by step instructions on running scripts to start this module. There will also be instructions on how to prepare a Pytorch model and deploy it.

M4: a sample NFT with a GAN model, and a Dapp to view the NFT.

The purpose of this milestone is to demonstrate the use of a Pytorch model. It also showcases associating an NFT with a model type called GAN, which generates images. The GAN will be a trained Pytorch model.

M5: repeat M3 for Tensorflow.

Same as M3, for Tensorflow models.

M6: repeat M3 for, non deep learning, Scikit-Learn models

Same as M3, for Scikit-Learn and non deep learning models in general.

M7: develop scripts to automatically build and run all the components of a ML Oracle node

At this milestone, all of the manual steps to build and run the different ML Oracle server components will be automated, and there will be only a small number of scripts to run. There will be scripts to register models and their metadata (e.g. model type such as Pytorch, Tensorflow, etc.) so that they will be deployed to the appropriate modules.
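
Registering a model with its metadata so that it lands in the right serving module might be sketched as follows. Everything named here (`ModelMetadata`, `ModelRegistry`, the `SERVING_MODULES` table and the module names in it) is hypothetical and only illustrates the routing idea.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical mapping from model type to the serving module that handles it.
SERVING_MODULES: Dict[str, str] = {
    "pytorch": "pytorch-server",
    "tensorflow": "tensorflow-server",
    "sklearn": "sklearn-server",
}


@dataclass(frozen=True)
class ModelMetadata:
    name: str           # name used by smart contracts to call the model
    model_type: str     # e.g. "pytorch", "tensorflow", "sklearn"
    artifact_path: str  # where the trained model file lives


class ModelRegistry:
    """Keeps track of registered models and the module each one is routed to."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelMetadata] = {}

    def register(self, meta: ModelMetadata) -> str:
        """Record the model and return the serving module it should be deployed to."""
        key = meta.model_type.lower()
        if key not in SERVING_MODULES:
            raise ValueError(f"unsupported model type: {meta.model_type!r}")
        self._models[meta.name] = meta
        return SERVING_MODULES[key]

    def module_for(self, name: str) -> str:
        """Look up the serving module for an already registered model."""
        return SERVING_MODULES[self._models[name].model_type.lower()]
```

For example, registering `ModelMetadata("gan", "PyTorch", "models/gan.pt")` would route the model to the hypothetical `pytorch-server` module; a registration script would then copy the artifact there.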

M8: complete documentation

Online documentation of all the features and how to use them. There will also be tutorials and blog posts.

stx-grant-bot[bot] commented 3 years ago

Congratulations. Your grant is now approved. Please complete the on-boarding link here: https://stacks-grant.netlify.app/onboard?q=16845dc28c25be43569761ac1721d130

cecilpang commented 3 years ago

@RaffiSapire @jennymith Hi, The DocuSign contract has my name as the "Recipient" rather than the company name "Open Art Source, LLC". I cannot modify that field. Can you help? Thank you.

RaffiSapire commented 3 years ago

Hi, Please ping @jhammond2012 at 2n10se#5020 on discord to help.

jennymith commented 3 years ago

Just adding a note here that this has been resolved. Thanks and looking forward to seeing your progress on this @cecilpang!

cecilpang commented 3 years ago

@jennymith On it!

jennymith commented 3 years ago

Hey @cecilpang just checking in to see how the smart contract for M1 is going. Are you able to do much without the Stacks/Chainlink integration?

cecilpang commented 3 years ago


Hi @jennymith , I have been working cautiously on the parts that are on Chainlink and our own node, under the assumption that the smart contract portion would be the same as the Ethereum integration. I saw the video you sent yesterday and it looks like my assumption is valid. I will accelerate the work now. Thank you for staying in touch.

cecilpang commented 2 years ago

Status update:

Successfully demoed to @jennymith on October 29th an end to end system that covers:

M1:

M2:

The demo showed that M1 and M2 have been completed using the Ethereum network. It can be migrated to Stacks when the Stacks/Chainlink integration is available.

We have started working on M3. In fact, some of M3 was included in the demo: the image similarity model was written in PyTorch and served by a TorchServe server hosted on AWS.

We request disbursement of the funds for M2 and M3 so that we can continue with M3.

Thank you.

jennymith commented 2 years ago

Hey @cecilpang just confirming that these milestones have been disbursed.

cecilpang commented 2 years ago

Thanks @jennymith. We have received the disbursements for M2 and M3. We are working on M3 now. Please advance the label to M3.

stx-grant-bot[bot] commented 2 years ago

M3 has been funded! When you are finished with this milestone, please comment on this issue with !m3_complete

vanesvibes commented 2 years ago

@cecilpang Vane here! I am supporting the Grants in updating information. Could you let us know your discord username?

cecilpang commented 2 years ago

@vanesvibes My discord username is Cecil#1321

will-corcoran commented 2 years ago

Hello and thank you for participating in the Stacks Foundation Grants Program!

We are in the process of migrating from GitHub to the new Grants Dashboard. To complete your grant and receive your remaining payments, you will need to submit any remaining Progress Review and/or Final Review requests through the Dashboard.

Lastly, please note we are marking this grant 'closed' on GitHub for organizational purposes, but it is still 'open' on the Grants Dashboard.

Thanks and we hope to continue to support your efforts with additional grants!

Best, Will