singnet / snet-sdk-python

Make gRPC calls to SingularityNET services programmatically using Python
MIT License

Unification with snet-cli #9

Open astroseger opened 5 years ago

astroseger commented 5 years ago

The SDK lacks some features which already exist in snet-cli (dynamic caching of everything (channels / service metadata / compiled protobuf), time-based strategies for gas price, support for different types of identities, etc.). Instead of re-implementing all these features in the SDK, we should isolate them in a separate library and use them in both the SDK and snet-cli.

The general strategy should be as follows: if we need some functionality in the SDK and this functionality is already present in snet-cli, we should take it from snet-cli. If we don't like how it is implemented in snet-cli, we re-implement it, but we also replace the old functionality in snet-cli.

@vforvalerio87 @raamb do you agree with this approach?

Functionality which is already present in snet-cli but lacking in the SDK

  1. Support for different identity types. We should directly use identity.py (@vforvalerio87 if you don't like how it is implemented, it should be re-implemented in such a way that snet-cli can also use it).
  2. A mechanism for dynamically caching everything: channels/service_registration/service_metadata/compiled protobuf. The SDK should also use ~/.snet by default (as I believe @ferrouswheel suggested). That way the SDK and snet-cli will use the same cached information, which is great.
  3. A general mechanism for sending on-chain transactions, including time-based strategies for gas price. I'm not a big fan of how it is implemented in snet-cli, so we could re-implement it (but in such a way that snet-cli can use it!).
  4. Small libraries: utils_agi2cogs.py, some functions from utils.py
  5. Functions for calculating call metadata from mpe_client_command.py
ferrouswheel commented 5 years ago

I just want to clarify I think the .snet directory should be per project. Just like .git. Projects will be fixed to a particular published service spec, so we don't want different projects interacting with the same cache.

This .snet directory would still be used by the SDK and CLI, but wouldn't be in the home directory.

The reason for this is that if I build an application using SNet, I should be able to bundle it up and all dependencies should be inside the project directory. I don't want to have to extract compiled protobuf/grpc files from a home directory, and I especially don't want to have to filter and find only the files related to my current project (if I'm working on multiple applications).

A related issue is here: singnet/snet-cli#173, which makes it clear it should be per project.

raamb commented 5 years ago

Essentially the same as issue singnet/snet-cli#214. Choosing the right strategy will also solve singnet/snet-cli#219, as the functionality of invoking a service using just the service-id and org-id can be provided by the SDK itself.

My preference is the second option proposed by Vitaly, which essentially has the snet-cli built around the SDK

snet-client-common
        ^
snet-client-sdk
        ^
snet-cli

Let's align on the approach and then we can figure out the implementation. Eventually I would like all our client tools to be built around the SDK so that we eat our own dog food. Thoughts @vforvalerio87 @pennachin @ferrouswheel @astroseger @arturgontijo @vsbogd @ksridharbabuus

vsbogd commented 5 years ago

I think the final solution should be discussed along with the API which the SDK provides to clients. There should be some difference between specific and generic client implementations. Obviously, client code for a specific service (generated by the SDK) can directly use the service API, while snet-cli code should make generic calls. I think we need to describe and discuss the SDK API first and then discuss how to implement snet-cli using it.

My thoughts on other items raised by @astroseger in issue description:

Items 1, 3, and 5 sound reasonable to me.

  2. Mechanism for dynamically caching everything

I think the SDK should NOT restrict the client in how caching information is stored. Instead I would provide a generic interface which can be implemented by the client to use the cache storage of their choice. To simplify client development I would also provide a file storage implementation, which can in turn be reused in snet-cli. I would make the storage path a parameter of this implementation to allow the client to choose the file path where the cache is stored.
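A minimal sketch of what such a generic interface plus a file-backed default could look like (all names here are hypothetical illustrations, not an existing snet-sdk API):

```python
import json
import os
from abc import ABC, abstractmethod


class CacheStorage(ABC):
    """Generic cache interface the SDK could depend on; clients plug in
    whatever backend they prefer (filesystem, database, in-memory, ...)."""

    @abstractmethod
    def get(self, key):
        """Return the cached value for key, or None if absent."""

    @abstractmethod
    def set(self, key, value):
        """Store value under key."""


class FileCacheStorage(CacheStorage):
    """Default file-based implementation; the storage path is a constructor
    parameter so callers (including snet-cli) can choose where the cache lives."""

    def __init__(self, base_dir):
        self.base_dir = base_dir
        os.makedirs(base_dir, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.base_dir, key + ".json")

    def get(self, key):
        try:
            with open(self._path(key)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def set(self, key, value):
        with open(self._path(key), "w") as f:
            json.dump(value, f)
```

The SDK would only ever talk to `CacheStorage`, so swapping the file backend for anything else stays invisible to the rest of the code.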

  4. Small libraries: utils_agi2cogs.py, some functions from utils.py

It is difficult to say right now, but this sounds like a good candidate for a snet-client-common package.

astroseger commented 5 years ago

@raamb

  1. I fully agree with @vsbogd that we should discuss the API which the SDK provides.

  2. I agree that it would be nice to rewrite snet-cli using the SDK, but with the current API it is "not practical".

  3. I still think that the general strategy should be as described in the first post of this issue! We should extract existing functions and functionality from snet-cli. I can help with this after we define the API.

ferrouswheel commented 5 years ago

I would make storage path to be a parameter of this implementation to allow client choosing file path to store cache.

As long as this has a sensible default (e.g. a file cache in a project's .snet directory) and isn't yet another configuration parameter to think about.

Customisation should take a back seat to "Everything Just Works" with minimal cognitive overhead for a developer trying to use SNet.

vsbogd commented 5 years ago

Customisation should take a back seat to "Everything Just Works" with minimal cognitive overhead for a developer trying to use SNet.

Sure, I just want to emphasize that the SDK should be more flexible than the current snet-cli implementation.

For example, I don't think that sharing the cache between snet-cli and the SDK is the best option, as it raises questions about version compatibility, for instance. As a developer, I would prefer having two different directories so I don't have to worry about whether the SDK and snet-cli versions I use are compatible. I don't insist they should be different by default, but I think we definitely need to provide such an option.

raamb commented 5 years ago

2. I agree that it would be nice to rewrite snet-cli using the SDK, but with the current API it is "not practical".

I see your point on the practicality of doing this.

3. I still think that the general strategy should be as described in the first post of this issue! We should extract existing functions and functionality from snet-cli. I can help with this after we define the API.

Sure, as you suggest, let's work on pulling out common functionality into a common dependency for both the client and the SDK. Items 1, 3, 4, and 5 are clear candidates for this.

With regards to the API that the SDK will support, it should

I doubt that any of this is new, and pardon me if this is at a very high level.

pennachin commented 5 years ago

I think in the long run having the CLI built on top of the SDK is a good idea, but I also agree that's not the best short term path. We should plan for that in the medium term.

I also agree with this:

For example I don't think that sharing cache between snet-cli and sdk is the best option, as it raises question about version compatibility for instance. As developer I would prefer having two different directories to not bother whether sdk and snet-cli versions I use are compatible. I don't insist they should be different by default, but I think we definitely need to provide such option.

vforvalerio87 commented 5 years ago

About the "sdk cache" issue:

I think that, first and foremost, the SDK should be completely stateless by default. You can't assume the user who runs it will have a reliable filesystem at their disposal (Fargate cluster, AWS Lambda, whatever).

That being said, I proposed having a "pluggable cache" system where you specify what persistence engine you want to use. This could be something the SDK understands natively, like "filesystem" (based on pickle in the Python iteration of the SDK, for example) or "cli" (meaning "I have a CLI installed on my system, please leverage that for caching", which is the same thing the AWS SDKs do if you have an AWS CLI installed and configured on your system; you can still specify whatever authentication options you prefer and override that). Alternatively, you could provide a class with conventional methods that do serialization and deserialization (example: persistence_engine=DatabaseStorage or something, a class with, by convention, a "save" and a "load" method that would serialize and deserialize the cache for you).

This way, a developer who uses the SNET CLI and has it configured could use that in their (Python) client application, while a user who runs their service on some distributed Docker cluster could persist to an external database / NFS mount / DynamoDB / whatever, so that if they destroy and re-create containers, all of them can still access and leverage the same cache for the whole cluster.

There's an open issue that specifically covers this: #7.

By the way, I think we should have the same approach with authentication.

Meaning: you could just specify "cli" as an authentication method (as an alternative to "private_key" or "mnemonic" or whatever), at which point the SDK would look at your CLI configuration and use whatever authentication method you specified there as the default identity.

raamb commented 5 years ago

Let's get started with the refactor effort as proposed by Sergey and look to build a common dependency layer across the SDK and CLI.

astroseger commented 5 years ago
  1. @vsbogd @vforvalerio87 I think you're slightly overcomplicating things around the cache. I want to underline that it is rather cheap to find channels (~5 sec) and rather cheap to download metadata and recompile protobuf (~5 sec). We obviously shouldn't do it on each call, or even at each application start, but there is no reason to make anything complicated here... We already have an implementation of dynamic caching in snet-cli, and the directory can be made configurable if you want...

  2. I will extract the following functionality from snet-cli:

    • configuration (I believe the SDK could share a configuration file with snet-cli)
    • a function which opens a new channel, or does nothing if the channel already exists
    • a function which gets the channel for a given org_id/service_id (with dynamic caching)
    • a function which gets the metadata for a service and downloads and dynamically compiles the protobuf (with dynamic caching, so this function can be used on each call)
    • a function which forms the grpc-metadata for a call (this can be used to make a call, as the current SDK does)
    • a function which makes a call as in singnet/snet-cli#219
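The "dynamic caching" these helpers rely on can be sketched as a simple time-based memoizer, cheap enough to wrap lookups that run on every call (illustrative only; the decorator and function names are hypothetical, not the snet-cli implementation):

```python
import functools
import time


def dynamically_cached(ttl_seconds=300):
    """Memoize a lookup so it is cheap to invoke on every service call;
    entries are refreshed once they are older than ttl_seconds.
    (Illustrative sketch only, not the actual snet-cli implementation.)"""
    def decorator(fn):
        cache = {}

        @functools.wraps(fn)
        def wrapper(*args):
            now = time.time()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh enough: serve from the cache
            value = fn(*args)  # stale or missing: re-fetch and re-cache
            cache[args] = (now, value)
            return value
        return wrapper
    return decorator


@dynamically_cached(ttl_seconds=300)
def get_service_metadata(org_id, service_id):
    # Placeholder for the real lookup (metadata download + protobuf compilation).
    return {"org_id": org_id, "service_id": service_id}
```

With a ~5-second fetch behind it, a wrapper like this keeps per-call overhead near zero while still refreshing stale entries automatically.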

As a first step I will simply isolate these functions inside snet-cli without putting them in a separate repo. @raamb Do you agree?

astroseger commented 5 years ago

(I've removed item 3 from my previous post because it is irrelevant now...)

raamb commented 5 years ago

As a first step I will simply isolate these functions inside snet-cli without putting them in a separate repo. @raamb Do you agree?

Agree with this. We can start pulling out code.

  1. @vsbogd @vforvalerio87 I think you're slightly overcomplicating things around the cache. I want to underline that it is rather cheap to find channels (~5 sec) and rather cheap to download metadata and recompile protobuf (~5 sec). We obviously shouldn't do it on each call, or even at each application start, but there is no reason to make anything complicated here... We already have an implementation of dynamic caching in snet-cli, and the directory can be made configurable if you want...

The point here is that the caching strategy should be pluggable. The first version would be the one you describe, dynamically downloading to a project-specific directory, but the solution should be able to support other options as required. The point @vforvalerio87 made about Lambda is quite valid, as we have seen interest in some Telegram groups in building out services on a serverless framework. We should be able to support any persistent store, which is what the pluggable approach affords.

vforvalerio87 commented 5 years ago

Same... literally everybody I spoke to asked how to set it up with Lambda, and nobody will have the snet CLI installed on the server where the service runs. Having the SDK share configuration/cache with the CLI is just a convenience for developers who test locally on their computer.

I agree about extracting all the code you mentioned so it can be used in both places.

astroseger commented 5 years ago

@vforvalerio87 @raamb Sorry guys, but I still don't quite get why the existing solution with dynamic caching is not enough for you... You simply specify the directory where the cache is stored. It works totally fine in Docker (the cache will be rebuilt on the fly after a restart)...

vforvalerio87 commented 5 years ago

In Docker, yes. In Lambda the cache would be lost on every call (same with Azure Functions or Google Cloud Functions).

astroseger commented 5 years ago

Cool. But I assume that the filesystem is a common way to represent information in Linux, isn't it?

If there is no way to have any persistent storage in Lambda, then the only solution is to have a "static" cache. So you simply create a container with all the information (including cached channels and compiled protobufs) inside. If this information becomes outdated it will be automatically replaced on each call, but there is no magical solution for that... Isn't that so?

In any case, we already support this...

astroseger commented 5 years ago

Ah yes... Am I right that you are saying that for persistent storage in Lambda it is better to use some kind of DB rather than the filesystem?

vforvalerio87 commented 5 years ago

As I was suggesting above and in issue #7, there is a very simple solution: the user provides a class with a pre-defined interface which takes care of serializing and deserializing the application state.

Example:

```python
from snet import Snet
from state_cache import MySQLCache

snet = Snet(persistence_engine=MySQLCache, ...)
```

MySQLCache is a custom class provided by the user which must expose at least the following methods: save and load. save takes a key (example: "logs") and a dict; load takes a key (example: "logs") and returns a dict. Then in snet_sdk, when I want to cache information, I'll do, for example, self.persistence_engine.save("logs", logs) or self.persistence_engine.load("logs").

In __init__ for the class, I'll check whether the class provided by the user implements the methods I require with the signatures I expect. If the user specifies nothing, we use the filesystem.
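A sketch of that convention (hypothetical names throughout; the real default would be the filesystem as described above, but an in-memory engine keeps the example self-contained):

```python
class InMemoryCache:
    """Trivial default engine used here when the user specifies nothing;
    the comment above proposes the filesystem as the real default."""

    def __init__(self):
        self._data = {}

    def save(self, key, data):
        self._data[key] = data

    def load(self, key):
        return self._data.get(key)


class Snet:
    """Sketch of the proposed convention (not the actual snet-sdk class):
    the user supplies any engine exposing save(key, data) and load(key),
    and the SDK stores all of its state through it."""

    REQUIRED_METHODS = ("save", "load")

    def __init__(self, persistence_engine=None):
        engine = persistence_engine or InMemoryCache()
        # Duck-type check in __init__: verify the user-provided engine
        # implements the expected interface before accepting it.
        for name in self.REQUIRED_METHODS:
            if not callable(getattr(engine, name, None)):
                raise TypeError("persistence engine must implement %s()" % name)
        self.persistence_engine = engine

    def record_channel_state(self, state):
        self.persistence_engine.save("channel_state", state)

    def channel_state(self):
        return self.persistence_engine.load("channel_state")
```

This sketch takes an engine instance rather than a class for simplicity; either works with the duck-type check, since the point is only that save/load exist and are callable.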

With this we can also offer the user the possibility of avoiding asking the daemon for the channel state completely, making calls faster. You store the state locally, and even if you have multiple machines using the same private key it's not an issue, because they use the same storage for things like channel state: if client1 makes a call and the state is updated, client2 can take the information from the same storage without asking the daemon all the time. That's 1 gRPC call per service call instead of 2.

If something happens on the daemon side, for example the service provider makes a claim, we should have a background process (async) in the SDK that gets the logs and channel information from the blockchain every time a new block is mined.

ferrouswheel commented 5 years ago

@vforvalerio87

With this we can also offer the user the possibility of avoiding asking the daemon for the channel state completely, making calls faster. You store the state locally, and even if you have multiple machines using the same private key it's not an issue, because they use the same storage for things like channel state: if client1 makes a call and the state is updated, client2 can take the information from the same storage without asking the daemon all the time. That's 1 gRPC call per service call instead of 2.

Unless you use some locking mechanism, that is how you get race conditions ;-)
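A toy illustration of the hazard: an unsynchronized read-modify-write on shared channel state can lose updates when two clients race, which a lock prevents (sketch only, using threads in place of separate client machines):

```python
import threading


class SharedChannelState:
    """Shared store for channel state; the lock makes each
    read-modify-write of the nonce atomic across concurrent clients."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {"nonce": 0}

    def bump_nonce(self):
        # Without the lock, two clients could both read the same nonce
        # and write back the same incremented value, losing one update.
        with self._lock:
            self._state["nonce"] += 1

    def nonce(self):
        with self._lock:
            return self._state["nonce"]


state = SharedChannelState()
threads = [
    threading.Thread(target=lambda: [state.bump_nonce() for _ in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, the final nonce is exactly 4 * 1000 = 4000.
```

For clients on separate machines sharing a database, the same idea would need a storage-level mechanism (row locking, compare-and-swap, etc.) rather than an in-process lock.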