space-meridian / roadmap

High-level roadmap for Filecoin Station
https://starmap.site/roadmap/github.com/space-meridian/roadmap/issues/1

Platformize `meridian-publish` #134

Open juliangruber opened 4 months ago

juliangruber commented 4 months ago
bajtos commented 4 months ago

A few ideas/food for thought.

(1) Uneven scaling

spark-publish happily handles the load using a single worker process. voyager-publish required four worker processes to handle the load.

How will the platform support this?

(2) Who owns the DB schema

ATM, spark-api owns the DB schema and writes rows to the measurements table. spark-publish reads & deletes from that table. Both services live in the same monorepo, which allows us to deploy both spark-api and spark-publish together after we change the DB schema.

How will this work in the platform?

On one hand, it would be nice to extract everything related to the measurements table to the new platform.

On the other hand, this comes with a lot of complexity that could delay the delivery of this project:

  • Different projects have different measurement schemas. This could be side-stepped by using PostgreSQL's support for JSON/JSONB.
  • Each project needs to run a custom validation and pre-processing step for measurements before they are stored in the DB.
  • The uneven scaling requirements apply to the REST API for recording measurements too.

(3) How do we design the DB for multi-tenancy?

There are different approaches to implementing a multi-tenant database. Each comes with different trade-offs.

The answer needs to consider how we want to handle uneven scaling (see (1) above).

(4) Is an SQL DB the right solution?

I think we need the following operations:

  • append a new measurement to the list of unpublished measurements
  • lock a subset of measurements for publishing
  • remove the locked subset of measurements from the storage
  • clear the lock if the publishing operation failed

(5) IPC subnet

In the longer term, we wanted to research a more decentralised solution for recording & committing the measurements. One option is to build an IPC subnet (L2) with custom actors. The nodes in the subnet can collect the measurements and commit them in batches to the parent chain (Filecoin L1).

Is this a good time to research the feasibility of this option?

bajtos commented 4 months ago

(5) IPC subnet

Another alternative: Basin. Is it production ready yet?

juliangruber commented 4 months ago

A few ideas/food for thought.

🙏

(1) Uneven scaling

spark-publish happily handles the load using a single worker process. voyager-publish required four worker processes to handle the load.

How will the platform support this?

Looks to me like meridian-publish will have 1-n workers, and depending on how much load each module creates, we will scale them up and down. Is that what you mean?

(2) Who owns the DB schema

ATM, spark-api owns the DB schema and writes rows to the measurements table. spark-publish reads & deletes from that table. Both services live in the same monorepo, which allows us to deploy both spark-api and spark-publish together after we change the DB schema.

How will this work in the platform?

On one hand, it would be nice to extract everything related to the measurements table to the new platform.

On the other hand, this comes with a lot of complexity that could delay the delivery of this project:

  • Different projects have different measurement schemas. This could be side-stepped by using PostgreSQL's support for JSON/JSONB.

I was using a single column for the JSON blob in my PoC rewrite 👍 We don't need schemas here, only validations (as you noted next).
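Roughly like this - a sketch, with table and column names made up for illustration (not necessarily what the PoC uses):

```js
// Sketch: one shared measurements table where the measurement itself
// is a single JSONB column, so different modules can store different
// shapes without schema changes. Names are illustrative.
import pg from 'pg'

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL })

await pool.query(`
  CREATE TABLE IF NOT EXISTS measurements (
    id BIGSERIAL PRIMARY KEY,
    payload JSONB NOT NULL
  )
`)

// node-postgres serializes a plain object parameter to JSON
await pool.query('INSERT INTO measurements (payload) VALUES ($1)', [
  { success: true, latencyMs: 123 }
])
```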

  • Each project needs to run a custom validation and pre-processing step for measurements before they are stored in the DB.

In my PoC rewrite, each module had a lib file that exposed a validation function.

In the future, each module's validation function could live on chain (e.g. JavaScript stored in a smart contract, which can be executed in a secure VM).
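To illustrate the shape (hypothetical code - the actual PoC files may look different):

```js
// lib/spark.js (hypothetical): each module exposes a pure validation
// function that throws on invalid measurements.
export function validate (measurement) {
  if (typeof measurement !== 'object' || measurement === null) {
    throw new Error('measurement must be an object')
  }
  if (typeof measurement.cid !== 'string') {
    throw new Error('measurement.cid must be a string')
  }
  // further module-specific checks go here
}

// platform side (sketch): pick the module's validator before storing
import * as spark from './lib/spark.js'

const validators = new Map([['spark', spark.validate]])

export function validateMeasurement (module, measurement) {
  const validate = validators.get(module)
  if (validate === undefined) throw new Error(`unknown module: ${module}`)
  validate(measurement)
}
```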

  • The uneven scaling requirements apply to the REST API for recording measurements too.

Here as well, I think that meridian-api would have 1-N worker processes, which we scale up and down based on load. Maybe I'm missing your point, but I don't see what's novel here.

(3) How do we design the DB for multi-tenancy?

There are different approaches to implementing a multi-tenant database. Each comes with different trade-offs.

  • One DB server per tenant - this is what we did for Voyager
  • One DB server with multiple databases, one per tenant
  • One SQL table with a column linking table rows to tenants

I was thinking of the last option (one table with a tenant column). Only with this design can we handle the case where modules are added dynamically by users, so I think we should design for it from the beginning.

Scaling-wise, this means that we will need a queue implementation (to speak more generically than SQL here) that supports the whole platform's load.

The answer needs to consider how we want to handle uneven scaling (see (1) above).

I believe that with this route, uneven scaling is not a factor.
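To make it concrete, a sketch of that one-table design (names made up, not final):

```js
// Sketch: one shared table, with a column linking each row to its
// module (the tenant). Adding a module needs no schema change, and an
// index on the module column keeps per-module reads cheap.
import pg from 'pg'

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL })

await pool.query(`
  CREATE TABLE IF NOT EXISTS measurements (
    id BIGSERIAL PRIMARY KEY,
    module TEXT NOT NULL,
    payload JSONB NOT NULL
  )
`)
await pool.query(
  'CREATE INDEX IF NOT EXISTS measurements_module ON measurements (module)'
)

// A publish worker drains one module's backlog without touching others
const { rows } = await pool.query(
  'SELECT id, payload FROM measurements WHERE module = $1 LIMIT 1000',
  ['spark']
)
```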

(4) Is an SQL DB the right solution?

I think we need the following operations:

This really is a queue:

  • append a new measurement to the list of unpublished measurements

Push a message onto the queue

  • lock a subset of measurements for publishing
  • remove the locked subset of measurements from the storage

Take a message off the queue

  • clear the lock if the publishing operation failed

Add the message back to the queue

NSQ, for example, supports this locking by requiring messages to first be read and then acked (or something similar to this). The ack means the message has been processed successfully. If a message doesn't get acked in due time, it will appear back on the queue.
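For illustration, roughly how that cycle looks with the nsqjs client (topic/channel names made up; a sketch, not a recommendation):

```js
// Sketch: NSQ's read/ack cycle with the nsqjs client. A message that
// is neither finished (acked) nor requeued within the configured
// timeout is automatically handed back to the queue.
import nsq from 'nsqjs'

const reader = new nsq.Reader('measurements', 'publish', {
  lookupdHTTPAddresses: ['127.0.0.1:4161']
})
reader.connect()

reader.on('message', async msg => {
  try {
    await publishMeasurement(JSON.parse(msg.body.toString()))
    msg.finish() // ack: processed successfully
  } catch (err) {
    msg.requeue() // put it back on the queue for another attempt
  }
})

// stand-in for the actual publishing step
async function publishMeasurement (measurement) { /* ... */ }
```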

It is a good question what to use for this. I believe a queue would be a good fit, as it is closest to the data structure we need. We could also implement a queue on SQL, but I would first look into native queues (like NSQ, or whatever is great now), or at least use a DB that is known to be queue-friendly, like Redis.
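If we do end up building it on SQL, the usual trick in PostgreSQL is `SELECT ... FOR UPDATE SKIP LOCKED`; a sketch, assuming the shared measurements table from above:

```js
// Sketch: the lock / remove / clear-lock cycle on PostgreSQL. Rows
// selected FOR UPDATE SKIP LOCKED are invisible to concurrent workers,
// and a rollback releases them automatically if publishing fails.
import pg from 'pg'

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL })

async function publishBatch (module, publish) {
  const client = await pool.connect()
  try {
    await client.query('BEGIN')
    // lock a subset of measurements for publishing
    const { rows } = await client.query(`
      SELECT id, payload FROM measurements
      WHERE module = $1
      LIMIT 1000
      FOR UPDATE SKIP LOCKED
    `, [module])
    if (rows.length > 0) {
      await publish(rows.map(r => r.payload))
      // remove the locked subset once publishing succeeded
      await client.query(
        'DELETE FROM measurements WHERE id = ANY($1::bigint[])',
        [rows.map(r => r.id)]
      )
    }
    await client.query('COMMIT')
  } catch (err) {
    await client.query('ROLLBACK') // clear the lock on failure
    throw err
  } finally {
    client.release()
  }
}
```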

(5) IPC subnet

In the longer term, we wanted to research a more decentralised solution for recording & committing the measurements. One option is to build an IPC subnet (L2) with custom actors. The nodes in the subnet can collect the measurements and commit them in batches to the parent chain (Filecoin L1).

Is this a good time to research the feasibility of this option?

I don't know! To me this means hosting our own IPC subnet, which adds unknown operational overhead.

bajtos commented 4 months ago

(1) Uneven scaling

Looks to me like meridian-publish will have 1-n workers, and depending on how much load each module creates, we will scale them up and down. Is that what you mean?

Yes, that works for me. Do we need to specify the process for scaling up/down a bit more, or is it just an implementation detail?

(2) Who owns the DB schema

I was using a single column for the JSON blob in my PoC rewrite 👍 We don't need schemas here, only validations (as you noted next).

👍🏻

In my PoC rewrite, each module had a lib file that exposed a validation function.

In the future, each module's validation function could live on chain (e.g. JavaScript stored in a smart contract, which can be executed in a secure VM).

👍🏻

How do you envision deploying a new module to the Meridian platform? Will it mean deploying some descriptor (e.g. scaling configuration) and some JavaScript source code?

Ideally, this process should be easy to automate via GitHub Actions so that we can set up a CD pipeline that updates the validation function every time we land a new commit to the main branch.

Running a custom validation function - arbitrary JavaScript code - is potentially a huge security hole. How do you envision sandboxing the user-provided code so that it cannot access the outside environment and also cannot consume too much CPU time?

We can ignore this problem in the first iteration when we run only our code on the new platform, as long as the design does not prevent us from implementing sandboxing & CPU-time limitations later.

(3) How do we design the DB for multi-tenancy?

I was thinking of the last option (one table with a tenant column). Only with this design can we handle the case where modules are added dynamically by users, so I think we should design for it from the beginning.

It would be great to capture this in "Alternatives considered" in the design doc.

(4) Is an SQL DB the right solution?

This really is a queue:

Makes sense 👍🏻

(5) IPC subnet

I don't know! To me this means hosting our own IPC subnet, which adds unknown operational overhead.

I agree that IPC subnets bring a lot of unknown unknowns.

OTOH, we wanted to design the Meridian platform to be as decentralised as feasible. If we build a multi-tenant centralised solution now, we will need to replace it with a decentralised solution in the future.

I don't know if this is the right time to switch to IPC subnets or Textile Basin; I just want us to make a well-informed decision.

bajtos commented 4 months ago

Running a custom validation function - arbitrary JavaScript code - is potentially a huge security hole. How do you envision sandboxing the user-provided code so that it cannot access the outside environment and also cannot consume too much CPU time?

We can ignore this problem in the first iteration when we run only our code on the new platform, as long as the design does not prevent us from implementing sandboxing & CPU-time limitations later.

WASM can be a solution for sandboxing. However, it makes the development of validation functions more complex. I don't know if the WASM implementation in V8/Node.js allows the host to limit how long a WASM function can run and kill the function invocation if it's taking too long to complete.
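One workaround that should be possible (an untested assumption on my part): run the validation in a worker thread and terminate the worker on a deadline. Whether terminate() reliably interrupts code that is inside a WASM call depends on the V8 version, which is exactly the open question. A sketch, where validator-worker.js is a hypothetical file that loads and runs the module's validator:

```js
// Sketch: bound the run time of an untrusted validator by running it
// in a worker thread and hard-terminating the worker on a deadline.
// validator-worker.js (hypothetical) would load the module's WASM/JS
// validator, run it on workerData, and postMessage the result.
import { Worker } from 'node:worker_threads'

function runValidatorWithDeadline (measurement, timeoutMs = 1000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./validator-worker.js', {
      workerData: measurement
    })
    const timer = setTimeout(() => {
      // requests thread termination; stops running JS, but whether it
      // interrupts an in-flight WASM call depends on the V8 version
      worker.terminate()
      reject(new Error('validation timed out'))
    }, timeoutMs)
    worker.once('message', result => {
      clearTimeout(timer)
      resolve(result)
    })
    worker.once('error', err => {
      clearTimeout(timer)
      reject(err)
    })
  })
}
```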

juliangruber commented 4 months ago

(1) Uneven scaling

Looks to me like meridian-publish will have 1-n workers, and depending on how much load each module creates, we will scale them up and down. Is that what you mean?

Yes, that works for me. Do we need to specify the process for scaling up/down a bit more, or is it just an implementation detail?

How will the process for scaling be different from the one we have now?

How do you envision deploying a new module to the Meridian platform? Will it mean deploying some descriptor (e.g. scaling configuration) and some JavaScript source code?

I don't understand why adding a module will mean updating scaling configuration. We will update scaling configuration for the whole system once the whole system can't handle the load of all modules together. Or am I misunderstanding you?

At the moment, yes, deploying a new module will mean deploying some descriptor and some JavaScript source code ourselves.

In the future, this should be possible without us doing anything. That is for the future to solve, though; right now we are focusing on being able to add more modules ourselves without needing to duplicate infrastructure.

We need to design the economics for non-admins adding new modules: each module costs something to run, so how should this work? Do people need to stake and pay a fee, etc.? I believe once we have designed the crypto-economics for this, we can talk about implementation details.

Running a custom validation function - arbitrary JavaScript code - is potentially a huge security hole. How do you envision sandboxing the user-provided code so that it cannot access the outside environment and also cannot consume too much CPU time?

We can ignore this problem in the first iteration when we run only our code on the new platform, as long as the design does not prevent us from implementing sandboxing & CPU-time limitations later.

Yes, this is a tough problem to solve. JS or WASM is my first thought. Another option is to only allow validation as configuration, not as code - something like JSON Schema.
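For example, with Ajv (a common JSON Schema validator for Node.js; the schema below is made up):

```js
// Sketch: "validation as configuration" - the module author supplies a
// JSON Schema instead of code, and the platform compiles it with Ajv.
import Ajv from 'ajv'

const ajv = new Ajv()

// would come from the module's deployment descriptor, not from code
const measurementSchema = {
  type: 'object',
  properties: {
    cid: { type: 'string' },
    success: { type: 'boolean' }
  },
  required: ['cid', 'success'],
  additionalProperties: false
}

const validate = ajv.compile(measurementSchema)

if (!validate({ cid: 'bafy123', success: true })) {
  console.error(validate.errors)
}
```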

(3) How do we design the DB for multi-tenancy?

I was thinking of the last option (one table with a tenant column). Only with this design can we handle the case where modules are added dynamically by users, so I think we should design for it from the beginning.

It would be great to capture this in "Alternatives considered" in the design doc.

+1

(5) IPC subnet

I don't know! To me this means hosting our own IPC subnet, which adds unknown operational overhead.

I agree that IPC subnets bring a lot of unknown unknowns.

OTOH, we wanted to design the Meridian platform to be as decentralised as feasible. If we build a multi-tenant centralised solution now, we will need to replace it with a decentralised solution in the future.

I don't know if this is the right time to switch to IPC subnets or Textile Basin; I just want us to make a well-informed decision.

I think that with our current team size, hosting an IPC subnet is a deal-breaker. We're going to do the least amount of work required to platformize meridian-publish; however, I will document why subnets are not part of that.