verifier-alliance / pm

Project management for the Verifier Alliance collective
0 stars 0 forks source link

Database Issues and Proposed Changes #4

Open kuzdogan opened 2 months ago

kuzdogan commented 2 months ago

As laid out in the Notion doc we found issues in the existing production VerA data. The general consensus was to fix the issues and recreate the database from scratch. In addition other changes were proposed to the database schema.

Action Items

Issues Overview

Other changes proposed

rimrakhimov commented 2 months ago

The inconsistency between code.code_hash and code.code should be impossible after https://github.com/verifier-alliance/database-specs/pull/4 After that, I believe we may not add created_by and updated_by fields to the code and contracts tables as their consistency would be guaranteed on the database level. PR still requires to provide that code_hash value on the insertion, but the value would be validated by the database

I'm still for the adding created_by, created_at, updated_by, and updated_at fields to the contract_deployments

rimrakhimov commented 2 months ago

Some thoughts regarding the transformations and values:

  1. According to the schema I believe transformations must be arrays and values must be objects. Most probably, we may ensure that distinction on the database level. Should we?
  2. I believe, though, it make sense to distinguish cases when there is a match but no transformations and when there is no match at all. In the second case, blockscout currently uses postgres NULL values to indicate that there is no meanings in those fields as corresponding runtime_match or creation_match value is false. Should we agree on that or use something else?
  3. For the values, we have a list of supported keys: cbor_auxdata, constructor_arguments, libraries, and immutables. They are not required by the schema, but just to make it explicit. a) What if the corresponding value is empty (e.g., Solidity contract without immutables); b) What if the corresponding value is not supported by the compiler at all (e.g., there is no immutables notion in Vyper); Should we omit those keys from json and not use just defaults ({} for maps and empty string for constructor arguments)?
rimrakhimov commented 2 months ago

Btw, should we also define the json schema for compiled_contracts.compilation_artifacts,compiled_contracts.compilation_artifactsandcompiled_contracts.creation_code_artifacts`? I believe they are missing now from https://github.com/verifier-alliance/database-specs/tree/master/json-schemas

@samczsun @kuzdogan @marcocastignoli

kuzdogan commented 1 month ago

The inconsistency between code.code_hash and code.code should be impossible after verifier-alliance/database-specs#4 After that, I believe we may not add created_by and updated_by fields to the code and contracts tables as their consistency would be guaranteed on the database level. PR still requires to provide that code_hash value on the insertion, but the value would be validated by the database

I'm still for the adding created_by, created_at, updated_by, and updated_at fields to the contract_deployments

I'm not sure if I understand why you don't want to add created_by and updated_by fields to code.

In our case we are using the Postgres digest() function to pass the sha value directly in the INSERT.

https://github.com/ethereum/sourcify/blob/c7f00c38260ac514f69d6df6a826521001c47d97/services/server/src/server/services/utils/database-util.ts#L194

  1. According to the schema I believe transformations must be arrays and values must be objects. Most probably, we may ensure that distinction on the database level. Should we?

Yes but is there a different way to do it (like json and array type?)? or can we not just define it in the JSON schema?

  1. I believe, though, it make sense to distinguish cases when there is a match but no transformations and when there is no match at all. In the second case, blockscout currently uses postgres NULL values to indicate that there is no meanings in those fields as corresponding runtime_match or creation_match value is false. Should we agree on that or use something else?

Agreed. So the runtime transformations and values will be NULL if runtime_match == false, and same for the creation. Can we maybe encode this on the db level again?

  1. For the values, we have a list of supported keys: cbor_auxdata, constructor_arguments, libraries, and immutables. They are not required by the schema, but just to make it explicit. a) What if the corresponding value is empty (e.g., Solidity contract without immutables); b) What if the corresponding value is not supported by the compiler at all (e.g., there is no immutables notion in Vyper); Should we omit those keys from json and not use just defaults ({} for maps and empty string for constructor arguments)? a) if there are no "immutables" then the JSON will not contain an immutables field. Isn't it already like that? b) again here, if the contract is a Vyper contract, then no immutables field is expected in the JSON. So I don't expect a Vyper contract to have a "immutables" : {} field in the transformations object.

Btw, should we also define the json schema for compiled_contracts.compilation_artifacts,compiled_contracts.compilation_artifactsandcompiled_contracts.creation_code_artifacts`? I believe they are missing now from https://github.com/verifier-alliance/database-specs/tree/master/json-schemas

@samczsun @kuzdogan @marcocastignoli

I brought it up in the chat but forgot to follow up. I'm not sure about these fields. Currently these fields are HUGE and most of the data is actually useless. My computer literally freezes when trying to view the contents of these fields. We don't seem to have a standard here and everyone's inserting something else.

IMO here we need to be intentional. Even if we wanted the storageLayout as @samczsun suggests, right now we don't have it unless it is stated in the compiler settings. Here, we are just writing the outputs from the "outputSelection": { field. Some contracts have a minimal output like just abi, bytecodes etc. Some have full outputs "*": { "*": { "*"}} and this is an immense size of an output.

Here we should make the distinction between what would be useful to have in hand for the tooling ready, and what we can delegate to a recompilation to generate the desired output. Like, do we want the sourceMap, ast (I suspect, it's huge, but maybe it's useful, idk?), methodIdentifiers and so on. I'm not very versed in tooling and what might be useful so we should maybe ask around a little?

In this case each verifier needs to set the outputSelection themselves different than what's set in the compiler settings by the user. But there are things like https://github.com/ethereum/solidity/issues/13985 that we might have to pay attention to.

rimrakhimov commented 1 month ago

I'm not sure if I understand why you don't want to add created_by and updated_by fields to code.

It was just a suggestion, as I believe there is no much sense in it. The code -> code_hash correspondence would be guaranteed due to the database constraint, and the only field which may be wrong is code_hash_keccak which is mostly descriptive. But if we still would like to know who has written those keccak values, let's add created_by, updated_by columns as well

Yes but is there a different way to do it (like json and array type?)? or can we not just define it in the JSON schema?

I didn't find a way to add the json schema constraint in the database. But we still may manually add the most basic checks, I believe. For the transformations that may look like

ALTER table verified_contracts
ADD CONSTRAINT runtime_transformations_is_array
CHECK (runtime_transformations IS NULL or jsonb_typeof(runtime_transformations) = 'array');

I don't know how it would affect the performance, though

Agreed. So the runtime transformations and values will be NULL if runtime_match == false, and same for the creation. Can we maybe encode this on the db level again?

I believe we should. I will add the corresponding constraints in a separate PR

I'm not very versed in tooling and what might be useful so we should maybe ask around a little?

I agree. For us, there is abi which we currently use, and some fields we are planning to start using - sourceMap and sources.{file_name}.id

rimrakhimov commented 1 month ago

@kuzdogan Added a new PR with suggested verified_contracts constraints - https://github.com/verifier-alliance/database-specs/pull/5/files

kuzdogan commented 1 month ago

It was just a suggestion, as I believe there is no much sense in it. The code -> code_hash correspondence would be guaranteed due to the database constraint, and the only field which may be wrong is code_hash_keccak which is mostly descriptive. But if we still would like to know who has written those keccak values, let's add created_by, updated_by columns as well

I still see value in adding the field. The added extra data is trivial IMO and can be really useful tracing bugs.

Could you please also add this specification to the schema?

I didn't find a way to add the json schema constraint in the database. But we still may manually add the most basic checks, I believe. For the transformations that may look like

ALTER table verified_contracts
ADD CONSTRAINT runtime_transformations_is_array
CHECK (runtime_transformations IS NULL or jsonb_typeof(runtime_transformations) = 'array');

I don't know how it would affect the performance, though

If that's all we can do let's move with this. But is there a way to somehow check the hex format? I guess these are in our tests (are they?) and we need to get everyone to run them.

Actually not only this but for all of the stuff, are tests in place or Blockscout and Sourcify are the only ones "running" them?

I agree. For us, there is abi which we currently use, and some fields we are planning to start using - sourceMap and sources.{file_name}.id

I'll move this discussion to the group for Sam to see

rimrakhimov commented 1 month ago

I still see value in adding the field. The added extra data is trivial IMO and can be really useful tracing bugs.

Could you please also add this specification to the schema?

https://github.com/verifier-alliance/database-specs/pull/6