ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0

[[OpenSource]] Updating YDB canonical data in Github #4378

Open qrort opened 1 month ago

qrort commented 1 month ago

At the moment, test canonization works as follows:

When ya make -Z is called inside the yandex-team intranet, the ya tool writes new canondata to internal storage via the MDS protocol and updates the URLs in the canondata json files accordingly. When a commit is merged into the main branch on GitHub, the new resources are published to a publicly accessible S3 storage, keeping their IDs. The URLs in the json files are templated, so it is possible to specify which storage canondata should be downloaded from.
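
A minimal sketch of how a templated URL could be resolved against different storages. The placeholder token and both host names are assumptions made for illustration; the actual template syntax used in the canondata json files is not shown in this issue.

```python
# Hypothetical placeholder token; the real one in canondata json files may differ.
PLACEHOLDER = "{canondata_backend}"


def resolve_canondata_url(templated_url: str, storage_host: str) -> str:
    """Substitute a storage host into a templated canondata URL."""
    if PLACEHOLDER not in templated_url:
        # URLs without the placeholder are returned unchanged.
        return templated_url
    return templated_url.replace(PLACEHOLDER, storage_host)


# The resource ID (here "12345") stays the same; only the host changes.
# Both host names below are made up.
template = "https://{canondata_backend}/12345/resource.tar.gz"
internal_url = resolve_canondata_url(template, "storage.internal.example")
public_url = resolve_canondata_url(template, "public-s3.example.com")
```

Because the resource ID is kept when a resource is published, the same json entry can point at either storage without being edited.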

The goal is to be able to create new canonical data resources from outside the yandex-team intranet and to upload them to a public-access S3 storage.

The steps to achieve that are:

  1. Add to the ya tool the ability to write canondata to S3 storage, plus an option to store canondata in local files.
  2. Be able to generate resource IDs when writing to S3: MDS creates the ID itself, while S3 requires the client to supply it.
  3. Modify the json files to record where canondata should be loaded from: local, S3, or MDS.
  4. When a GitHub PR is published, new canondata should be supplied with it.
  5. When a PR is merged, CI needs to write canondata to a public-access S3. The json files should not be modified by that process.
  6. In the yandex-team intranet, there should be another S3 storage, regularly synced from the public-access one. This storage will be used for release builds.
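
For step 2, one possible way to produce an ID on the client side is to derive it from the content itself. This is only a sketch under that assumption: the issue does not say how IDs should be generated, and the function names and key layout below are hypothetical.

```python
import hashlib


def make_resource_id(payload: bytes) -> str:
    """Derive a deterministic resource ID from the canondata content."""
    return hashlib.sha256(payload).hexdigest()


def make_object_key(payload: bytes, file_name: str) -> str:
    """Build an S3 object key of the (assumed) form "<resource id>/<file name>"."""
    return f"{make_resource_id(payload)}/{file_name}"
```

With a content-derived ID, uploading the same canondata twice yields the same key, so a repeated canonization does not create a second resource.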

alexv-smirnov commented 1 month ago

  1. Writing to local files should be an option under the S3 protocol, so that we can reuse as much code as possible.
  2. Ok
  3. The results.json format should not be modified at all: we do not intend to load canondata from MDS, and the various S3 sources will differ only in the host name.
  4. Canondata itself is not transferred with the PR, only the changes to results.json.
  5. As part of the precommit check, there must be a step before the tests are launched in which CI looks for changes in results.json and ensures every resource is present in S3. For those that are not, it runs canonization with writes going to the real S3, as CI has an access token with write access.
  6. Ok, to be confirmed with the right team.
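
The check described in item 5 could look roughly like the sketch below. It assumes that resources are referenced under a "uri" key somewhere in results.json, and it takes the S3 existence check as a caller-supplied function rather than calling any particular S3 client; both are assumptions for illustration.

```python
import json
from typing import Any, Callable, Iterator, Set


def iter_uris(node: Any) -> Iterator[str]:
    """Yield every string stored under a "uri" key anywhere in a parsed json tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "uri" and isinstance(value, str):
                yield value
            else:
                yield from iter_uris(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_uris(item)


def missing_resources(results_json_text: str,
                      exists_in_s3: Callable[[str], bool]) -> Set[str]:
    """Return the referenced URIs for which exists_in_s3 reports False."""
    uris = set(iter_uris(json.loads(results_json_text)))
    return {uri for uri in uris if not exists_in_s3(uri)}
```

Tests whose results.json yields a non-empty set would then be the ones to re-canonize with writes going to the real S3.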

qrort commented 1 month ago

Updated task list:

  1. Add to the ya tool the ability to write canondata to S3 storage, plus an option to store canondata in local files (under the S3 protocol).
  2. Be able to generate resource IDs when writing to S3: MDS creates the ID itself, while S3 requires the client to supply it.
  3. When a patch is developed locally, new canondata should be written to local files.
  4. During the precommit check for a PR, add a step that updates canondata between "Build" and "Test". CI needs to determine the list of tests with new canondata and call ya make -AZ for this list, writing canondata to a public-access S3.
  5. In the yandex-team intranet, there should be another S3 storage, regularly synced from the public-access one. This storage will be used for release builds.