opensearch-project / opensearch-build

🧰 OpenSearch / OpenSearch-Dashboards Build Systems
Apache License 2.0
[Discussion] Modify input manifest yml schema and assemble workflow to support pre-installation of native plugins #2849

Open joshpalis opened 1 year ago

joshpalis commented 1 year ago

Is your feature request related to a problem? Please describe

Now that the Job Scheduler project ownership has been transferred to the OpenSearch Core Team, we and the community have made a decision to relocate Job Scheduler to native plugins [1]. This change will have an effect on how the full bundle of OpenSearch is assembled, since multiple plugins (ISM, AD, Reporting) have a dependency on Job Scheduler.

Currently, the input build manifest includes Job Scheduler as a component[2], and this will be removed upon relocation. However, unless there is a mechanism to detect and pre-install native plugin dependencies, assembly of the aforementioned dependent plugins will fail.

[1] https://github.com/opensearch-project/OpenSearch/issues/4218
[2] https://github.com/opensearch-project/opensearch-build/blob/main/manifests/2.4.0/opensearch-2.4.0.yml#L26

Describe the solution you'd like

In order to support the assembly of the Job Scheduler dependent components, I propose modifications to the input manifest yml schema such that native plugins that other components depend on will also be included. Consequently, changes to the input manifest yml schema will necessitate changes to the bundle_OpenSearch assemble workflow, such that native plugins listed within the input manifest are pre-installed prior to the dependent components.

CC : @prudhvigodithi @peterzhuamazon Open to discussion on how best to handle these changes

Describe alternatives you've considered

No response

Additional context

No response

owaiskazi19 commented 1 year ago

Thanks for writing this up @joshpalis. How about including a flag named depends_on for the plugins such as ISM, AD which are dependent on Job Scheduler in the input manifest?

  - name: anomaly-detection
    repository: https://github.com/opensearch-project/anomaly-detection.git
    depends_on: ['Job-Scheduler']
    ref: '2.4'
    platforms:
      - linux
      - windows
    checks:
      - gradle:properties:version
      - gradle:dependencies:opensearch.version
While reading the input manifest, we would have to take the dependency into account and, based on the flag, decide whether or not to install the JS native plugin.
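
A minimal sketch of how the depends_on flag could be read, assuming the manifest has already been parsed into plain dictionaries. The function name `native_plugins_to_install` and the `index-management` entry are hypothetical and only illustrate the idea; nothing here is taken from the actual build system.

```python
from typing import Dict, List


def native_plugins_to_install(components: List[Dict]) -> List[str]:
    """Collect the native plugins named in `depends_on`, keeping first-seen order.

    Hypothetical helper: `depends_on` is the flag proposed in this comment,
    not an existing manifest key.
    """
    required: List[str] = []
    for component in components:
        for dependency in component.get("depends_on", []):
            if dependency not in required:  # install each native plugin only once
                required.append(dependency)
    return required


# Components as they might look after parsing the input manifest.
components = [
    {"name": "anomaly-detection", "depends_on": ["Job-Scheduler"]},
    {"name": "index-management", "depends_on": ["Job-Scheduler"]},
    {"name": "common-utils"},  # no dependency on a native plugin
]

print(native_plugins_to_install(components))
```

If no component declares the flag, the list is empty and no native plugin would be pre-installed.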

prudhvigodithi commented 1 year ago

Hey @owaiskazi19, something similar to what was proposed here? Another way is to have a new schema, as follows:

native_plugins:
- job-scheduler
- repository-s3
components:
  - name: OpenSearch
    repository: https://github.com/opensearch-project/OpenSearch.git
    ref: main
    checks:
      - gradle:publish
      - gradle:properties:version
  - name: common-utils
    repository: https://github.com/opensearch-project/common-utils.git
    ref: main
    platforms:
      - linux
      - windows
    checks:
      - gradle:publish
      - gradle:properties:version

@dblock @bbarani
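
A rough sketch of how an assemble step might order its work with a top-level native_plugins list, assuming the manifest is already parsed into a dictionary. The function `install_order` and the assumption that the min distribution is the component named OpenSearch are illustrative, not the current behavior of the build system.

```python
from typing import Dict, List


def install_order(manifest: Dict, min_name: str = "OpenSearch") -> List[str]:
    """Return names in processing order: min distribution, native plugins, other components.

    Hypothetical helper: `native_plugins` is the key proposed in this comment.
    """
    native = list(manifest.get("native_plugins", []))
    others = [
        component["name"]
        for component in manifest.get("components", [])
        if component["name"] != min_name
    ]
    return [min_name] + native + others


manifest = {
    "native_plugins": ["job-scheduler", "repository-s3"],
    "components": [{"name": "OpenSearch"}, {"name": "common-utils"}],
}

print(install_order(manifest))
```

The point of the ordering is that native plugins go in before any component that might depend on them.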

owaiskazi19 commented 1 year ago

Hey @prudhvigodithi. Looks like we are along the same lines with the schema here.

For the above schema, I don't have a strong opinion (totally fine with it). My only concern is that it makes the manifest file a little more complex, which could be simplified by using the depends_on flag. Let me know WDYT.

joshpalis commented 1 year ago

Thanks @owaiskazi19 and @prudhvigodithi for your contributions to this discussion. I am partial to Owais' proposal for the modifications to the input manifest yml schema. The modifications would be simpler, and it places the onus on components to define the plugins that they depend on, making it clear which components depend on which native plugin.

prudhvigodithi commented 1 year ago

[Triage] Hey @joshpalis, what is the targeted release for job-scheduler as a native plugin? We should time this right to unblock 3.0.0 development.

joshpalis commented 1 year ago

@prudhvigodithi Currently the targeted release is 2.5.0

bbarani commented 1 year ago

@joshpalis I would recommend targeting the 2.6.0 release, since this is technically a breaking change for the build and development process and we need to synchronize the effort between multiple teams. CC: @CEHENKLE @peterzhuamazon @minalsha

joshpalis commented 1 year ago

Sure, I am fine with targeting 2.6.0 release. I have no strong opinions against this. @minalsha is this alright with you?

minalsha commented 1 year ago

@bbarani we are working closely with folks across different teams. I don't see why we need to push it out. Let's target 2.5.0 and see where we land on 01/02/2023. cc @CEHENKLE @prudhvigodithi @joshpalis @dagneyb

bbarani commented 1 year ago

@minalsha Is there a need to rush this change? This is not a new feature for users but rather a change to the existing plugin installation process. This change needs to be synchronized across multiple versions (3.x and 2.x). The available resources are currently working on the 1.3.7 release along with Windows support, hence I would not prioritize this for the 2.5.0 release. @CEHENKLE @prudhvigodithi @joshpalis @dagneyb @peterzhuamazon @gaiksaya

@saratvemulapalli Will this change be done only for 2.x and 3.x versions? I assume 1.x versions are not affected by this change. Can you please confirm?

saratvemulapalli commented 1 year ago

> @saratvemulapalli Will this change be done only for 2.x and 3.x versions? I assume 1.x versions are not affected by this change. Can you please confirm?

@bbarani you are right. This change will only go out for 2.x and above. This will not impact any 1.x releases. Job Scheduler will still be supported with the same Maven coordinates for 1.x.

joshpalis commented 1 year ago

@bbarani @prudhvigodithi The relocation of Job Scheduler to native plugins has been determined to be a breaking change due to the necessary build.gradle modifications for dependent plugins. In our efforts to adhere to semantic versioning, we have modified our approach to relocate Job Scheduler to native plugins. Please refer to the updated section of this issue for additional information.

Moving forward, modifications to the input manifest yml schema will still be necessary. Support for the pre-installation of native plugins will only be needed for OpenSearch 3.x and onwards.

CC: @saratvemulapalli @minalsha

gaiksaya commented 1 year ago

[Proposal] Posting one approach here:

In order to add JS as a native plugin, a new schema needs to be introduced. OpenSearch consists of a number of native plugins. In this case, we choose to install just one of them as part of the distribution, which is job scheduler (maybe more in the future). Hence the manifest needs to document it.

One way is to have type as one of the component keys in the schema. The type key can be mandatory, and the components can be defined accordingly. The following types seem valid (these are the ones used in the example manifest below): min, lib, plugin, core-plugin.

Each type will define how the component is installed.

The functionality to install components that are of type core-plugin needs to be added to the build system.

From offline discussion, it looks like the move will happen in a 2.x version but be enforced from 3.x due to breaking changes. We can decide whether we want the new manifest schema to be introduced from 2.x. Since the build repository does not follow a branching strategy to build (maybe it should), for 2.x the type for job-scheduler can be plugin, and then starting with 3.0 it can be core-plugin.

The new input manifest can look like below:

---
schema-version: '1.1'
build:
  name: OpenSearch
  version: 3.0.0
ci:
  image:
    name: opensearchstaging/ci-runner:ci-runner-centos7-opensearch-build-v2
    args: -e JAVA_HOME=/opt/java/openjdk-17
components:
  - name: OpenSearch
    repository: https://github.com/opensearch-project/OpenSearch.git
    ref: main
    type: min
    checks:
      - gradle:publish
      - gradle:properties:version
  - name: common-utils
    repository: https://github.com/opensearch-project/common-utils.git
    type: lib
    ref: main
    platforms:
      - linux
      - windows
    checks:
      - gradle:publish
      - gradle:properties:version
  - name: job-scheduler
    repository: https://github.com/opensearch-project/job-scheduler.git
    type: core-plugin
    ref: main
    platforms:
      - linux
      - windows
    checks:
      - gradle:properties:version
      - gradle:dependencies:opensearch.version
  - name: ml-commons
    repository: https://github.com/opensearch-project/ml-commons.git
    type: plugin
    ref: main
    platforms:
      - linux
      - windows
    checks:
      - gradle:properties:version
      - gradle:dependencies:opensearch.version: opensearch-ml-plugin

@saratvemulapalli @joshpalis @owaiskazi19 @dblock @opensearch-project/engineering-effectiveness let me know what you think about this. More approaches and recommendations are welcome. Thanks!
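
To make "each type will define how the component is installed" concrete, here is a small sketch that maps each type to an action and orders the work so that core-plugin components go in before regular plugins. The action names, the `plan_assembly` function and the choice to skip lib components during assembly are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from the proposed `type` key to an assemble action.
ACTIONS = {
    "min": "extract",
    "lib": "skip",  # assumption: libraries are published, not installed
    "core-plugin": "install-native",
    "plugin": "install",
}

# Types that are acted on during assembly, in the order they are processed.
ORDER = ["min", "core-plugin", "plugin"]


def plan_assembly(components: List[Dict]) -> List[Tuple[str, str]]:
    """Return (component name, action) pairs, rejecting missing or unknown types."""
    for component in components:
        if component.get("type") not in ACTIONS:
            raise ValueError(f"component {component.get('name')!r} has no valid type")
    plan: List[Tuple[str, str]] = []
    for component_type in ORDER:
        for component in components:
            if component["type"] == component_type:
                plan.append((component["name"], ACTIONS[component_type]))
    return plan


components = [
    {"name": "OpenSearch", "type": "min"},
    {"name": "common-utils", "type": "lib"},
    {"name": "job-scheduler", "type": "core-plugin"},
    {"name": "ml-commons", "type": "plugin"},
]

for name, action in plan_assembly(components):
    print(f"{name}: {action}")
```

Because type is treated as mandatory in this sketch, a component without it fails fast instead of being silently skipped.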

dblock commented 1 year ago

Generally, adding a type is a crutch; I don't recommend it. It forks a lot of classes and leads to switching on type instead of switching on features.

  1. Does the current system work without changes and a working directory option, even if it means re-building the native plugin twice?
  2. Since the native plugin is already built, can it just be added to published artifacts instead of being declared in the manifest altogether? Existing components publish a plugin zip today, so why can't core also do that? What's missing if along with opensearch-min you get a job scheduler JAR/ZIP?

bbarani commented 1 year ago

@dblock Are you suggesting integrating the JS native plugin installation process outside of the distribution build process? In that case, would we always assume that -min contains job-scheduler (JS) pre-installed and only install the other plugins?

@saratvemulapalli @minalsha @joshpalis can you add your inputs here?

saratvemulapalli commented 1 year ago

This manifest is used to build and eventually assemble the distribution. A native plugin is a plugin; I don't think we'd want to differentiate between the two for builds.

What changes is the installation, i.e. the way they are installed. As long as we have the right Maven coordinates to install JS (as a native plugin), wouldn't that solve the problem? @gaiksaya

The proposal also talks about order of execution, which is a problem by itself.

prudhvigodithi commented 1 year ago

Hey all, just circling back to this issue. Thanks for the multiple proposals added. @joshpalis, just to refresh, can you please add the timeline? What would be the next step for 2.6? Does the chosen proposal have to be pushed before 2.6? As far as I know, the JS plugin is currently being tested both as a native and as a regular plugin, hence no change is needed from the build side. Will this strategy be the same for 2.6 as well? Thank you

prudhvigodithi commented 1 year ago

Hey @joshpalis just following up, can you please add your thoughts based on my previous message? Thank you

prudhvigodithi commented 1 year ago

The change is targeted for the 3.0.0 release. To support it, the build process has to be modified to ensure that job-scheduler is installed during the assemble workflow. This can be done by installing the zip file from the core-plugins folder (code) and finally removing job-scheduler as a component from the manifest file.

  core-plugins:
        - core-plugins/discovery-gce-2.6.0.zip
        - core-plugins/repository-hdfs-2.6.0.zip
        - core-plugins/discovery-ec2-2.6.0.zip
        - core-plugins/analysis-icu-2.6.0.zip
        - core-plugins/discovery-azure-classic-2.6.0.zip
        - core-plugins/ingest-attachment-2.6.0.zip
        - core-plugins/analysis-stempel-2.6.0.zip
        - core-plugins/analysis-phonetic-2.6.0.zip
        - core-plugins/transport-nio-2.6.0.zip
        - core-plugins/repository-s3-2.6.0.zip
        - core-plugins/repository-gcs-2.6.0.zip
        - core-plugins/analysis-ukrainian-2.6.0.zip
        - core-plugins/repository-azure-2.6.0.zip
        - core-plugins/analysis-smartcn-2.6.0.zip
        - core-plugins/mapper-annotated-text-2.6.0.zip
        - core-plugins/store-smb-2.6.0.zip
        - core-plugins/analysis-nori-2.6.0.zip
        - core-plugins/mapper-murmur3-2.6.0.zip
        - core-plugins/mapper-size-2.6.0.zip
        - core-plugins/analysis-kuromoji-2.6.0.zip

Publishing the native plugin zips (under the core-plugins folder) to Maven should also be considered. This gives users more options for installing a native plugin; currently it is pulled from artifacts.opensearch.org.

./bin/opensearch-plugin install repository-s3
-> Installing https://artifacts.opensearch.org/releases/plugins/repository-s3/2.6.0/repository-s3-2.6.0.zip
-> Downloading https://artifacts.opensearch.org/releases/plugins/repository-s3/2.6.0/repository-s3-2.6.0.zip
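
For reference, the URL layout visible in the output above can be reproduced with a small helper. This is only a sketch derived from that one log line; the function name `release_plugin_url` is made up, and the pattern is assumed rather than confirmed to hold for every plugin and version.

```python
def release_plugin_url(name: str, version: str) -> str:
    """Build the release download URL for a native plugin zip.

    Assumption: the layout matches the repository-s3 2.6.0 log line shown above.
    """
    base = "https://artifacts.opensearch.org/releases/plugins"
    return f"{base}/{name}/{version}/{name}-{version}.zip"


print(release_plugin_url("repository-s3", "2.6.0"))
```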

But this can be a later enhancement and will not impact the installation of JS as a native plugin during assemble, as the assemble workflow uses the build manifest (the output of the build workflow) and installs the plugins using the zips from the local workspace. @peterzhuamazon @gaiksaya @bbarani @dblock @saratvemulapalli @joshpalis please add if I'm missing anything

Thank you

dblock commented 1 year ago

Do we need to change the manifest at all? Would it be simpler to install whichever native plugins we want as part of install.sh for OpenSearch core?

prudhvigodithi commented 1 year ago

Hey @dblock, with the existing flow, install.sh is called during install_plugin and only has logic to run some cp commands. The actual plugin installation for OpenSearch happens in the install_plugin method of bundle_opensearch.py, which internally uses the opensearch-plugin CLI from a random tmp folder (example: /tmp/tmpvfijvu7o/opensearch-2.8.0/bin/opensearch-plugin).

So it looks to me like install_plugin in bundle_opensearch.py has to be modified to take a native plugin list coming from some place (could be the manifest), and that works well.
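
A sketch of what this could look like, assuming the native plugin zips are already in the local workspace. It only builds the command lines rather than running them. The function `plugin_install_commands`, all paths and the zip file names are hypothetical, and the `install --batch file:...` form reflects my understanding of the opensearch-plugin CLI rather than the actual code in bundle_opensearch.py.

```python
import posixpath
from typing import List


def plugin_install_commands(
    dist_dir: str, native_zips: List[str], plugin_zips: List[str]
) -> List[List[str]]:
    """Build opensearch-plugin commands, native plugins first.

    Assumption: every zip path is an absolute POSIX path in the local workspace.
    """
    cli = posixpath.join(dist_dir, "bin", "opensearch-plugin")
    commands: List[List[str]] = []
    for zip_path in list(native_zips) + list(plugin_zips):
        # Install from the local zip rather than downloading it.
        commands.append([cli, "install", "--batch", "file://" + zip_path])
    return commands


commands = plugin_install_commands(
    "/tmp/work/opensearch-2.8.0",
    native_zips=["/tmp/work/core-plugins/repository-s3-2.8.0.zip"],
    plugin_zips=["/tmp/work/plugins/opensearch-job-scheduler-2.8.0.0.zip"],
)

for command in commands:
    print(" ".join(command))
```

Keeping the native plugins at the front of the list is what guarantees they are present before any dependent plugin is installed.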

Tested with repository-s3 : During assemble

2023-06-07 18:47:07 INFO     Installed plugins: ['repository-s3', 'opensearch-job-scheduler']

Inside the final tar file.

opensearch-2.8.0/plugins/repository-s3/
opensearch-2.8.0/plugins/repository-s3/LICENSE.txt
opensearch-2.8.0/plugins/repository-s3/NOTICE.txt
opensearch-2.8.0/plugins/repository-s3/aws-java-sdk-core-1.12.270.jar
opensearch-2.8.0/plugins/repository-s3/aws-java-sdk-s3-1.12.270.jar
opensearch-2.8.0/plugins/repository-s3/aws-java-sdk-sts-1.12.270.jar
opensearch-2.8.0/plugins/repository-s3/commons-codec-1.15.jar
opensearch-2.8.0/plugins/repository-s3/commons-logging-1.2.jar
opensearch-2.8.0/plugins/repository-s3/httpclient-4.5.13.jar
opensearch-2.8.0/plugins/repository-s3/httpcore-4.4.15.jar
opensearch-2.8.0/plugins/repository-s3/jackson-annotations-2.15.1.jar
opensearch-2.8.0/plugins/repository-s3/jackson-databind-2.15.1.jar
opensearch-2.8.0/plugins/repository-s3/jaxb-api-2.3.1.jar
opensearch-2.8.0/plugins/repository-s3/jmespath-java-1.12.270.jar
opensearch-2.8.0/plugins/repository-s3/log4j-1.2-api-2.17.1.jar
opensearch-2.8.0/plugins/repository-s3/plugin-descriptor.properties
opensearch-2.8.0/plugins/repository-s3/plugin-security.policy
opensearch-2.8.0/plugins/repository-s3/repository-s3-2.8.0.jar

I'm open to any other solutions. @gaiksaya @peterzhuamazon @zelinh please add your thoughts.

Thank you

prudhvigodithi commented 1 year ago

Another approach we can go with (the first approach in the previous comment) is to have a Gradle task in the OpenSearch core repo that handles the desired native plugin installation, and to call this Gradle task from the install.sh script so that the native plugins are installed. With this approach, the ordering and the required set of native plugins can be controlled directly from the OpenSearch core repo. @gaiksaya @dblock @joshpalis @saratvemulapalli

peterzhuamazon commented 1 year ago

> Hey @dblock, with the existing flow, install.sh is called during install_plugin and only has logic to run some cp commands. The actual plugin installation for OpenSearch happens in the install_plugin method of bundle_opensearch.py, which internally uses the opensearch-plugin CLI from a random tmp folder (example: /tmp/tmpvfijvu7o/opensearch-2.8.0/bin/opensearch-plugin).
>
> So it looks to me like install_plugin in bundle_opensearch.py has to be modified to take a native plugin list coming from some place (could be the manifest), and that works well.

I am OK with this approach, though the input manifest needs changes to know the type of the plugins, per @gaiksaya's suggestion.

dblock commented 1 year ago

> Another approach we can go with (the first approach in the previous comment) is to have a Gradle task in the OpenSearch core repo that handles the desired native plugin installation, and to call this Gradle task from the install.sh script so that the native plugins are installed. With this approach, the ordering and the required set of native plugins can be controlled directly from the OpenSearch core repo. @gaiksaya @dblock @joshpalis @saratvemulapalli

This sounds like a good idea!

prudhvigodithi commented 1 year ago

Hey @joshpalis, coming from your comment https://github.com/opensearch-project/OpenSearch/issues/5310#issuecomment-1597518680, is the JS native plugin migration targeted for the 3.0.0 release?

joshpalis commented 1 year ago

> Another approach we can go with (the first approach in the previous comment, https://github.com/opensearch-project/opensearch-build/issues/2849#issuecomment-1581647805) is to have a Gradle task in the OpenSearch core repo that handles the desired native plugin installation, and to call this Gradle task from the install.sh script so that the native plugins are installed. With this approach, the ordering and the required set of native plugins can be controlled directly from the OpenSearch core repo. @gaiksaya @dblock @joshpalis @saratvemulapalli

I support this approach, as it negates the need to modify the input manifest to add a type to differentiate plugins and native plugins. As @saratvemulapalli stated, they should be treated the same.

prudhvigodithi commented 1 year ago

Thanks Josh. In that case, can you please coordinate from the core side to onboard a task that can install the native plugins? Then we can come back to the build side and update the install.sh file with that Gradle task. Thank you

bbarani commented 1 year ago

@prudhvigodithi will look into this issue and see if it can be integrated into the Gradle build for the 3.x version.

prudhvigodithi commented 7 months ago

This issue remains on hold, as the proposal and implementation for migrating job-scheduler to a native plugin are still under review. This migration is also a breaking change targeted for the 3.0.0 release, which has moved to Feb 18 2025 based on the release schedule https://github.com/opensearch-project/.github/issues/186.

It's also worth exploring the use of the core PersistentTaskPlugin with added scheduler capabilities, allowing plugins to use the core feature instead of job-scheduler. Adding @peterzhuamazon https://github.com/opensearch-project/job-scheduler/issues/147#issuecomment-1919192347.

Thanks