osbuild / image-builder

Image Builder service for console.redhat.com
Apache License 2.0
49 stars 44 forks source link

v1/api: Add exported custom repositories #1381

Open avitova opened 1 week ago

avitova commented 1 week ago

Currently, the custom_repositories field only stores ids. When exporting a blueprint and sharing with someone from different org, we can not reuse ids, because from different org, users would not be able to see it. For that reason, we need to export more details about the repositories on export.

https://issues.redhat.com/browse/HMS-4811

Edit: There is a very long discussion, the summary is in this comment (https://github.com/osbuild/image-builder/pull/1381#issuecomment-2451696913)

codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 52.17391% with 22 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/v1/handler_blueprints.go 51.51% 11 Missing and 5 partials :warning:
internal/clients/content_sources/client.go 53.84% 4 Missing and 2 partials :warning:
Files with missing lines Coverage Δ
internal/v1/api.go 49.02% <ø> (ø)
internal/clients/content_sources/client.go 67.18% <53.84%> (-3.41%) :arrow_down:
internal/v1/handler_blueprints.go 71.60% <51.51%> (-1.42%) :arrow_down:
croissanne commented 1 week ago

I don't fully understand why we're adding this? Could you add that in the commit message?

avitova commented 1 week ago

Thanks for all the comments. I am happy with all the ideas and will think about them. I edited the description of the pull request. Hopefully, it is clearer now why we need this. @croissanne

avitova commented 1 week ago

I ended up thinking a lot about what Lukáš suggested about keeping just the json structure as is. I did a quick check here only to make sure that we do not include values that are not needed. So, the field would not be required; in case we actually get some values that make sense from content sources, we add them to the structure.

avitova commented 1 week ago

/retest

croissanne commented 1 week ago

@avitova here's an example of what's in the DB https://gist.github.com/croissanne/bad023411f3bb65d7e7e391427eeb43c

avitova commented 6 days ago

Thank you for the comment, Sanne. I misinterpreted the field during the documentation, so I changed that. However, what we were trying to solve here was the ability to import custom repositories fully. Two scenarios can happen if we export it as-is and later import it. This is due to the fact that for each URL, only one record can exist: 1) We import, the customer does not have custom repo with the given URL, so whatever we create does not influence the customer as much. 2) The user already has a repo with a given URL but has some restrictions for architecture or versions, probably for a reason. Here, we wanted to keep it as non-intrusive as possible. The import endpoint should modify the existing repo, which is why we can not use simply bulk_create endpoint. tor to make changes that modify the custom repo as little as possible, we need the information such as distribution_arch, distribution_versions, etc. Are these saved in our DB? Maybe I am missing something.

croissanne commented 6 days ago

Thank you for the comment, Sanne. I misinterpreted the field during the documentation, so I changed that. However, what we were trying to solve here was the ability to import custom repositories fully. Two scenarios can happen if we export it as-is and later import it. This is due to the fact that for each URL, only one record can exist:

1. We import, the customer does not have custom repo with the given URL, so whatever we create does not influence the customer as much.

2. The user already has a repo with a given URL but has some restrictions for architecture or versions, probably for a reason. Here, we wanted to keep it as non-intrusive as possible. The import endpoint should modify the existing repo, which is why we can not use simply bulk_create endpoint. tor to make changes that modify the custom repo as little as possible, we need the information such as distribution_arch, distribution_versions, etc. Are these saved in our DB? Maybe I am missing something.

These are indeed missing, but maybe we can just take the arch / distro from the other parameters in the blueprint being imported? If we could just infer to be close enough, and then I think they should be able to edit it after.

lzap commented 5 days ago

These are indeed missing, but maybe we can just take the arch / distro from the other parameters in the blueprint being imported? If we could just infer to be close enough, and then I think they should be able to edit it after.

For what benefits tho?

Exporting everything maps nicely, in fact, this is a generic approach we can use for all of other services. Who knows what other fields we will not be able to infer. This is just a blob that stores next to the blueprint, it is not being stored in the image-builder database so there is nothing to worry about in terms of data duplication.

croissanne commented 5 days ago

These are indeed missing, but maybe we can just take the arch / distro from the other parameters in the blueprint being imported? If we could just infer to be close enough, and then I think they should be able to edit it after.

For what benefits tho?

Exporting everything maps nicely, in fact, this is a generic approach we can use for all of other services. Who knows what other fields we will not be able to infer. This is just a blob that stores next to the blueprint, it is not being stored in the image-builder database so there is nothing to worry about in terms of data duplication.

Will repositories fail to import if the blob is not there? I guess we'd only support blueprints exported with the ib api? But right maybe my idea had too many heuristics.

lzap commented 5 days ago

Will repositories fail to import if the blob is not there? I guess we'd only support blueprints exported with the ib api?

The idea behind this "special" blueprint export / import endpoint is that customers can clone self-contained blueprints. So the workflow is: this export endpoint -> JSON file -> new import endpoint (TBD). With the idea of creating some curated JSON files as an experiment for Red Hat supported blueprints.

So yes, this is a special export just for this feature. Tho hopefully if we are able to find a common blueprint schema at some point and we would go for my idea of "extensions" this extra data could live exactly there.

Alternatively, we could create a new field in customizations-custom_repositories that could carry the extra information (UUID). Have you explored that before, @avitova ?

avitova commented 3 days ago

There has been a very long discussion, and I feel like we might get into a loop very soon. I will quickly summarize the purpose of this PR, as well as explain my choices here. I adjusted the PR based on some of yours opinions, make sure what you think now. A summary of what was going on:

Why is this PR even a thing?

If a Person A from Org A exports their Blueprint A, provide the exported file of it to a Person B from Org B, we want to create as much of a true copy of Blueprint A on import, as possible. Now, Custom repositories are specific to an org, so repos Org A has can be different than what Org B has. => We want to have a way to import, and also handle mismatches in case Org B has the repo, but with different restrictions.

Why did I decide to use the endpoint and not use existing data in the blueprint?

So, the Customizations.CustomRepositories do not have some of the fields - such as restriction for OS and restriction for architecture. As Sanne said, we could inherit that from the rest of the blueprint - but that just does not give us the full information, we are kind of "guessing" what is there. If we are okay with that, we could do this. But then also, as Lukas said, there could be more fields that we are unable to "guess" or that would make it complicated. Also, as Ondra said in one of the discussions, the schema for custom repositories could change. That is one of the reasons why having the blop is more beneficial than my previous solution, which was creating a new structure. (Lukas suggested this too)

Now, why did I create a separate field and did not save it in Customizations?

Well, because we will not use these data anywhere else than in export, and if we change Customizations, the field will potentially be everywhere, and might even be misused. Keeping it only in BlueprintExportResponse, sounds safe.

avitova commented 3 days ago

Okay, I managed to figure the tests out, Lukas:)

avitova commented 5 hours ago

Ready for review:)