Closed fjmilens3 closed 6 years ago
@TheBay0r, when you're testing this, are you modifying the file in any way between uploads? (For example, are you reuploading the exact same file without any changes at all, or are you changing it in some way before reuploading?)
@fjmilens3 I wasn't changing the files in between. I'll test the case where the content of the zip is slightly changed to see whether that has an impact on the generated JSON.
@fjmilens3 So I tested this case. When the zip file contains a change, it seems that the JSON is rebuilt.
@TheBay0r:
> So I tested this case. When the zip file contains a change it seems that the json is rebuilt
This is something we're going to have to live with because of other factors related to suppressing unneeded rebuilds, but fortunately it shouldn't be much of a problem.
Explanation
Every time a new upload comes in for an existing artifact, we generate an update event within the data layer:
We then receive this event and if certain conditions hold, we use that to generate a rebuild event for the metadata:
However, since these are database-level events, other events, notably downloads, can also cause the same record to be updated (as we have to increment the last downloaded time, etc.).
We don't want to rebuild metadata in that case, and until we have a better application-level solution, our preferred workaround for this problem is to check whether the blob was updated within a short period of time.
If it was, we assume that the event was the result of the blob changing; if it was not, we infer that the event was the result of a download (or other operation) that touched the asset but should not force a rebuild of any associated metadata:
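A minimal sketch of this heuristic, written in Python for brevity rather than taken from the plugin's code. The names `AssetUpdatedEvent`, `should_rebuild_metadata`, and the 10-second `RECENT_UPDATE_WINDOW` are all assumptions made for illustration; the thread does not state the actual threshold or class names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical window; the real threshold is not stated in this thread.
RECENT_UPDATE_WINDOW = timedelta(seconds=10)


@dataclass
class AssetUpdatedEvent:
    """Stand-in for a database-level update event on an asset record."""
    asset_name: str
    blob_updated: Optional[datetime]  # when the asset's blob last changed


def should_rebuild_metadata(
    event: AssetUpdatedEvent,
    now: datetime,
    window: timedelta = RECENT_UPDATE_WINDOW,
) -> bool:
    """Return True only if the blob changed shortly before the event."""
    if event.blob_updated is None:
        return False
    age = now - event.blob_updated
    # A recent blob update suggests an upload; an old one suggests a
    # download or another operation that merely touched the record.
    return timedelta(0) <= age <= window


now = datetime(2018, 6, 1, 12, 0, 0)
upload_event = AssetUpdatedEvent(
    "vendor/project/1.0.0.zip", blob_updated=now - timedelta(seconds=1)
)
download_event = AssetUpdatedEvent(
    "vendor/project/1.0.0.zip", blob_updated=now - timedelta(hours=3)
)
```

Here `should_rebuild_metadata(upload_event, now)` is true and `should_rebuild_metadata(download_event, now)` is false, which is the distinction the workaround relies on.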
Along with the above, Nexus Repository Manager tries to deduplicate blobs (for the same asset), such that if we receive a blob for an asset that's identical to the blob we already have for that asset, we don't have any churn at the storage level. As a practical matter, that means that the blob was not updated, so the blob updated timestamp is not updated either:
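To illustrate why a byte-identical reupload leaves the timestamp alone, here is a rough sketch. It assumes identity is decided by comparing a content hash (SHA-256 here), which is an assumption and not something this thread confirms; `AssetStore`, `StoredAsset`, and `put` are hypothetical names, not Nexus Repository Manager APIs.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class StoredAsset:
    digest: str             # content hash of the current blob
    blob_updated: datetime  # last time the blob actually changed


class AssetStore:
    """Toy in-memory store that skips writes for identical content."""

    def __init__(self) -> None:
        self._assets: Dict[str, StoredAsset] = {}

    def put(self, name: str, content: bytes, now: datetime) -> bool:
        """Store content; return True only if the blob actually changed."""
        digest = hashlib.sha256(content).hexdigest()
        existing = self._assets.get(name)
        if existing is not None and existing.digest == digest:
            # Identical blob: no storage churn, timestamp left untouched.
            return False
        self._assets[name] = StoredAsset(digest, now)
        return True

    def blob_updated(self, name: str) -> datetime:
        return self._assets[name].blob_updated
```

With this behaviour, a second `put` of the same bytes returns `False` and `blob_updated` keeps its earlier value, so a recency check on that timestamp would not trigger a rebuild.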
The end result is that no metadata is rebuilt, because the blob has not changed. Of course, there are changes within the repository manager itself that could be used to handle this, or we could broadcast custom events from within the content facet; however, I'm trying to minimize the divergence between the approach we have here and the approach we have in our supported/proprietary format implementations.
Conclusions
Under normal circumstances this won't matter: if the blob hasn't changed, then the generated metadata would not change either (at least not in any meaningful way), as the metadata is extracted from the content of the blob (in our case, the composer.json file in the archive).
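As a small illustration of that point, the sketch below reads composer.json out of a zip archive using Python's standard library. It is not the plugin's extraction code; `read_composer_json` and `build_archive` are hypothetical helpers, and picking the composer.json closest to the archive root is an assumption.

```python
import io
import json
import zipfile


def read_composer_json(archive_bytes: bytes) -> dict:
    """Return the parsed composer.json found in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        candidates = [
            name
            for name in archive.namelist()
            if name == "composer.json" or name.endswith("/composer.json")
        ]
        if not candidates:
            raise ValueError("archive contains no composer.json")
        # Assumption: prefer the file closest to the archive root.
        name = min(candidates, key=lambda n: n.count("/"))
        with archive.open(name) as handle:
            return json.load(handle)


def build_archive(composer: dict) -> bytes:
    """Build a small in-memory zip containing a composer.json."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("project/composer.json", json.dumps(composer))
    return buffer.getvalue()


package = {"name": "vendor/project", "version": "1.0.0"}
first = read_composer_json(build_archive(package))
second = read_composer_json(build_archive(package))
```

Both `first` and `second` equal `package`: the same archive content yields the same extracted metadata, which is why skipping the rebuild is normally harmless.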
However, under unusual circumstances it can be advantageous to have a scheduled task within Nexus that can rebuild metadata for all or part of a repository's contents. This is typically useful either to mitigate the effects of some breaking change or to recover from some unexpected situation where the generated metadata is inaccurate or incomplete (a special case of which would be the scenario you first encountered, where I'd made breaking changes with the metadata generation in that PR, but you didn't see the metadata regenerate by reuploading the same artifact).
If you have no objections (and are satisfied with the above explanation), I would like to consider this "closed" and implement the aforementioned scheduled task in https://github.com/sonatype-nexus-community/nexus-repository-composer/issues/21 before we someday promote this to 1.0.0.
Ah, wow! Thank you for the detailed explanation. My naive approach to this was that it just would be hooked up to the post request and whenever a post request comes in an update is triggered no matter what 🤔 But this approach makes sense of course! 🙂
From my point of view this one can be closed, thanks.
Closing based on conversation with @TheBay0r.
@TheBay0r has noticed that there may be issues with regenerating provider-level metadata when uploading a new build of an existing package and version. Details are part of the comments thread in PR #14 and would have entered the codebase as part of PR #18.