majkinetor / au

Chocolatey Automatic Package Updater Module
GNU General Public License v2.0

Processing certain packages sequentially (preventing parallelism) #222

Closed jberezanski closed 3 years ago

jberezanski commented 3 years ago

Consider this scenario: a group of packages derives their versions from the same upstream data, so the update scripts download the version information only once per update_all run and share it through a cache file, so that it does not change between packages if the vendor releases a new version at that exact moment.

This approach is very simple to implement and works well when AU is set to use one thread. If multiple threads are used, we have a classic race without synchronization, and we sometimes get failures due to two package scripts attempting to update the cache file at the same time.

I can see a couple of ways to work around this (one is sketched below), but at the cost of increased complexity in the package update scripts. Is there perhaps a facility in AU that could help with that? Such as a way to specify that "package A and B should not be processed in parallel"? Or a facility to invoke some logic once per update_all run and return the same results when called from each package processed by AU in this run?
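One such workaround, as a minimal sketch (a named mutex serializes writes to the shared cache file; $releasesUrl and $cacheFile are placeholder names, not AU facilities):

# take a machine-wide named mutex so that two update.ps1 threads never
# refresh the cache file at the same time
$mutex = New-Object System.Threading.Mutex($false, 'Global\AuSharedCacheMutex')
$null = $mutex.WaitOne()
try {
    Invoke-WebRequest $releasesUrl -OutFile $cacheFile
}
finally {
    $mutex.ReleaseMutex()
}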

jberezanski commented 3 years ago

Cross-reference: https://github.com/darinegan/chocolatey-packages/pull/7#issuecomment-707844388

majkinetor commented 3 years ago

There is no such thing in AU. I was thinking about introducing dependencies between packages, which was mostly needed for virtual packages; it was not only about parallelism but also atomicity - if, for example, the install package failed, the virtual one that depends on it would also be forbidden from being published.

However, it can certainly be done now - for example, keep 2 folders, one for packages that should be invoked in parallel and one for those that must run sequentially. You could have 2 update_all scripts with different settings, or just one, and specify the individual package updaters any way you like in the second. You could also keep everything in a single folder without much problem.
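A minimal sketch of the two-folder variant (folder names are illustrative; this assumes the $global:au_root convention that updateall uses to locate packages):

import-module au

# packages that are independent and safe to run concurrently
$global:au_root = Resolve-Path .\parallel
updateall -Options @{ Threads = 10 }

# packages that share state (e.g. a common cache file) - one at a time
$global:au_root = Resolve-Path .\sequential
updateall -Options @{ Threads = 1 }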

So

so that it does not change between packages if the vendor releases a new version at that exact moment

Although theoretically possible, the chances of this are next to zero. Did you actually experience it?

Or a facility to invoke some logic once per update_all run and return the same results when called from each package processed by AU in this run?

You could do this via the BeforeEach/AfterEach options of updateall; see the sketch below. A before/after-all hook is not needed, as that logic can simply be added in the update_all.ps1 script before and after the updateall call.
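A minimal sketch of that pattern (URL and paths are placeholders; BeforeEach is documented to receive the package name and the options hash):

import-module au

# once per update_all run: fetch the shared data before any package runs
$cache = "$PSScriptRoot\_cache\releases.json"
Invoke-WebRequest 'https://example.invalid/releases.json' -OutFile $cache

updateall -Options @{
    Threads    = 10
    BeforeEach = {
        param($PackageName, $Options)
        # runs before each package's update.ps1
    }
}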

jberezanski commented 3 years ago

Thanks for the ideas, I have some more options to choose from now.

Atomicity would definitely be useful too (although from my experience, issues are more likely to happen due to moderation on chocolatey.org - one package getting approved while its dependency is not yet approved - than due to errors during an AU run).

Although theoretically possible, the chances of this are next to zero. Did you actually experience it?

Given a large number of packages and versions, even unlikely scenarios may happen eventually. I remember suspecting that to be the cause of a certain failure with the dotnetcore packages. But mostly I just want to play it safe, because such an occurrence in high-profile packages could inconvenience a rather large group of people. Doing just a single network call also increases the reliability of the process, because there is less chance of running into transient network issues compared to, for example, making that call for each of some 30-40 packages.

jberezanski commented 3 years ago

(As far as I'm concerned, the issue can be closed, unless you wish to leave it open as a possible future idea.)

majkinetor commented 3 years ago

issues are more likely to happen due to moderation on chocolatey.org

Good point.

Doing just a single network call also increases the reliability of the process, because there is less chance of running into transient network issues compared to, for example, making that call for each of some 30-40 packages.

Could you give me an example?

jberezanski commented 3 years ago

Well, the network is inherently unreliable. Each download (of, say, a JSON or HTML file with the latest release info) has a low (usually), but nonzero, chance of failing. If there are, say, 10 packages*, each using the same version source and downloading the info independently, the chance of encountering a failure somewhere in the whole process is higher than the chance of a single download failing. Yes, the chance is still small, but it is bound to happen sooner or later and may result in some packages not getting updated, causing dependency woes.

*10 is the rough estimate of the number of dotnet packages after the planned restructuring; when I wrote "30-40" above I was thinking of an internal project.
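To put a rough number on it (figures are illustrative, not measured): if a single download fails with probability p, then at least one of N independent downloads fails with probability 1 - (1 - p)^N.

$p = 0.01   # chance that a single download fails
$N = 10     # independent downloads per run
1 - [math]::Pow(1 - $p, $N)   # ~0.096, roughly ten times the single-download risk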

And I just remembered encountering the case where the vendor updated the latest version info while I was testing the Visual Studio packages (although this was not in an AU context).

majkinetor commented 3 years ago

I understand your motivation; I was asking for a real-life example.

Do I understand you right? You say that N packages download exactly the same data (like some JSON database) to get the latest info, and you want to avoid that by downloading it only once and reusing the cached version for all N packages.

jberezanski commented 3 years ago

Yes, precisely.

Examples:

majkinetor commented 3 years ago

OK, how about something like this:

  1. In update_all.ps1, add near the top: ls $Root | % { if (Test-Path $_\beforeall.ps1) { . $_\beforeall.ps1 } }
  2. In one of the packages of the group, put a beforeall.ps1 that downloads the resource once.
  3. Update each package updater's $releases URL to use the local file.

For example

dotnetcore\beforeall.ps1:

$releases = "https://github.com/dotnet/core/blob/master/release-notes/5.0/releases.json" 
Invoke-WebRequest $releases | Out-File ..\_cache\dotnetcore-releases.json`

dotnetcore\update.ps1 and the other packages in the group:

$releases = "...\_cache\dotnetcore-releases.json"

au_GetLatest {
        # Invoke-WebRequest $releases  
        $r = Get-Content $releases | ConvertFrom-Json
        ....

I think you can do this without any changes to the script files by using BeforeEach/AfterEach in the update_all options.

You can make this arbitrarily complex if you need more features.

jberezanski commented 3 years ago

This will work for update_all.ps1, but loses the ability to invoke any package's update.ps1 on its own (because it would always use the cached, possibly stale, data), which is sometimes useful. I will probably combine it with a cache timestamp check in update.ps1 as well, re-downloading if the cache is, say, older than one hour - much longer than an update_all run would take. That way both scenarios should work.
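A minimal sketch of that combined check in update.ps1 (paths and $releasesUrl are placeholders):

# reuse the cache written by beforeall.ps1 during an update_all run, but
# refresh it when it is missing or older than one hour (standalone run)
$cache = "$PSScriptRoot\..\_cache\releases.json"
$stale = (-not (Test-Path $cache)) -or (((Get-Date) - (Get-Item $cache).LastWriteTime).TotalHours -gt 1)
if ($stale) { Invoke-WebRequest $releasesUrl -OutFile $cache }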

jberezanski commented 3 years ago

I like the idea of beforeall.ps1; it keeps concerns of individual packages out of update_all.ps1 and is an elegant, modular approach.

majkinetor commented 3 years ago

but loses the ability to invoke any package's update.ps1

Easily solvable

update.ps1

$releases = . $PSScriptRoot\beforeall.ps1 -Standalone

then in beforeall.ps1 something like:

param ([switch] $Standalone)

$remote = "remote location"
$cache  = "cached location"

# standalone: return the remote URL so update.ps1 fetches fresh data itself
if ($Standalone) { return $remote }

# update_all run: download the data once into the shared cache
Invoke-WebRequest $remote -OutFile $cache

In update.ps1 you could use some function Get-Data that fetches the data either via iwr or gc, so the updater scripts themselves stay the same. Or you could use file:// URLs for the cache and keep iwr in all cases.

I believe that with some thought and convention there would actually be zero changes required in the scripts; it could all be driven externally.
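A possible shape for that Get-Data helper (the name comes from the suggestion above; the body is a guess):

# fetch remote URLs via iwr and local cache files via gc, so au_GetLatest
# does not care which kind of $releases it was given
function Get-Data([string] $Source) {
    if ($Source -match '^https?://') { (Invoke-WebRequest $Source -UseBasicParsing).Content }
    else { Get-Content $Source -Raw }
}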

jberezanski commented 3 years ago

Here is what I did for Service Fabric (https://github.com/darinegan/chocolatey-packages/pull/7):

update_all.ps1:

# run every package's beforeall.ps1, if present
Get-ChildItem -Path $Root -Directory | Join-Path -ChildPath 'beforeall.ps1' | Get-Item -ErrorAction SilentlyContinue | ForEach-Object {
    Write-Host "Running beforeall script for $(Split-Path -Leaf -Path $_.Directory)"
    & $_
}

beforeall.ps1 in one of the packages:

Import-Module $PSScriptRoot\..\..\tools\PSModules\ServiceFabricPackageTools\ServiceFabricPackageTools.psm1
Get-FabricUpdateInfo -IgnoreCache | Out-Null

No change in update.ps1 - it calls Get-FabricUpdateInfo, as before.

Get-FabricUpdateInfo checks the cache file timestamp and redownloads the update info if the cache is older than one hour or if the -IgnoreCache switch is set.

majkinetor commented 3 years ago

Looks great.