Open breml opened 3 years ago
The current approach uses a very minimal scoring method, which includes having the repo's name in the binary name or the URL's basename. This should already give priority to files that at least include the repo name. However, the other suggestions sound interesting, and especially looking at `FileInfo()` in .tar and .zip files and other archives that support the executable bit sounds like a quick win.
Finding binaries by reading their MIME type would be awesome. Every time I update a binary I have to manually select it from among ~10-20 files.
Is this open to contributions?
> Is this open to contributions?
Definitely!
For updating binaries I believe there's actually something better we can do. We could save the name of the file selected the first time and then, whenever an update is triggered, check whether that same file exists and fetch it without asking the user again. Additionally, I suggest we do what @sirlatrom suggests and improve the scoring method to target the files better, so we can remove the selection step altogether.
I guess we can implement this in several steps
- Save the selected file the first time and use the same name upon updates.
This will help a lot as it's difficult to check out the same binary when updating. Sometimes I have ended up downloading the checkgen instead of the executable.
- Score better archive files based on OS and Arch and don't prompt the user if we have a single high scoring file
- Save the selected file the first time and use the same name upon updates.
The first point is already handled for .tar(.*) and .zip archives as the same filtering/selection mechanism is used there as for 'top level' files/assets.
I'm not sure how we can handle the second idea as there can theoretically be an indefinitely long chain. Maybe we can somehow store each choice along the way and 'pop' a choice for each part of the chain?
> I'm not sure how we can handle the second idea as there can theoretically be an indefinitely long chain. Maybe we can somehow store each choice along the way and 'pop' a choice for each part of the chain?
Hmm, maybe I'm missing something here? What I had in mind is:
- When performing an update, check the tar files again and look for a match on the initially saved file. If there is one, just use that same file.
Not sure I'm missing something, Sune, since I didn't quite understand the "indefinitely long chain" part.
> Not sure I'm missing something, Sune, since I didn't quite understand the "indefinitely long chain" part.
@marcosnils Not very likely, so we don't need to handle it, but there could be a binary within a tar.gz within a zip etc, each with an ambiguous list of files, and we'd need to remember each choice the user made along the way.
Practically speaking, we should at least remember which top level asset was chosen, and if it's an archive then which file was chosen within that archive.
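
To make the "remember each choice" idea concrete, here is a minimal sketch in Go. Everything in it is an assumption for illustration: `SavedChoice`, `Resolve`, `listFrom` and the release layout are hypothetical names, not bin's actual config or API. It stores the chosen names as a path (top-level asset first, then the file inside the archive) and replays that path on update.

```go
package main

import (
	"fmt"
	"strings"
)

// SavedChoice is a hypothetical record of what the user picked at install
// time: the top-level asset first, then one entry per nested archive level.
type SavedChoice struct {
	Path []string
}

// Resolve replays the saved path. list returns the names offered at the
// current level, given the names chosen so far. It reports false as soon as
// a saved name is no longer offered, so the caller can fall back to prompting.
func (c SavedChoice) Resolve(list func(chosen []string) []string) ([]string, bool) {
	var chosen []string
	for _, want := range c.Path {
		found := false
		for _, name := range list(chosen) {
			if name == want {
				found = true
				break
			}
		}
		if !found {
			return chosen, false
		}
		chosen = append(chosen, want)
	}
	return chosen, true
}

// listFrom builds a lister over an in-memory tree keyed by the joined path
// of the choices made so far ("" is the release's top level).
func listFrom(tree map[string][]string) func([]string) []string {
	return func(chosen []string) []string {
		return tree[strings.Join(chosen, "/")]
	}
}

func main() {
	// Made-up release layout: two assets, one of which is an archive.
	tree := map[string][]string{
		"":            {"tool.tar.gz", "tool.zip"},
		"tool.tar.gz": {"README.md", "tool"},
	}
	saved := SavedChoice{Path: []string{"tool.tar.gz", "tool"}}
	chosen, ok := saved.Resolve(listFrom(tree))
	fmt.Println(chosen, ok)
}
```

A path of names covers the two-level case and, if it ever matters, deeper nesting as well. Exact name matching would miss assets whose names embed the version, so a real implementation would probably need some normalisation that this sketch leaves out.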
> I guess we can implement this in several steps
> - Score better archive files based on OS and Arch and don't prompt the user if we have a single high scoring file
I would like to emphasize once again that I do not like the scoring part about OS and Arch. There is really no value in ever presenting the user with a file that does not match the OS or the Arch, even if such a file has the highest score, for example when no file is available for the user's OS/Arch. I once had a situation where bin installed a Windows exe on my Linux machine, and in my opinion this should never happen. So I propose filtering by OS and Arch and applying the scoring only to the files that remain as options after the filtering has been applied.
> So I propose filtering by OS and Arch and applying the scoring only to the files that remain as options after the filtering has been applied.
I agree with this approach. I believe we're on the same page here and we're mostly discussing semantics. Files with different OS / Arch should score 0 by default and we shouldn't present that option to the user (unless eventually overridden by a flag?).
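
A rough sketch of "filter first, then score" in Go, to show how the two steps could be separated. The token tables and the `Filter`, `Score`, `compatible` and `mentions` names are hypothetical and not taken from bin's code; real asset names use many more spellings than listed here. A name is dropped only if it explicitly names a different OS or arch, so files that name neither stay in.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical spellings; real asset names use many more variants.
var (
	osTokens = map[string][]string{
		"linux":   {"linux"},
		"darwin":  {"darwin", "macos"},
		"windows": {"windows", ".exe"},
	}
	archTokens = map[string][]string{
		"amd64": {"amd64", "x86_64"},
		"arm64": {"arm64", "aarch64"},
		"386":   {"386", "i386"},
	}
)

// mentions reports whether name contains any of the tokens.
func mentions(name string, tokens []string) bool {
	for _, t := range tokens {
		if strings.Contains(name, t) {
			return true
		}
	}
	return false
}

// compatible reports whether name either names want or names no known value
// at all in the given table (OS or arch).
func compatible(name, want string, table map[string][]string) bool {
	if mentions(name, table[want]) {
		return true
	}
	for key, tokens := range table {
		if key != want && mentions(name, tokens) {
			return false
		}
	}
	return true
}

// Filter keeps only the names that do not name a different OS or arch.
func Filter(names []string, goos, goarch string) []string {
	var kept []string
	for _, n := range names {
		lower := strings.ToLower(n)
		if compatible(lower, goos, osTokens) && compatible(lower, goarch, archTokens) {
			kept = append(kept, n)
		}
	}
	return kept
}

// Score counts how many of OS and arch the name states explicitly.
func Score(name, goos, goarch string) int {
	lower := strings.ToLower(name)
	score := 0
	if mentions(lower, osTokens[goos]) {
		score++
	}
	if mentions(lower, archTokens[goarch]) {
		score++
	}
	return score
}

func main() {
	names := []string{"tool_linux_amd64.tar.gz", "tool_windows_amd64.zip", "tool_linux_arm64.tar.gz", "tool.sh"}
	for _, n := range Filter(names, "linux", "amd64") {
		fmt.Println(n, Score(n, "linux", "amd64"))
	}
}
```

With this split, a Windows asset can never reach the prompt on a Linux host regardless of its score, and an override flag would only need to skip the `Filter` call.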
> Not very likely, so we don't need to handle it, but there could be a binary within a tar.gz within a zip etc, each with an ambiguous list of files, and we'd need to remember each choice the user made along the way.
Now I understand your original concern. I guess we can save the whole file chain; it doesn't seem very difficult to do. However, I still haven't come across a scenario with multiple nested archives to ultimately get a binary. Not sure how often this comes up in practice, since it's not very standard, right?
> Files with different OS / Arch should score 0 by default and we shouldn't present that option to the user (unless eventually overridden by a flag?).
Currently, any asset containing the repo name gets a score of 1 to begin with, and additional points for matching the OS/arch/OS specific extension (.exe/.appimage). I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?
> Not sure how often this comes up in practice, since it's not very standard, right?
Agreed. We would still need to save two choices, though: Which archive, and which binary within the archive.
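> I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?
My basic scoring proposal:
- All files start with score 0
- If a file has arch and/or OS and it doesn't match the `bin` host, subtract 1
- If a file has arch and/or OS and it does match the `bin` host, add 1
Given scores:
- Single high score file: install automatically
- Multiple files with score >= 0: prompt the user, ordered by score descending
- Files with score < 0: don't prompt the user
I'm probably missing something and there's surely a better way of doing it, I just wrote the first idea that came to my mind.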
> * match the `bin` host
What does that mean? Do you mean the repo name? That's what we already do, but I suppose we can wait with giving that point until we've found at least one of the os/arch matches first.
OK, I was referring to the bin host's OS and arch. I'm aware that we're doing some of those things already; I was just describing how I see the overall algorithm working.
sent from mobile
On Wed, Mar 31, 2021 at 22:46, Sune Keller @.***> wrote:
> - match the `bin` host
> What does that mean? Do you mean the repo name? That's what we already do, but I suppose we can wait with giving that point until we've found at least one of the os/arch matches first.
> I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?
> My basic scoring proposal:
> - All files start with score 0
> - If a file has arch and/or OS and it doesn't match the `bin` host, subtract 1
> - If a file has arch and/or OS and it does match the `bin` host, add 1
> Given scores:
> - Single high score file: install automatically
> - Multiple files with score >= 0: prompt the user, ordered by score descending
> - Files with score < 0: don't prompt the user
> I'm probably missing something and there's surely a better way of doing it, I just wrote the first idea that came to my mind.
In general I like the above proposal. One downside I see is that a file with the correct OS but the wrong arch will still get a score of 0 (+1 -1) and therefore remains a candidate. So I guess that for a file to be considered a candidate, it must achieve a score > 0.
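
The proposal and the stricter threshold can be compared side by side with a small Go sketch. It assumes the OS and arch have already been parsed out of each file name (an empty string meaning "not mentioned"); `Asset`, `Score`, `Decide` and the `minScore` parameter are hypothetical names for illustration. Passing `minScore` 0 gives the original rule, 1 gives the "score > 0" variant.

```go
package main

import (
	"fmt"
	"sort"
)

// Asset is a hypothetical description of a release file. An empty OS or Arch
// means the file name does not mention one.
type Asset struct {
	Name, OS, Arch string
}

// point is 0 when the name says nothing, +1 on a match, -1 on a mismatch.
func point(got, host string) int {
	switch {
	case got == "":
		return 0
	case got == host:
		return 1
	default:
		return -1
	}
}

// Score applies the +1/-1 rule once for the OS and once for the arch.
func Score(a Asset, hostOS, hostArch string) int {
	return point(a.OS, hostOS) + point(a.Arch, hostArch)
}

// Decide returns the assets scoring at least minScore, best first, and
// whether the first one is a single top scorer that could be installed
// without prompting.
func Decide(assets []Asset, hostOS, hostArch string, minScore int) ([]Asset, bool) {
	var kept []Asset
	for _, a := range assets {
		if Score(a, hostOS, hostArch) >= minScore {
			kept = append(kept, a)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return Score(kept[i], hostOS, hostArch) > Score(kept[j], hostOS, hostArch)
	})
	switch len(kept) {
	case 0:
		return nil, false
	case 1:
		return kept, true
	}
	return kept, Score(kept[0], hostOS, hostArch) > Score(kept[1], hostOS, hostArch)
}

func main() {
	assets := []Asset{
		{"t_linux_amd64", "linux", "amd64"},
		{"t_linux_arm64", "linux", "arm64"},
		{"t_windows_amd64.exe", "windows", "amd64"},
		{"t.sh", "", ""},
	}
	for _, min := range []int{0, 1} {
		kept, auto := Decide(assets, "linux", "amd64", min)
		fmt.Println(min, len(kept), auto)
	}
}
```

With `minScore` 0 the half-matching files (right OS, wrong arch, or the other way round) stay in the list, which is the downside described above; with `minScore` 1 they drop out, but so do files whose names mention neither OS nor arch.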
Additionally, I would like to work towards an algorithm that, in most cases, picks an archive and performs a successful installation on its own, so that the user only has to select an archive in a few exceptional cases. One step in this direction would be to put the different archive types into a list ordered by priority. This would allow us to install the binary successfully even if multiple archive types are available (e.g. .tar.gz and .zip).
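
As an illustration of the ordered list of archive types, a minimal Go sketch. The particular order in `archivePriority` is an arbitrary assumption, and `rank` and `PickArchive` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// archivePriority is a hypothetical preference order; earlier entries win.
var archivePriority = []string{".tar.gz", ".tgz", ".tar.xz", ".zip"}

// rank returns the position of name's extension in archivePriority, or
// len(archivePriority) when the name matches none of them.
func rank(name string) int {
	lower := strings.ToLower(name)
	for i, ext := range archivePriority {
		if strings.HasSuffix(lower, ext) {
			return i
		}
	}
	return len(archivePriority)
}

// PickArchive returns the name with the best rank; ties keep the first one
// seen. It returns "" for an empty list.
func PickArchive(names []string) string {
	best := ""
	bestRank := len(archivePriority) + 1
	for _, n := range names {
		if r := rank(n); r < bestRank {
			best, bestRank = n, r
		}
	}
	return best
}

func main() {
	fmt.Println(PickArchive([]string{"tool_linux_amd64.zip", "tool_linux_amd64.tar.gz"}))
}
```

This would only be applied to files that are otherwise equivalent (same OS and arch), so that the archive format is the last tie-breaker before falling back to a prompt.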
I have an additional idea which I feel is worth exploring: check whether the repo contains a `.goreleaser.yml` file. I know this only targets Go, but I feel that goreleaser is becoming the de facto standard for releasing binaries in the Go ecosystem. The huge advantage of considering this file is that we no longer need to guess whether arch/OS are present in the file name, because based on the existence of the `replacements` section, we know which is the correct file to download.
Example from bin:
    archives:
      - replacements:
          darwin: Darwin
          linux: Linux
          windows: Windows
          386: i386
          amd64: x86_64
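
A sketch of how such a replacements table could be used once it is available, written in Go. It hard-codes the mapping from the example above instead of parsing the YAML file (parsing is left out on purpose), and the `replaced` and `MatchAsset` helpers as well as the asset names in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// replacements mirrors the example section above; in practice it would be
// read from the repo's .goreleaser.yml (parsing is not shown here).
var replacements = map[string]string{
	"darwin":  "Darwin",
	"linux":   "Linux",
	"windows": "Windows",
	"386":     "i386",
	"amd64":   "x86_64",
}

// replaced returns the configured replacement for key, or key itself when
// none is configured.
func replaced(key string) string {
	if v, ok := replacements[key]; ok {
		return v
	}
	return key
}

// MatchAsset returns the names that contain both the replaced OS and the
// replaced arch for the given platform.
func MatchAsset(names []string, goos, goarch string) []string {
	var out []string
	for _, n := range names {
		if strings.Contains(n, replaced(goos)) && strings.Contains(n, replaced(goarch)) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Hypothetical asset names following the replaced spellings.
	names := []string{"bin_0.1.0_Linux_x86_64", "bin_0.1.0_Darwin_x86_64", "bin_0.1.0_Linux_i386"}
	fmt.Println(MatchAsset(names, runtime.GOOS, runtime.GOARCH))
}
```

The point is that the expected spelling comes from the repo's own config rather than from a guessed list of aliases; whether a given repo ships such a section is of course an assumption that has to be checked per repo.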
It might be worth it to try to figure out, if there is something similar for e.g. Rust.
I did a quick test with my ~50 binaries managed with bin. For a little bit more than 1/3, I found a `.goreleaser.yml`.
Just for reference, this site lists the valid combinations of arch/os supported by the Go compiler: https://gist.github.com/asukakenji/f15ba7e588ac42795f421b48b8aede63
I'd like to contribute some more example test cases that could affect this issue:
- … `tar`
- … `tgz` (`.tgz.sha256` is unlikely, see #100)
- … `tar`
This issue might overlap with #102.
I'd like to add the case where there are alternate binaries matching your platform, such as:
When presenting the user a list of potential files to select the correct binary from, the following improvements could be applied to improve the user experience:
- … (`.exe`, `.sh` and of course no extension)
- … `FileInfo()` information from the header of the archive (e.g. in tar and zip) to find files with the executable bit set
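
For the executable-bit idea, a minimal Go sketch using the standard library's `archive/tar`. `ExecutableFiles` and `sampleTar` are hypothetical helper names, and the archive is built in memory purely so the example is self-contained.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// ExecutableFiles lists the regular files in a tar stream whose header mode
// has at least one executable bit set.
func ExecutableFiles(r io.Reader) ([]string, error) {
	var names []string
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		mode := hdr.FileInfo().Mode()
		if mode.IsRegular() && mode.Perm()&0o111 != 0 {
			names = append(names, hdr.Name)
		}
	}
}

// sampleTar builds a small in-memory archive with one plain file and one
// executable, only so the sketch can run without external files.
func sampleTar() (*bytes.Buffer, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	files := []struct {
		name string
		mode int64
	}{{"README.md", 0o644}, {"tool", 0o755}}
	for _, f := range files {
		body := []byte("x")
		hdr := &tar.Header{Name: f.name, Mode: f.mode, Size: int64(len(body)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := sampleTar()
	if err != nil {
		panic(err)
	}
	names, err := ExecutableFiles(buf)
	fmt.Println(names, err)
}
```

For zip archives, `archive/zip` exposes the same information through `FileHeader.Mode()`, although whether the archive actually carries Unix permission bits depends on how it was created.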