marcosnils / bin

Effortless binary manager
MIT License

Feature Request: Optimized select for selecting the binary #67

Open breml opened 3 years ago

breml commented 3 years ago

When presenting the user a list of potential files to select the correct binary from, the following improvements could be applied to improve the user experience:

sirlatrom commented 3 years ago

The current approach uses a very minimal scoring method, which checks whether the repo's name appears in the binary name or in the URL's basename. This should already give priority to files that at least include the repo name. However, the other suggestions sound interesting, and especially looking at FileInfo() in .tar and .zip files and other archives that support the executable bit sounds like a quick win.
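
A minimal sketch of the executable-bit idea for tar archives, using only the Go standard library. The helper names `executablesInTar` and `sampleTar` and the sample file names are made up for illustration and are not taken from bin's code base.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// executablesInTar returns the names of regular files in a tar stream
// whose mode has at least one executable bit set.
func executablesInTar(r io.Reader) ([]string, error) {
	var names []string
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		mode := hdr.FileInfo().Mode()
		if mode.IsRegular() && mode.Perm()&0o111 != 0 {
			names = append(names, hdr.Name)
		}
	}
}

// sampleTar builds a small in-memory archive for the demonstration.
func sampleTar() (*bytes.Buffer, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	files := []struct {
		name string
		mode int64
	}{
		{"README.md", 0o644},
		{"LICENSE", 0o644},
		{"tool", 0o755},
	}
	for _, f := range files {
		body := []byte("x")
		hdr := &tar.Header{Name: f.name, Mode: f.mode, Size: int64(len(body)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := sampleTar()
	if err != nil {
		panic(err)
	}
	names, err := executablesInTar(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // only "tool" carries an executable bit
}
```

A compressed archive would first need to be wrapped in the matching decompressing reader (for example `compress/gzip`) before being passed to `executablesInTar`.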

cristiand391 commented 3 years ago

Finding binaries by reading their MIME type would be awesome. Every time I update a binary I have to manually select it from ~10-20 files.

Is this open to contributions?
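
One way to approximate the MIME-type idea, sketched under assumptions: instead of a MIME database, look at the first bytes of a file for well-known executable signatures (ELF, PE, thin Mach-O). The function name `binaryKind` and its return strings are hypothetical. Universal ("fat") Mach-O files are left out on purpose because their signature collides with Java class files.

```go
package main

import (
	"bytes"
	"fmt"
)

// binaryKind inspects the leading bytes of a file and reports the executable
// format it appears to be, or "" if none of the known signatures match.
func binaryKind(head []byte) string {
	switch {
	case bytes.HasPrefix(head, []byte("\x7fELF")):
		return "elf" // Linux and most other Unix systems
	case bytes.HasPrefix(head, []byte("MZ")):
		return "pe" // Windows executables
	case bytes.HasPrefix(head, []byte{0xcf, 0xfa, 0xed, 0xfe}),
		bytes.HasPrefix(head, []byte{0xce, 0xfa, 0xed, 0xfe}):
		return "mach-o" // little-endian 64-bit and 32-bit macOS binaries
	default:
		return ""
	}
}

func main() {
	fmt.Println(binaryKind([]byte("\x7fELF\x02\x01\x01")))
	fmt.Println(binaryKind([]byte("MZ\x90\x00")))
	fmt.Println(binaryKind([]byte("#!/bin/sh\n")) == "")
}
```

Only a handful of bytes per file need to be read, so this check could run on every entry of an archive before any prompt is shown.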

marcosnils commented 3 years ago

Is this open to contributions?

Definitely!

For updating binaries, I believe there's actually something better we can do. We could save the name of the file selected the first time and then, whenever an update is triggered, check whether that same file exists and fetch it without asking the user again. Additionally, I suggest we do what @sirlatrom suggests and improve the scoring method to target the files better, so that the selection step can be removed altogether.

I guess we can implement this in several steps

akhan4u commented 3 years ago
  • Save the selected file the first time and use the same name upon updates.

This will help a lot as it's difficult to check out the same binary when updating. Sometimes I have ended up downloading the checkgen instead of the executable.

sirlatrom commented 3 years ago
  • Score better archive files based on OS and Arch and don't prompt the user if we have a single high scoring file
  • Save the selected file the first time and use the same name upon updates.

The first point is already handled for .tar(.*) and .zip archives as the same filtering/selection mechanism is used there as for 'top level' files/assets.

I'm not sure how we can handle the second idea as there can theoretically be an indefinitely long chain. Maybe we can somehow store each choice along the way and 'pop' a choice for each part of the chain?

marcosnils commented 3 years ago

I'm not sure how we can handle the second idea as there can theoretically be an indefinitely long chain. Maybe we can somehow store each choice along the way and 'pop' a choice for each part of the chain?

Hmm, maybe I'm missing something here? What I had in mind is:

Not sure I'm missing something Sune, since I didn't quite understand the "indefinitely long chain" part.

sirlatrom commented 3 years ago
  • When performing an update, check the tar files again and look for a match on the initially saved file. If yes, just use that same file.

Not sure I'm missing something Sune, since I didn't quite understand the "indefinitely long chain" part.

@marcosnils Not very likely, so we don't need to handle it, but there could be a binary within a tar.gz within a zip etc, each with an ambiguous list of files, and we'd need to remember each choice the user made along the way.

Practically speaking, we should at least remember which top level asset was chosen, and if it's an archive then which file was chosen within that archive.
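
A minimal sketch of remembering the chain of choices and replaying it on update. Everything here is hypothetical: the `replay` function, the `demoRelease` data and the file names are made up, and entries are compared by exact name, which would not survive version numbers embedded in asset names.

```go
package main

import (
	"fmt"
	"strings"
)

// replay follows a previously saved chain of choices through nested listings.
// list returns the entries available at the current level, given the choices
// made so far. If a saved choice is missing, replay stops and returns the part
// of the chain that could still be resolved, so the caller can prompt from there.
func replay(chain []string, list func(chosen []string) []string) ([]string, bool) {
	var chosen []string
	for _, want := range chain {
		found := false
		for _, name := range list(chosen) {
			if name == want {
				found = true
				break
			}
		}
		if !found {
			return chosen, false
		}
		chosen = append(chosen, want)
	}
	return chosen, true
}

// demoRelease fakes a release: top-level assets under "", archive contents
// under the archive's name.
var demoRelease = map[string][]string{
	"":            {"tool.tar.gz", "tool.zip", "checksums.txt"},
	"tool.tar.gz": {"README.md", "tool"},
}

func demoList(chosen []string) []string {
	return demoRelease[strings.Join(chosen, "/")]
}

func main() {
	// Saved at install time: which asset, then which file inside it.
	saved := []string{"tool.tar.gz", "tool"}
	path, ok := replay(saved, demoList)
	fmt.Println(path, ok)
}
```

A chain of length two covers the practical case (asset, then file within the archive), while longer chains would handle nested archives without a separate code path.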

breml commented 3 years ago

I guess we can implement this in several steps

  • Score better archive files based on OS and Arch and don't prompt the user if we have a single high scoring file

I would like to emphasize once again that I do not like the scoring part about OS and Arch. There is really no value in ever presenting the user with a file that does not match the OS or the Arch, even if such a file has the highest score, for example because no file is available for the user's OS/Arch. I had this situation once, where bin installed a Windows exe on my Linux machine, and in my opinion this should never happen. So I propose filtering by OS and Arch and applying the scoring only to the files that remain as options after the filtering has been applied.
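
A rough sketch of filtering before scoring, assuming naive substring matching on lower-cased asset names. The alias tables and the names `mentions`, `compatible` and `filterAssets` are hypothetical and far from complete. A file is dropped only when it names some OS (or arch) other than the wanted one, so files that name neither stay in.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical alias tables: spellings that often stand for a GOOS or GOARCH
// value in release asset names.
var osAliases = map[string][]string{
	"linux":   {"linux"},
	"darwin":  {"darwin", "macos", "osx"},
	"windows": {"windows", "win64", "win32", ".exe"},
}

var archAliases = map[string][]string{
	"amd64": {"amd64", "x86_64", "x64"},
	"arm64": {"arm64", "aarch64"},
	"386":   {"i386", "386"},
}

// mentions reports which keys of aliases are named in the file name.
func mentions(name string, aliases map[string][]string) map[string]bool {
	found := map[string]bool{}
	lower := strings.ToLower(name)
	for key, words := range aliases {
		for _, w := range words {
			if strings.Contains(lower, w) {
				found[key] = true
			}
		}
	}
	return found
}

// compatible is false only when the name mentions an OS or arch
// but not the wanted one.
func compatible(name, goos, goarch string) bool {
	oses := mentions(name, osAliases)
	archs := mentions(name, archAliases)
	if len(oses) > 0 && !oses[goos] {
		return false
	}
	if len(archs) > 0 && !archs[goarch] {
		return false
	}
	return true
}

// filterAssets keeps the compatible names in their original order.
func filterAssets(names []string, goos, goarch string) []string {
	var kept []string
	for _, n := range names {
		if compatible(n, goos, goarch) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	assets := []string{
		"tool_linux_amd64.tar.gz",
		"tool_windows_amd64.exe",
		"tool_darwin_arm64.tar.gz",
		"tool_linux_arm64.tar.gz",
		"checksums.txt",
	}
	// A real caller would pass runtime.GOOS and runtime.GOARCH.
	fmt.Println(filterAssets(assets, "linux", "amd64"))
}
```

Scoring would then run only on the returned slice, so a Windows executable can never win on a Linux host, whatever its score.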

marcosnils commented 3 years ago

So I propose filtering by OS and Arch and applying the scoring only to the files that remain as options after the filtering has been applied.

I agree with this approach. I believe we're on the same page here and we're mostly discussing semantics. Files with different OS / Arch should score 0 by default and we shouldn't present that option to the user (unless eventually overridden by a flag?).

Not very likely, so we don't need to handle it, but there could be a binary within a tar.gz within a zip etc, each with an ambiguous list of files, and we'd need to remember each choice the user made along the way.

Now I understand your original concern. I guess we can save the whole file chain, which doesn't seem very difficult to do. However, I still haven't come across a scenario with multiple nested archives to ultimately get a binary. Not sure how often this comes up in practice, since it's not very standard, right?

sirlatrom commented 3 years ago

Files with different OS / Arch should score 0 by default and we shouldn't present that option to the user (unless eventually overridden by a flag?).

Currently, any asset containing the repo name gets a score of 1 to begin with, and additional points for matching the OS/arch/OS specific extension (.exe/.appimage). I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?

Not sure how often this comes up in practice, since it's not very standard, right?

Agreed. We would still need to save two choices, though: Which archive, and which binary within the archive.

marcosnils commented 3 years ago

I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?

My basic scoring proposal:

Given scores:

I'm probably missing something and there's surely a better way of doing it, I just wrote the first idea that came to my mind.

sirlatrom commented 3 years ago
* match the `bin` host

What does that mean? Do you mean the repo name? That's what we already do, but I suppose we can wait with giving that point until we've found at least one of the os/arch matches first.

marcosnils commented 3 years ago

OK, I was referring to the bin host's OS and Arch. I'm aware that we're doing some of those things already; I was just describing how I see the overall algorithm working.


breml commented 3 years ago

I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?

My basic scoring proposal:

  • All files start with score 0
  • If file has arch and/or OS and it doesn't match the bin host, subtract 1
  • If file has arch and/or OS and does match the bin host, add 1

Given scores:

  • Single high score file: install automatically
  • Multiple files with score >= 0: prompt the user, ordered by score descending
  • Files with score < 0: don't prompt the user

I'm probably missing something and there's surely a better way of doing it, I just wrote the first idea that came to my mind.

In general I like the above proposal. One downside I see is that a file with the correct OS but the wrong arch will still get a score of 0 (+1 -1) and therefore remains a candidate. So I guess that, in order for a file to be considered a candidate, it must achieve a score > 0.
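
The proposal with the stricter `> 0` threshold could be sketched as follows. All type and function names (`Match`, `Scored`, `Action`, `score`, `decide`) are hypothetical and only illustrate the rules discussed here, not bin's existing scoring code.

```go
package main

import (
	"fmt"
	"sort"
)

// Match describes how an OS or arch found in a file name relates to the host.
type Match int

const (
	Absent  Match = iota // the name does not mention it
	Matches              // the name mentions the host's value
	Differs              // the name mentions a different value
)

func points(m Match) int {
	switch m {
	case Matches:
		return 1
	case Differs:
		return -1
	default:
		return 0
	}
}

// score starts at 0 and adds or subtracts one point each for OS and arch.
func score(osMatch, archMatch Match) int {
	return points(osMatch) + points(archMatch)
}

type Scored struct {
	Name  string
	Score int
}

type Action int

const (
	NoCandidate Action = iota
	AutoInstall
	Prompt
)

// decide keeps files with Score > 0, orders them by score descending and
// auto-installs when a single file holds the top score.
func decide(files []Scored) (Action, []Scored) {
	var cands []Scored
	for _, f := range files {
		if f.Score > 0 {
			cands = append(cands, f)
		}
	}
	sort.SliceStable(cands, func(i, j int) bool { return cands[i].Score > cands[j].Score })
	switch {
	case len(cands) == 0:
		return NoCandidate, nil
	case len(cands) == 1 || cands[0].Score > cands[1].Score:
		return AutoInstall, cands[:1]
	default:
		return Prompt, cands
	}
}

func main() {
	files := []Scored{
		{"tool_linux_amd64", score(Matches, Matches)},
		{"tool_linux_arm64", score(Matches, Differs)}, // +1 -1 = 0, dropped
		{"tool_windows_amd64.exe", score(Differs, Matches)},
		{"tool.tar.gz", score(Absent, Absent)},
	}
	action, picked := decide(files)
	fmt.Println(action == AutoInstall, picked[0].Name)
}
```

Under this threshold a file that names neither OS nor arch also scores 0 and is dropped, which matters for repos that do not put both into their asset names.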

Additionally, I would like to work towards an algorithm that, in most cases, picks an archive and performs a successful installation on its own, so that the user has to select an archive only in a few exceptional cases. One step in this direction would be to put the different archive types into a list ordered by priority. This would allow us to install the binary even if multiple archive types are available (e.g. .tar.gz and .zip).
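
The ordered list of archive types might look like this. The order in `archivePriority` and the function name `preferred` are assumptions chosen for illustration; the thread does not settle on a particular order.

```go
package main

import (
	"fmt"
	"strings"
)

// archivePriority is a hypothetical preference order; earlier entries win.
var archivePriority = []string{".tar.gz", ".tgz", ".tar.xz", ".zip"}

// preferred returns the first name whose extension ranks highest in
// archivePriority, or false if no name has a known archive extension.
func preferred(names []string) (string, bool) {
	for _, ext := range archivePriority {
		for _, n := range names {
			if strings.HasSuffix(strings.ToLower(n), ext) {
				return n, true
			}
		}
	}
	return "", false
}

func main() {
	names := []string{"tool_linux_amd64.zip", "tool_linux_amd64.tar.gz"}
	name, ok := preferred(names)
	fmt.Println(name, ok)
}
```

Applied after OS/arch filtering, this breaks the tie between otherwise equivalent assets without prompting.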

I have an additional idea that I feel is worth exploring: check whether the repo contains a .goreleaser.yml file. I know this only targets Go, but I feel that goreleaser is becoming the de facto standard for releasing binaries in the Go ecosystem. The huge advantage of considering this file is that we no longer need to guess whether arch / os are present in the file name, because based on the replacements section we know which file is the correct one to download.

Example from bin:

archives:
- replacements:
    darwin: Darwin
    linux: Linux
    windows: Windows
    386: i386
    amd64: x86_64
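
As a sketch of how such a mapping could be used: the map below mirrors the replacements from the excerpt by hand (reading the YAML file itself would need a YAML library, which is not shown). The function names and the asset names in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements mirrors the excerpt above: GOOS/GOARCH value -> name used in assets.
var replacements = map[string]string{
	"darwin":  "Darwin",
	"linux":   "Linux",
	"windows": "Windows",
	"386":     "i386",
	"amd64":   "x86_64",
}

// replaced returns the replacement for v, or v itself if none is configured.
func replaced(v string, r map[string]string) string {
	if s, ok := r[v]; ok {
		return s
	}
	return v
}

// matchAssets keeps the names that contain both the replaced OS and arch tokens.
func matchAssets(names []string, goos, goarch string, r map[string]string) []string {
	osTok, archTok := replaced(goos, r), replaced(goarch, r)
	var out []string
	for _, n := range names {
		if strings.Contains(n, osTok) && strings.Contains(n, archTok) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Hypothetical asset names in the style the mapping would produce.
	assets := []string{
		"bin_0.1.0_Linux_x86_64",
		"bin_0.1.0_Darwin_x86_64",
		"bin_0.1.0_Linux_i386",
		"bin_0.1.0_Windows_x86_64.exe",
	}
	fmt.Println(matchAssets(assets, "linux", "amd64", replacements))
}
```

Because the tokens come from the project's own release configuration, no alias guessing is needed for repos that ship such a file.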

It might be worth finding out whether there is something similar for other ecosystems, e.g. Rust.

breml commented 3 years ago

I did a quick test with my ~50 binaries managed with bin. For a little bit more than 1/3, I found a .goreleaser.yml.

breml commented 3 years ago

Just for reference, this site lists the valid combinations of arch/os supported by the Go compiler: https://gist.github.com/asukakenji/f15ba7e588ac42795f421b48b8aede63

schnatterer commented 3 years ago

I'd like to contribute some more example test cases that could affect this issue:

This issue might overlap with #102.

pataquets commented 1 year ago

I'd like to add the case where there are alternate binaries matching your platform, such as: