samoht closed this issue 11 years ago
We also need a trove to categorise packages by type. So instead of a lexical split of directories, how about organising them by packages/database/sqlite for the moment? The trove will still need tags as things can have multiple purposes, but it's nice to be able to quickly search by directory in the first instance.
Some way of moving older versions into an archival directory (where they are still available, just not in the 'forefront') might also be a good idea.
Based on my experience, I think that would be a bad choice. Trove categorizations tend to be more volatile than one might imagine at first, and many orthogonal categorizations can be imagined. Having to move things around whenever one changes one's mind about the ultimately perfect categorization would be annoying in the long run.
IMHO, it would be much better to have a "pool" of packages, organized as aseptically as possible, for the sole purpose of avoiding huge directories. Lexical categorization serves that purpose well.
Then, on top of it, we can add all sorts of external categorizations we want, as mere indexes that reference the lexical (and stable, and long-term) categorization.
Just my 0.02€.
You make a fair point. Directory categorisation works very well for BSD ports since users do often navigate and install directly from the filesystem structure. OPAM has much more complex constraints due to all the compiler switches, and so a longer-term solution will probably be a dpkg-style curses frontend that explains its decisions to the user more clearly. This points to a pool of packages also.
In terms of concrete changes to OPAM for the moment though, all we need is for the packages/ directory to support arbitrary nested sub-directories. I guess a sensible heuristic is to traverse every directory until an opam file is encountered within it, and then stop descending.
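The traversal heuristic described above could look roughly like the following. This is a minimal Python sketch for illustration only, not OPAM's implementation; the function name `find_package_dirs` is hypothetical.

```python
import os
from typing import Iterator


def find_package_dirs(root: str) -> Iterator[str]:
    """Yield every directory under `root` that directly contains an `opam` file.

    Hypothetical helper: once an `opam` file is found in a directory,
    the walk does not descend any further below that directory.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if "opam" in filenames:
            yield dirpath
            # Clearing the list in place tells os.walk to stop descending here.
            dirnames[:] = []
```

With this approach a layout such as packages/database/sqlite and a flat layout are both found by the same code, at the cost of walking the tree.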
Another easy answer: for the archived packages, we should simply create an 'archive' remote that holds the older, rarely used packages. This could be added by default in the beginning. We need multiple remotes for the Platform anyway.
-anil
(In reply to Stefano Zacchiroli's comment above, 20 Feb 2013, 09:40.)
I definitely think that making $NAME/$VERSION is a good idea, especially since some people do like to release often. I also think that it would be better to make this change before 1.0.
As for indexing by package first letter, I think it depends a little on how opam-repository is meant to be managed in the long term. That is, does it strive for completeness (and thus maybe pointlessness), or does it perform some selection based on quality/usefulness? In the latter case, opam is perfectly able to support multiple repositories, and repositories are simple to publish, so not being in opam's main repository shouldn't be a big problem. Older packages could also eventually be moved to opam-repository-oldies or what not. As such, it may not really be necessary to index by package first letter (I prefer not to have to manually navigate deeply nested file hierarchies, even though that should not happen too often).
I really don't like @avsm's suggestion. This kind of hierarchical organisation stops working as soon as two categories apply, and may not match everyone's idea of where a thing should be, which makes it slower to find the actual package when you are looking for it. Better to have lists of tags in package descriptions and maintain a separate mechanism to search by tags. Hierarchical categories don't work.
The question then is: what counts as a directory that is too big? Knowing that, we can find an appropriate tradeoff between easy manual navigation and technical issues.
Finally, one data point may be the Hackage statistics: ~5000 libraries. I'm only following that ecosystem from afar, but I doubt that there are ~5000 worthwhile libraries in there.
In terms of concrete changes to OPAM for the moment though, all we need is for the packages/ directory to support arbitrary nested sub-directories. I guess a sensible heuristic is to traverse every directory until an opam file is encountered within it, and then stop descending.
Yep, absolutely. I wanted to make exactly the same suggestion :-) This way, the directory structure could evolve independently from the package manager code.
On 20 Feb 2013, at 09:49, Daniel Bünzli notifications@github.com wrote:
I definitely think that making $NAME/$VERSION is a good idea, especially since some people do like to release often. I also think that it would be better to make this change before 1.0.
I'd prefer to keep the directory+version as it is right now for simplicity. See my alternative suggestion for archival via a different remote.
I really don't like @avsm's suggestion. This kind of hierarchical organisation stops working as soon as two categories apply, and may not match everyone's idea of where a thing should be, which makes it slower to find the actual package when you are looking for it. Better to have lists of tags in package descriptions and maintain a separate mechanism to search by tags. Hierarchical categories don't work.
Actually, the BSD ports have maintained this very successfully over many years for 8000+ ports. They do of course support multiple categorisation, and symlinks can (optionally) make a port appear in multiple directories.
However, for the reasons outlined in my previous reply, I think a pool-style model may be more workable here.
-anil
Just to clarify:
There are 2 issues here:
1. ~/.opam/repo/<name>/package/...
2. ~/.opam/{opam,descr,archives}/...
For 1., I'm sure we can find a heuristic to make everybody happy (by looking at the opam files). And 2. is very easy to change, and completely transparent to the user.
So the only "hard" technical point here is to come up with a good heuristic to replace https://github.com/OCamlPro/opam/blob/master/src/core/opamPackage.ml#L164. It could be quite easy and will not break backward compatibility (so it is not really necessary to do this before 1.0).
On Wednesday, 20 February 2013 at 10:55, Anil Madhavapeddy wrote:
I'd prefer to keep the directory+version as it is right now for simplicity. See my alternative suggestion for archival via a different remote.
Yes, in fact it's better to keep it as it is now: if everything is traversed the way you suggest, nothing prevents us from eventually organising things as $NAME/$NAME-$VERSION.
Hierarchical categories don't work.
Actually, the BSD ports have maintained this very successfully over many years for 8000+ ports. They do of course support multiple categorisation, and symlinks can (optionally) make a port appear in multiple directories.
So effectively it's not a hierarchical system, which is a good thing.
But really I'm not sure that it is useful to reproduce the tagging system in the file system. One thing is that I'd really like to have that information in a tags: field in opam files, and not have to put/symlink my package in different directories in order to tag it. Having alphabetical order in the repo seems more sensible to me.
However, for the reasons outlined in my previous reply, I think a pool-style model may be more workable here.
Not sure I understand what you mean by that. For me a pool of packages is a repo.
Best,
Daniel
@samoth Right, as a package maintainer, I'm mainly talking about "how the repository structure should be organized".
One thing is that I'd really like to have that information in a tags: field in opam files
Done. Use opam info <package> -f tags to get the tags back.
I've tried experimenting with this feature request (see the more-flexible-repo-structure branch in my tree), but I'm not very convinced by the technical details yet. Basically, if you have a non-standard repository structure, then you have to "scan" the whole tree every time you want to know if a package is present in a repository (instead of just checking for packages/$name.$version/opam). So I guess I need to come up with a more clever caching strategy or think about that a little bit more (which means it won't be in 1.0).
How about adding an archive remote to the default opam-init? That would let us migrate packages to archive without them just disappearing from the default opam-repository.
Then in the future, we could solve the problem of the archive repository getting too big, perhaps via a repo flag that would mark it as needing deep scans.
(In reply to Thomas Gazagnaire's comment above, 6 Mar 2013, 16:49.)
What about checking for packages/$name/$version ? And then listing packages/$name should provide all available versions without needing to split filenames... avoiding any confusion which could arise if and when somebody comes up with a library named foo2.3 which is not version 3 of library foo2 :-)
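The ambiguity can be illustrated with a small sketch. The helper names below are hypothetical and this is not OPAM's actual parsing code; it only shows why a flat `$name.$version` directory name can be read in several ways, while a nested `$name/$version` path cannot.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def split_candidates(dirname: str) -> List[Tuple[str, str]]:
    """Return every (name, version) reading of a flat `$name.$version` directory name."""
    return [
        (dirname[:i], dirname[i + 1:])
        for i, ch in enumerate(dirname)
        if ch == "." and 0 < i < len(dirname) - 1
    ]


def from_nested_path(path: str) -> Tuple[str, str]:
    """Read (name, version) from a `.../$name/$version` path; no splitting needed."""
    name, version = PurePosixPath(path).parts[-2:]
    return name, version
```

For a directory called foo2.3.1.0, `split_candidates` returns three possible readings, so a flat layout needs an extra convention (for instance, always splitting at the first dot) to pick one.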
(In reply to Thomas Gazagnaire's comment above, Wed, Mar 06, 2013 at 08:49:23AM -0800.)
Roberto Di Cosmo
How about adding an archive remote to the default opam-init?
In my mind, we will always keep the default repository self-contained (ie. all packages should be installable without any archive repo). And as OPAM supports having packages with no associated repositories (which can happen when people remove a repository, or when a package is removed from the repo), I think we actually don't need to add an archive remote.
What about checking for packages/$name/$version ?
I can indeed try to encode some basic policies (packages/$name.$version, packages/$name/$version, packages/$name/$name.$version) and have a special flag for the full scan. I'll try that today.
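As a rough illustration of that idea, the sketch below probes the three layouts directly and only walks the tree when a full scan is explicitly requested. It is written in Python for brevity; the names `LAYOUTS`, `locate_opam_file` and `full_scan` are assumptions made for this example, not OPAM's API.

```python
import os
from typing import Optional

# The three candidate layouts, relative to the packages/ directory.
LAYOUTS = (
    "{name}.{version}",
    "{name}/{version}",
    "{name}/{name}.{version}",
)


def locate_opam_file(packages_dir: str, name: str, version: str,
                     full_scan: bool = False) -> Optional[str]:
    """Return the path of the package's `opam` file, or None if it is not found."""
    # Cheap path: one file-existence check per known layout.
    for layout in LAYOUTS:
        parts = layout.format(name=name, version=version).split("/")
        candidate = os.path.join(packages_dir, *parts, "opam")
        if os.path.isfile(candidate):
            return candidate
    # Expensive path (opt-in): look for a `$name.$version` directory at any depth.
    if full_scan:
        target = f"{name}.{version}"
        for dirpath, _dirnames, filenames in os.walk(packages_dir):
            if os.path.basename(dirpath) == target and "opam" in filenames:
                return os.path.join(dirpath, "opam")
    return None
```

The point of the design is that the common case costs a handful of file checks, and the whole-tree walk is paid only by repositories that ask for it.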
In my mind, we will always keep the default repository self-contained
Right, but I'm referring to the older versions of packages that are taking up all the space in the current directory tree. We really don't need all those old cstructs, do we? However, it would be nice to keep their description files around somewhere just in case someone needs a specific version, and a slower archive remote would work for that.
I agree that all the reasonably current versions should be in one repository.
Whichever route we take, it would be good to explicitly have a repository format recorded somewhere within the repo itself, rather than probing heuristics...
On Thu, Mar 07, 2013 at 12:03:17AM -0800, Thomas Gazagnaire wrote:
I've finally managed to find a cheap way to encode this. So now, you can use any level of sub-directories in a repository for both the compilers/ and packages/ directories.
Remark1: $name/$version/opam is again ambiguous, so the right pattern is packages/XXX/.../YYY/$name.$version/opam.
Remark2: once this version is released, we need to decide what we do in opam-repository. Having packages/$name/$name.$version/opam and compilers/INRIA/4.00.1/4.00.1.{comp,descr} sounds quite sensible to me. We can also put all the base-* and conf-* packages in separate folders.
This is a follow-up on OCamlPro/opam-repository/issues/433: