poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License

Notes on how to serve multiple package repositories #254

Closed: stschiff closed this issue 1 year ago

stschiff commented 1 year ago

I think I have a pretty concrete idea on how we can teach our server to host several independent package repositories.

(Note that in the following I also use a new naming idea that I had: "Community Repository" for the current published_data, and "Curated Repository" for the upcoming Minotaur packages.)

First, trident serve should accept multiple base directories, each with a different name, e.g.

trident serve -d community_repository=~/repos/published_data -d curated_repository=~/repos/ppd -d poseidon_aadr=~/repos/aadr

(And just to keep things very flexible, we could even allow the same name to be used multiple times, so that multiple base directories belong to the same repository, although that will hardly be necessary.)
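A minimal sketch of how such repeated NAME=PATH arguments could be grouped, written in Python for brevity rather than in the project's Haskell. The function name parse_base_dirs is hypothetical and this is not how trident parses its options; it only illustrates that repeating a name simply appends another base directory to the same repository.

```python
from collections import defaultdict
from pathlib import Path


def parse_base_dirs(specs):
    """Group NAME=PATH specs by repository name, keeping argument order."""
    repos = defaultdict(list)
    for spec in specs:
        # Split on the first "=" only, so paths may themselves contain "=".
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected NAME=PATH, got {spec!r}")
        # expanduser covers the case where the shell left "~" unexpanded.
        repos[name].append(Path(path).expanduser())
    return dict(repos)
```

Called with the three values from the command above, this would yield one entry per repository name, each holding a single directory.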

Then, we just extend all APIs to include the repo name, so instead of /packages we would have /community_repository/packages. And instead of /zip_file/2010_RasmussenNature?package_version=1.1.0 we might have /curated_repository/zip_file/2010_RasmussenNature?package_version=1.1.0 and so on.
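To illustrate the path-prefix idea, here is a small Python sketch that peels the repository name off the front of a request path. The helper split_repo_path and the known_repos argument are assumptions made for this example, and the query string is assumed to have been stripped already.

```python
def split_repo_path(path, known_repos):
    """Split '/<repo>/<endpoint...>' into (repo, '/<endpoint...>')."""
    head, _, rest = path.lstrip("/").partition("/")
    if head not in known_repos:
        raise LookupError(f"unknown repository: {head!r}")
    # The remainder is what the existing single-repo handlers would see.
    return head, "/" + rest
```

With this shape, the existing endpoint handlers could stay unchanged and only receive the extra repository name.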

So from a Server/API perspective this is really simple to implement.

From the client perspective, we would have to add a flag --repository or -r to trident list --remote and trident fetch. It could be mandatory, or we could implement a hard-coded default (like community_repository or so). And that's that. The user would have to take care to download packages into a matching folder structure.

We would then also add a new API /repositories which would list the repositories served under the current server instance.

Note that this implementation assumes full independence between the repos. All APIs will work entirely independently in each repo. For example, /packages would give entirely different results when run on different repos. I think this is the simplest setup.
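As a rough sketch of what full independence means, the Python snippet below keeps one package list per repository and never looks across repositories. The REPOS contents (which package sits in which repository) and both function names are made up for illustration; the actual response format of /repositories is not specified here.

```python
# Hypothetical in-memory index: repository name -> package names.
REPOS = {
    "community_repository": ["2010_RasmussenNature"],
    "curated_repository": [],
}


def list_repositories(repos):
    """What a /repositories endpoint could return: just the names."""
    return sorted(repos)


def list_packages(repos, repo_name):
    """Per-repo /packages: only ever looks at the one named repository."""
    return sorted(repos[repo_name])
```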

nevrome commented 1 year ago

Sounds good! I think we could really do it like that. Just two questions and a comment:

stschiff commented 1 year ago

Yes, points well taken:

nevrome commented 1 year ago

I thought about it for a while and decided now that I like the idea of putting the repo names into the query string. It has some advantages:

  1. It doesn't break the current API. So it would be easier to now release trident 1.2.0.0, which otherwise (e.g. with /<repo_name>/packages) would be deprecated again right away.
  2. It allows querying multiple repos at once, e.g. /packages?client_version=1.2.0.0&repo_name=PBA,PAA (or whatever is the best way to encode lists in URLs :shrug: - maybe &repo_name=PBA&repo_name=PAA?)
  3. This works equally for all 4 current endpoints.
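Both list encodings mentioned in point 2 can be accepted at the same time. The following Python sketch uses the standard library's urllib.parse; the function name requested_repos is hypothetical, and returning None for a missing parameter is just one possible way to signal "no repo specified".

```python
from urllib.parse import urlsplit, parse_qs


def requested_repos(url):
    """Return the repo names given in the query string, or None if absent."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("repo_name")
    if values is None:
        return None
    names = []
    for value in values:
        # Each occurrence of repo_name may itself be a comma-separated list.
        for name in value.split(","):
            if name and name not in names:
                names.append(name)
    return names
```

Because the parameter is optional, requests from clients that never send repo_name keep working unchanged, which is the point of advantage 1.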

I think the best default would be to query all archives, not just published_data (or whatever we eventually call it).

stschiff commented 1 year ago

Definitely yes to query-string instead of path-element.

I'm interested to hear your arguments for why you would like to allow multiple repos to be queried, optionally or by default. I can see no really good use case here, since they will eventually contain the same data. I think this will only lead to confusion. Also, I think we would like to encourage a workflow where users keep them in separate folder hierarchies instead of mixing them, right? I think conceptually they are really quite different worlds and we should keep them separate.

nevrome commented 1 year ago

Hm - from a --downloadAll perspective you're right, but from the point of view of normal list and fetch calls I think having everything available is the best default. /packages, /groups and /individuals should imho show all versions of all entities Poseidon has to offer. /zip_file should by default allow downloading any package, without the need to set a repo first. As currently planned, there will be no package name overlaps between the archives.
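A small Python sketch of this "all archives by default" behaviour, under the stated assumption that package names do not overlap. The names select_repos and find_package, and the shape of the all_repos mapping (repo name to a set of package names), are hypothetical.

```python
def select_repos(all_repos, requested=None):
    """Default to every repo when the client names none."""
    if not requested:
        return list(all_repos)
    unknown = [r for r in requested if r not in all_repos]
    if unknown:
        raise LookupError(f"unknown repositories: {unknown}")
    return list(requested)


def find_package(all_repos, package, requested=None):
    """Return the repo holding the package, or None if no selected repo has it."""
    hits = [r for r in select_repos(all_repos, requested) if package in all_repos[r]]
    if len(hits) > 1:
        # Only reachable if the no-overlap assumption is violated.
        raise LookupError(f"{package!r} found in several repos: {hits}")
    return hits[0] if hits else None
```

The explicit error on multiple hits makes the no-overlap assumption visible instead of silently picking one archive.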

nevrome commented 1 year ago

Implemented in #258