openzim / overview

:balloon: Start here for current projects, how to get involved, and joining community calls. A resource for new and veteran members of the offline community.

Define a convention on TCP/UDP ports used by development stacks #24

Open benoit74 opened 10 months ago

benoit74 commented 10 months ago

Background

In many of our projects, we deploy local development servers (e.g., an API and a Database) on our development machines for testing purposes. These servers expose a TCP (occasionally UDP) port on our local machine. Currently, there is no standardized convention for the usage of these TCP/UDP ports across projects. For instance, some projects use port 8000 for web APIs, while others use 8080.

Note: This intentionally simplifies the distinction between TCP and UDP ports and assumes we don't want two distinct services, one on TCP and one on UDP, running on the same port number. Although technically possible, it's deemed cumbersome for our purposes.

Problem Statement

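The absence of a convention on TCP/UDP port assignments for local development services leads to two issues:

* After starting a local development stack, it's unclear where the services are listening, causing delays when switching between projects.

  * This becomes more pronounced with the shift to docker-compose-based local dev stacks, initiated with a simple `docker compose up -d`.

* Running two local development stacks simultaneously is usually impossible due to port conflicts.

  * This often occurs when transitioning from developing project A to reviewing project B.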

Proposition

We can address the problem by establishing a convention for TCP/UDP port assignments.

The proposed convention is to use port XXXY for every server in our systems, where:

Feedback and implementation

All feedback is welcome; after that, I will move this to a Wiki entry.

rgaudin commented 10 months ago

LGTM; I suggest we blindly assign an XXX to every repo in our 3 orgs, in chronological order, and put that in the Wiki. Some repos could be assigned 2 or 3 slots right away (offspot/container-images for instance).

mgautierfr commented 10 months ago

I would go even further and assign an XX for every repo. This reserves 100 slots (XX00 to XX99) for every repo, which should be more than enough. Even if we use the units digit for the type of service, we still have 10 slots for that specific service of that specific project.
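
To make the slot arithmetic concrete, here is a minimal sketch of this XX-per-repo layout. The function name `service_ports`, the sample prefix 42, and the meaning given to each digit are hypothetical; the thread only fixes the XX00 to XX99 range per repo. The lower bound of 11 on the prefix is also an assumption, added only to keep every port above the privileged range below 1024.

```python
def service_ports(repo_prefix: int, service_type: int) -> list[int]:
    """List the 10 ports a repo can use for one type of service.

    Layout assumed here: XX = repo prefix, tens digit = slot (0-9),
    units digit = service type (0-9). Only the XX00-XX99 range per repo
    comes from the discussion; the digit meanings are illustrative.
    """
    if not 11 <= repo_prefix <= 99:
        raise ValueError("repo prefix must be a two-digit number from 11 to 99")
    if not 0 <= service_type <= 9:
        raise ValueError("service type must be a single digit")
    base = repo_prefix * 100  # first port of the repo's range, XX00
    return [base + slot * 10 + service_type for slot in range(10)]


# Hypothetical repo with prefix 42, hypothetical service type 1
print(service_ports(42, 1))
```

With this layout, a repo can run up to ten instances of the same kind of service without leaving its own range.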

rgaudin commented 10 months ago

> I would go even further and assign an XX for every repo.

I support this

benoit74 commented 10 months ago

> I would go even further and assign an XX for every repo.

I like the idea for its "simplicity" (no need to reassign ranges should more ports be needed, and two digits per repo are easier to remember than three), but I'm a bit worried that this will consume a lot of space and increase the chance of collision with another service.

Or should we jump directly to 3XXYY where there is way less risk of collision according to https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers and the "fixed" 3 is easy to remember?
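
As a rough illustration of the 3XXYY idea, the sketch below encodes a repo number XX and a slot YY into a port and decodes it back. The names `encode_port` and `decode_port` are hypothetical, and the thread does not define what YY stands for, so it is treated here as a plain slot number.

```python
BASE = 30000  # the fixed leading "3" of the 3XXYY pattern


def encode_port(repo_id: int, slot: int) -> int:
    """Build a 3XXYY port from a repo number XX and a slot YY (both 0-99)."""
    if not 0 <= repo_id <= 99 or not 0 <= slot <= 99:
        raise ValueError("repo_id and slot must each be between 0 and 99")
    return BASE + repo_id * 100 + slot


def decode_port(port: int) -> tuple[int, int]:
    """Split a 3XXYY port back into (repo number, slot)."""
    if not BASE <= port <= BASE + 9999:
        raise ValueError("port is outside the 30000-39999 range")
    return divmod(port - BASE, 100)


print(encode_port(7, 12))   # 30712
print(decode_port(30712))   # (7, 12)
```

Every port produced this way stays between 30000 and 39999, well below the 65535 upper limit for TCP and UDP ports.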

rgaudin commented 10 months ago

I don't mind. For instance my router (and @kelson42's but he doesn't know yet) makes regular requests on WLAN on port 8080. This creates annoying requests in dev projects. We can go above 30K AFAIC.

benoit74 commented 10 months ago

> We can go above 30K AFAIC.

I think so as well. These ports are used as ephemeral ports, but I don't think we should mind: the system won't reallocate a port already used by a service as an ephemeral one, and the chance that a port we need is in use as an ephemeral port at the precise moment we need it is quite low (and should a conflict occur, chances are high that we can just retry and it will work because the TCP connection has been dropped). At least we can try, and should it cause issues, we can easily change to another strategy.
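
The "just retry" idea can be sketched with Python's standard `socket` module. This is only an illustration: the function name `bind_with_retry`, the number of attempts and the delay are arbitrary assumptions, not part of the proposal.

```python
import socket
import time


def bind_with_retry(port: int, host: str = "127.0.0.1",
                    attempts: int = 3, delay: float = 0.2) -> socket.socket:
    """Return a listening TCP socket on host:port, retrying if the port is busy."""
    last_error = None
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))  # raises OSError if the port is in use
            sock.listen()
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
            time.sleep(delay)  # give a short-lived connection time to go away
    raise last_error
```

If the port is held by another long-running service rather than by a short-lived ephemeral connection, every attempt fails and the last `OSError` is raised to the caller.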

benoit74 commented 10 months ago

Could you please help me fill in the blanks in the table below, and check what I've already "decided"?

I sorted the list alphabetically for convenience, and I propose we assign port ranges in alphabetical order for now, and then in chronological order for new projects, unless it is easy for you to sort the list in chronological order (it is not for me ^^).

I have chosen not to assign a port to new Vue.js-based scrapers like freecodecamp and kolibri (soon): even if we sometimes start a `yarn dev`, it is the only process we start, not a whole stack, so I do not think we need to worry about these.

| Organization | Repository | Port range needed |
| --- | --- | --- |
| kiwix | .github | ? |
| kiwix | apple | N |
| kiwix | borg-backup | N |
| kiwix | container-images | ? |
| kiwix | ipfs-portal | ? |
| kiwix | java-libkiwix | N |
| kiwix | k8s | N |
| kiwix | kiwix-android | N |
| kiwix | kiwix-android-custom | N |
| kiwix | kiwix-android-nightly | ? |
| kiwix | kiwix-build | N |
| kiwix | kiwix-desktop | N |
| kiwix | kiwix-js | N |
| kiwix | kiwix-js-pwa | N |
| kiwix | kiwix-tools | ? |
| kiwix | libkiwix | ? |
| kiwix | metrics | ? |
| kiwix | mirrorbrain | Y |
| kiwix | overview | N |
| kiwix | web | ? |
| offspot | base-image | ? |
| offspot | captive-portal | ? |
| offspot | cardshop | Y |
| offspot | container-images | Y |
| offspot | content-filter | ? |
| offspot | dashboard | ? |
| offspot | docker-export | N |
| offspot | edupi | Y |
| offspot | image-creator | ? |
| offspot | imager-desktop | ? |
| offspot | kiwix-hotspot | ? |
| offspot | kiwix-plug | ? |
| offspot | mediawiki-docker | ? |
| offspot | metrics | Y |
| offspot | offspot-config | N |
| offspot | operations | ? |
| offspot | package-requests | N |
| offspot | testbench | ? |
| offspot | wikifundi | ? |
| openzim | _python-bootstrap | Y |
| openzim | cms | Y |
| openzim | docker-publish-action | N |
| openzim | dwds | N |
| openzim | freecodecamp | N |
| openzim | gutenberg | N |
| openzim | ifixit | N |
| openzim | javascript-libzim | ? |
| openzim | kolibri | N |
| openzim | librechef | N |
| openzim | libzim | ? |
| openzim | lilote | N |
| openzim | mwoffliner | N |
| openzim | nautilus | N |
| openzim | nautilus-webui | Y |
| openzim | node-libzim | N |
| openzim | openedx | N |
| openzim | overview | N |
| openzim | phet | N |
| openzim | python-libzim | ? |
| openzim | python-scraperlib | N |
| openzim | python-storagelib | N |
| openzim | sotoki | N |
| openzim | ted | N |
| openzim | warc2zim | N |
| openzim | wikihow | N |
| openzim | wombat | N |
| openzim | wp1 | Y |
| openzim | wp1_selection_tools | N |
| openzim | youtube | N |
| openzim | zim-requests | N |
| openzim | zim-testing-suite | N |
| openzim | zim-tools | ? |
| openzim | zimfarm | Y |
| openzim | zimit | N |
| openzim | zimit-frontend | Y |

rgaudin commented 10 months ago

Unless there is a reason not to, I reiterate my advice to blindly assign a range to all repos. Many of them won't need one, but nobody wants to go through that whole list wondering whether each repo needs a range now or may need one in the future.

As for sorting, the creation date (as well as a sequential ID) is available via the API:

❯ curl -s https://api.github.com/repos/kiwix/apple | jq '.created_at, .id'
"2015-08-12T19:05:29Z"
40619002
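
Once those two fields are collected for every repo, the chronological assignment could look like the sketch below. It is only an illustration: `assign_prefixes` is a hypothetical helper, `full_name` is another field of the same API response, and apart from the kiwix/apple values shown above, the sample entries are made-up placeholders rather than real repositories.

```python
from datetime import datetime


def assign_prefixes(repos: list[dict], first_prefix: int = 0) -> dict[str, int]:
    """Give each repo a sequential prefix, oldest repo first.

    Each dict carries the "full_name", "created_at" and "id" fields of a
    GitHub repository API response. Ties on creation date fall back to "id".
    """
    def sort_key(repo: dict):
        created = datetime.strptime(repo["created_at"], "%Y-%m-%dT%H:%M:%SZ")
        return (created, repo["id"])

    ordered = sorted(repos, key=sort_key)
    return {repo["full_name"]: first_prefix + i for i, repo in enumerate(ordered)}


sample = [
    # Real values taken from the curl output above
    {"full_name": "kiwix/apple", "created_at": "2015-08-12T19:05:29Z", "id": 40619002},
    # Placeholder entries, invented for the example
    {"full_name": "example-org/newer-repo", "created_at": "2020-01-01T00:00:00Z", "id": 99999999},
    {"full_name": "example-org/older-repo", "created_at": "2010-01-01T00:00:00Z", "id": 1},
]
print(assign_prefixes(sample))
```
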
benoit74 commented 10 months ago

Sorry, I overlooked this suggestion. Makes sense to me.

benoit74 commented 10 months ago

Please review https://github.com/openzim/overview/wiki/TCP-UDP-ports-for-development

rgaudin commented 10 months ago

LGTM 👍

kelson42 commented 10 months ago

> In many of our projects, we deploy local development servers (e.g., an API and a Database) on our development machines for testing purposes. These servers expose a TCP (occasionally UDP) port on our local machine. Currently, there is no standardized convention for the usage of these TCP/UDP ports across projects. For instance, some projects use port 8000 for web APIs, while others use 8080.

The key point here is the principle of least astonishment, and then consistency.

I remark that the question of sockets is not treated here, although it should be preferred - in production - for all internal services IMO.

> Problem Statement
>
> The absence of a convention on TCP/UDP port assignments for local development services leads to two issues:
>
> * After starting a local development stack, it's unclear where the services are listening, causing delays when switching between projects.

This is a general problem if you work on many projects at the same time, this is far broader than a kiwix/openzim problem!

>   * This becomes more pronounced with the shift to docker-compose-based local dev stacks, initiated with a simple `docker compose up -d`.

> * Running two local development stacks simultaneously is usually impossible due to port conflicts.
>
>   * This often occurs when transitioning from developing project A to reviewing project B.

I'm not in favour of this kind of exotic approach where usual ports are not used, things should be simple and usual ports should be used. There must be another solution like:

rgaudin commented 10 months ago

> I remark that the question of sockets is not treated here, although it should be preferred - in production - for all internal services IMO.

> This is a general problem if you work on many projects at the same time, this is far broader than a kiwix/openzim problem!

I know you know the difference between theory and practice already 😉 And you have your share of responsibility in this I believe.

> I'm not in favour of this kind of exotic approach where usual ports are not used, things should be simple and usual ports should be used

Why do you care exactly? It's just a convention that has no impact on software. On projects without a dev compose, it's a guideline; on those with a dev compose, it means using ports X, Y and Z instead of A, B and C 🤷‍♂️

benoit74 commented 10 months ago

I can only second what Renaud said.

I would just add that:

Or maybe I just don't get what you mean by these principles.

benoit74 commented 6 months ago

@kelson42 Any chance we can move this forward?

Again, this has only to do with development stacks, where:

AFAIK, 3 out of 4 core developers are still happy with, and asking for, the concrete proposal summarized a few comments above. Three months later, I still do not see a concrete alternative proposal, only vague concepts not rooted in the developers' reality.

kelson42 commented 6 months ago

We had a discussion at that time with @rgaudin, and the conclusion was that we should first look at why container engine network capabilities don't allow easily to isolate systems so they can run in parallel without having all the services conflicting with each other. @rgaudin Do I remember properly?

rgaudin commented 6 months ago

> @rgaudin Do I remember properly?

Yes, and I reported in one of the weeklies that it wasn't conclusive. I didn't reply here because I didn't gather evidence, and it was too much of an effort to do so: advanced network usage is complicated and poorly documented, but what's sure is that it wouldn't be effortless, so even if there's a possibility, it would defeat the purpose of simplicity/transparency/predictability expressed here.

benoit74 commented 6 months ago

> why container engine network capabilities don't allow easily to isolate systems so they can run in parallel without having all the services conflicting with each other

By default (and this is the case in our configurations), docker compose creates one network per compose stack, and they run nicely and in isolation without issue.

The problem is that we want to expose some IP/ports on these networks on the local developer machine. And here, on the local developer machine, we have conflicts of IP/port.

One solution would be to use DNS names, a reverse proxy and other tooling to expose one single "thing" and use the DNS name to route requests appropriately, but I'm not even sure it would work for all protocols, and it is even more complex and less predictable, since developers would probably have to tweak their DNS configuration for it to work.
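
The host-side conflict described above can be illustrated with a small check over the ports each stack publishes. This is a sketch under assumptions: the stack names and mappings are invented, `host_port_conflicts` is a hypothetical helper, and only the short `HOST_PORT:CONTAINER_PORT` mapping form is parsed.

```python
from collections import defaultdict


def host_port_conflicts(stacks: dict[str, list[str]]) -> dict[int, list[str]]:
    """Find host ports published by more than one stack.

    `stacks` maps a stack name to its port mappings written in the short
    "HOST_PORT:CONTAINER_PORT" form. Other forms (bind IP, ranges, /udp
    suffix) are deliberately not handled in this sketch.
    """
    users = defaultdict(list)
    for stack, mappings in stacks.items():
        for mapping in mappings:
            host_port, _container_port = mapping.split(":")
            users[int(host_port)].append(stack)
    return {port: names for port, names in users.items() if len(names) > 1}


# Hypothetical stacks: both publish host port 8000
stacks = {
    "project-a": ["8000:80", "5432:5432"],
    "project-b": ["8000:8080"],
}
print(host_port_conflicts(stacks))  # {8000: ['project-a', 'project-b']}
```

The container-side ports (80 and 8080 here) never clash because each stack has its own network; only the host-side number is shared, which is the part a port convention would keep distinct.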