openculinary / infrastructure

This repository documents the steps required to set up a fresh RecipeRadar environment
GNU Affero General Public License v3.0

Configure corporate mailing lists for OpenCulinary C.I.C. #36

Open jayaddison opened 1 year ago

jayaddison commented 1 year ago

Is your feature request related to a problem? Please describe. Project development of RecipeRadar is managed by OpenCulinary C.I.C. Project-related communication (announcements, development discussions, automated emails, and so on) is to be provided using public mailing lists, similar to the Debian project mailing lists.

Describe the solution you'd like A set of public mailing lists for development of RecipeRadar, including web archives available for browsing. These should include:

This set of mailing lists is likely to expand and change over time.

jayaddison commented 1 year ago

Suggested domain name for the web view of the mailing lists, and email addresses for the mailing lists:

jayaddison commented 1 year ago

Infrastructure

As a best practice, it usually makes sense to provide separation between infrastructure components that don't have software-level dependencies (the mailing lists are for discussion about RecipeRadar -- but RecipeRadar could exist without the mailing lists, and the mailing lists could exist without RecipeRadar), and so we should design for that.

Currently both HTTP and HTTPS traffic for *.reciperadar.com domains arrive at a single IP address and webserver (haproxy), so it would likely be easiest to extend that webserver configuration to reroute traffic for lists.reciperadar.com. At the moment that is, I think, the only place where infrastructure-sharing for this feature would make sense. It'd be worthwhile to review the haproxy.cfg to consider any edge cases and problems.
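As a way to reason about those edge cases before touching haproxy.cfg, the routing decision can be modelled in a few lines of Python. This is only a sketch of the decision itself, not haproxy configuration: the backend names `reciperadar-web` and `mailing-lists` are hypothetical, and only the `lists.reciperadar.com` hostname comes from the discussion above.

```python
# Hypothetical backend names; only the hostname is taken from this issue.
DEFAULT_BACKEND = "reciperadar-web"
HOST_BACKENDS = {
    "lists.reciperadar.com": "mailing-lists",
}


def select_backend(host_header: str) -> str:
    """Choose a backend name from an HTTP Host header value."""
    # Normalise the header: ignore surrounding whitespace and letter case,
    # drop any ":port" suffix, and strip a trailing dot from the hostname.
    hostname = host_header.strip().lower().split(":", 1)[0].rstrip(".")
    # Exact match only, so lookalike names fall through to the default.
    return HOST_BACKENDS.get(hostname, DEFAULT_BACKEND)
```

The exact-match lookup is deliberate: a name such as `lists.reciperadar.com.example.org` falls through to the default backend rather than reaching the mailing list service.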

Although the RecipeRadar service is containerized, I think the argument around avoiding infrastructure sharing between independent components extends to the configuration of the mailing lists, so I think we should not use containerization for these mailing lists.

I think GNU Mailman probably makes sense as the system to deploy.

Permissions

Edit: updated to remove the suggestion of using GNU Mailman; the Courier MTA is currently being considered instead.

jayaddison commented 1 year ago

> I think GNU Mailman probably makes sense as the system to deploy.

Eh, maybe. It has a lot of dependencies, and it requires two persistence/query engines: one for persistent storage of email content, and one for search. For the former, we would likely use PostgreSQL since we have it in our stack already; for the latter, any backend supported by django-haystack should work.

However, I don't think we would want to use the Haystack OpenSearch backend, because we wouldn't want mailing list activity to affect the performance of the RecipeRadar online service. PostgreSQL would be a good option, since it uses different storage I/O paths in our environment and has mature built-in full-text search support, but it is not currently supported as a django-haystack search backend.

I recommend that we look into the public-inbox project instead as a way to host our mailing list infrastructure. It aligns well with our software licensing preferences, provides git cloning support for archiving and mirroring, and looks like it should require minimal infrastructure cost to support. It uses Xapian to provide mail archive search functionality. It does pull in a reasonably large number of transitive perl-based dependencies, but I think that's an acceptable tradeoff.

jayaddison commented 1 year ago

I've created https://github.com/openculinary/mailing-lists ... although I'm not completely certain that it's the correct approach; it seems that each individual mailing list is stored as a git repository, but I've created a single top-level git repository containing three git repositories that are in turn under source control.

I don't necessarily see a problem with that, but it's likely not the intended architecture and I'm not sure whether there are benefits to having meta-level source control. The downside is mostly that it's weird and potentially confusing, and perhaps slightly wasteful.

jayaddison commented 1 year ago

concerns:

references:

jayaddison commented 1 year ago

> I recommend that we look into the public-inbox project instead as a way to host our mailing list infrastructure. It aligns well with our software licensing preferences, provides git cloning support for archiving and mirroring, and looks like it should require minimal infrastructure cost to support. It uses Xapian to provide mail archive search functionality. It does pull in a reasonably large number of transitive perl-based dependencies, but I think that's an acceptable tradeoff.

Further to this: I think that, roughly speaking, I'm suggesting using git repositories written by public-inbox as the primary source of truth for mailing list discussion content. That is: instead of populating mbox / Maildir files, I think we could read mail directly from stdin (from exim/postfix/whatever the MTA is) and then write the results to a git repository that is used to serve HTTP/HTTPS traffic, and is cloneable using standard git tools.
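To make the idea concrete, here is a minimal sketch of storing one raw message in a git repository. It is not how public-inbox lays out its repositories: the path scheme (a SHA-1 of the Message-ID), the function names, and the committer identity are all assumptions made for illustration, and it assumes the `git` command is installed and the target repository already exists.

```python
import hashlib
import subprocess
from email import message_from_bytes, policy
from pathlib import Path


def message_path(raw: bytes) -> str:
    """Derive a stable relative path for a message from its Message-ID."""
    msg = message_from_bytes(raw, policy=policy.default)
    message_id = (msg["Message-ID"] or "").strip()
    if not message_id:
        raise ValueError("message has no Message-ID header")
    digest = hashlib.sha1(message_id.encode("utf-8", errors="replace")).hexdigest()
    # Hypothetical layout: two-character fan-out directory, then the rest.
    return f"{digest[:2]}/{digest[2:]}"


def store_message(repo: Path, raw: bytes) -> str:
    """Write the raw message into `repo` and commit it; returns the path used."""
    rel = message_path(raw)
    target = repo / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(raw)
    git = [
        "git", "-C", str(repo),
        # Hypothetical committer identity, so that commits succeed even when
        # the delivery user has no git configuration of its own.
        "-c", "user.name=list-archiver",
        "-c", "user.email=list-archiver@localhost",
    ]
    subprocess.run(git + ["add", "--", rel], check=True)
    subprocess.run(git + ["commit", "--quiet", "-m", f"Add message {rel}"], check=True)
    return rel
```

Because the path depends only on the Message-ID, delivering the same message twice maps to the same file, which keeps the archive free of duplicates at the cost of an empty commit attempt that the caller would need to handle.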

jayaddison commented 1 year ago

Hmm. I was thinking that postfix would be the mail transport agent to deploy here, but I feel somewhat uneasy about that after learning some of the corporate history of IBM in relation to Nazi Germany.

It looks like sendmail has become somewhat more corporate-sponsored recently. It could be worth a look next.

jayaddison commented 1 year ago

> Hmm. I was thinking that postfix would be the mail transport agent to deploy here, but I feel somewhat uneasy about that after learning some of the corporate history of IBM in relation to Nazi Germany.

Note: it's not simply the sense of wrongdoing (that seems to ring true given a pattern of sales-heavy, numbers-over-people, revenue-oriented mindsets), but also the fact that the company failed to keep records and also pushed back on attempts to make amends. Perhaps they were secretly working with the Allies as well, or something like that - and there was probably an element of that, but to my knowledge so far, that doesn't counterbalance complicity with a terrible regime.

jayaddison commented 1 year ago

It looks like courier-mta could be a good mail transfer agent option to begin with here. It can be configured to pipe incoming mail messages to a command-line process via standard input, using either a legacy aliases format (similar to sendmail) or the preferred dot-courier configuration format.
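The receiving side of such a pipe could look roughly like the sketch below: a process that reads one message from standard input, hands it to an archiving step, and reports the outcome through its exit status. The `deliver`, `archive` and `main` names are hypothetical, and the choice of `EX_TEMPFAIL` (the sysexits value 75) assumes that the MTA treats it as a temporary failure and retries later; how Courier interprets specific exit statuses should be checked against its documentation.

```python
import os
import sys
from typing import BinaryIO, Callable


def deliver(stream: BinaryIO, handler: Callable[[bytes], None]) -> int:
    """Read one message from `stream`, pass it to `handler`, return an exit status."""
    raw = stream.read()
    if not raw.strip():
        # Empty input: report a temporary failure rather than archive nothing.
        return os.EX_TEMPFAIL
    try:
        handler(raw)
    except Exception as exc:
        # Report on stderr and ask for a retry instead of losing the message.
        print(f"delivery failed: {exc}", file=sys.stderr)
        return os.EX_TEMPFAIL
    return os.EX_OK


def archive(raw: bytes) -> None:
    """Hypothetical archiving step; replace with the real list-archive writer."""
    raise NotImplementedError("archive step not configured")


def main() -> int:
    # The MTA is assumed to supply the raw message bytes on standard input.
    return deliver(sys.stdin.buffer, archive)
```

A wrapper script would end with `sys.exit(main())`. Until `archive` is replaced, every delivery reports a temporary failure, which is the safer default for a placeholder than silently accepting mail.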

jayaddison commented 1 year ago

> I recommend that we look into the public-inbox project instead as a way to host our mailing list infrastructure. It aligns well with our software licensing preferences, provides git cloning support for archiving and mirroring, and looks like it should require minimal infrastructure cost to support. It uses Xapian to provide mail archive search functionality. It does pull in a reasonably large number of transitive perl-based dependencies, but I think that's an acceptable tradeoff.

This continues to be the plan; however, there are some additional details that I've been considering over the past few days:

jayaddison commented 1 year ago

> Currently both HTTP and HTTPS traffic for *.reciperadar.com domains arrive at a single IP address and webserver (haproxy), so it would likely be easiest to extend that webserver configuration to reroute traffic for lists.reciperadar.com. At the moment that is, I think, the only place where infrastructure-sharing for this feature would make sense. It'd be worthwhile to review the haproxy.cfg to consider any edge cases and problems.

In retrospect, there's one additional place where infrastructure-sharing makes sense here, in the short term: since we have fairly minimal hosting compute resources, I think we can place storage of the mailing list content and git repositories into the existing persistence I/O storage path.

jayaddison commented 1 year ago

:construction: This work is currently paused pending some other infrastructure migration plans (physically moving the server hardware).