joomla-framework / console

Joomla Framework Console Package
GNU General Public License v2.0

Regarding adding --live-site to this package #15

Closed nikosdion closed 7 months ago

nikosdion commented 2 years ago

For reference please see the CMS repo issue https://github.com/joomla/joomla-cms/issues/38518 and tagging @nibra so we can carry the discussion here.

The issue in the CMS repo had to do with CMS-specific classes always expecting to run in a web context. For example, the CMS' MVCFactory service always creates MVCFactory factory objects after pushing a SiteRouter service to them. Creating the SiteRouter service requires instantiating the SiteApplication object, which assumes it's running in a web context and therefore expects $_SERVER['HTTP_HOST'] to contain an actual domain name. However, under CLI this contains the path to the PHP script we passed to the PHP executable, which cannot be parsed as a domain name, causing an exception to be thrown by the Uri framework object.

Question 1: Since the framework is isolated from the CMS, wouldn't it stand to reason that a developer extending the framework Console package would be aware of the CLI–only execution context and not engage in this kind of tomfoolery?

The solution in the CMS repo is to pass a --live-site option which contains the URL to the site the CLI application corresponds to. If this is not available, it will fall back to the application's live_site configuration key. If that is also empty / undefined it will fall back to a fake domain name.

Question 2: Since the framework console application does not have a configuration registry this would be impossible to do. Wouldn't that mean that we would always fall back to the fake domain name, necessitating overriding the method which handles the --live-site option in the CMS?

Question 3: Assuming questions 1 and 2 are answered in the affirmative, would that not mean that the point of having a --live-site option in the framework console application is rather moot?

I mean, sure, I can transfer (parts of) the code to the framework, but does it make sense / does the code belong in the framework? That was why I was sceptical yesterday and why @HLeithner told me I should modify the CMS console application instead.

nibra commented 2 years ago

In principle, a console application should not (have to) know anything about the web environment. However, I can imagine use cases where the URL is definitely of interest, e.g. cron jobs that send notifications with backlinks. Therefore, we should find a clean way to pass such optional parameters if needed.

nikosdion commented 2 years ago

I understand that, but does it belong to the framework?

The framework can be used outside the scope of the Joomla CMS. If I make a custom CLI application which needs a base URI to create backlinks I would very reasonably implement an option, either in my commands or my application, to retrieve this URL from the user and use it the way I need to.

The way the CMS solution, the populateHttpHost method, works is very dirty. It overrides $_SERVER variables. It does not return anything cleanly.

The way I'd prefer it to work is different.

The getDefaultInputDefinition method in the framework console application currently uses a hardcoded array. If I want to extend the default common options I need to override the method and include everything it already includes. Why not make this array a private property and add a getter and setter to it to customise it more easily?
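A minimal sketch of that idea, under loud assumptions: the class and method names below are hypothetical, and the options are plain name/description pairs rather than the real input definition objects the framework uses. It only shows how a property with a getter and setter would let a subclass add one option without re-listing the defaults.

```php
<?php
// Hypothetical sketch: these names are illustrative and are NOT the
// actual Joomla\Console\Application API.
class SketchConsoleApplication
{
    /** @var array<string, string> option name => description */
    private array $defaultOptions = [
        'help'    => 'Display the help information',
        'quiet'   => 'Suppress all output',
        'verbose' => 'Increase the verbosity of messages',
    ];

    /** @return array<string, string> */
    public function getDefaultOptions(): array
    {
        return $this->defaultOptions;
    }

    /** @param array<string, string> $options */
    public function setDefaultOptions(array $options): void
    {
        $this->defaultOptions = $options;
    }

    public function addDefaultOption(string $name, string $description): void
    {
        $this->defaultOptions[$name] = $description;
    }
}

// A custom application adds one option; the built-in defaults stay untouched.
class MyApplication extends SketchConsoleApplication
{
    public function __construct()
    {
        $this->addDefaultOption('base-url', 'Base URL used to build backlinks');
    }
}

$app = new MyApplication();
echo implode(', ', array_keys($app->getDefaultOptions())), PHP_EOL;
```

In this sketch getDefaultInputDefinition would simply build its return value from the property instead of from a hardcoded array.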

I would imagine that a custom implementation of the console application could add an application–level option to get a base URL and use it to configure whichever URL routing solution they are using. In an ideal world it would be used to register a new service provider in the container used throughout the application and its commands, meaning they'd have to override how command objects are instantiated in addCommand to pass the DI container / service locator. This becomes too far-reaching to prescribe in the default framework implementation of the console application.

nibra commented 2 years ago

I tend to think that the fundamental error is that SiteApplication is accessing $_SERVER at all. Shouldn't access to global variables be reserved for the front controller?

HLeithner commented 2 years ago

The $_SERVER variable is a fundamental part of PHP and most applications use it; even if you build your own complete application which has both console and web, you will share code that at some point uses the superglobal. For our DX I would add it with a config parameter into the console application if possible.

pseudo example:

// in the CMS console application
$config->set('live_site', $jconfig->get('live_site'));
parent::__construct($x, $y, $config);

// in the framework console application

function __construct($x, $y, $config) {
  if ($config->get('live_site')) {
    $this->fixHostname($config->get('live_site'));
  } else {
    // if in command line
    $this->fixHostname(getFromCommandLine('--live-site'));
  }
}

It's a bit simplistic, but I think you understand what I mean. Just being super correct doesn't make our lives easier.

nikosdion commented 2 years ago

No, not really.

You need to know the URL of the site very early in the application's lifecycle to do SEF routing. When you are being accessed as https://www.example.com/foo/bar/baz, what is the actual route you need to parse? Is it /foo/bar/baz, /bar/baz, /baz, or an empty path? You cannot infer it from the filesystem as it's perfectly possible to serve http://www.example.com/foo/bar from the directory /opt/sites/mackerel/tomato.
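To make the ambiguity concrete, here is a small sketch of how the route depends on the configured site URL. The function name routeFromBaseUrl is made up for illustration and is not how the CMS router does it; it relies only on PHP's built-in parse_url.

```php
<?php
/**
 * Hypothetical helper: returns the part of the request path below the
 * site's base path, or null when the request is outside that site.
 */
function routeFromBaseUrl(string $liveSite, string $requestUrl): ?string
{
    // parse_url() returns null when the URL has no path component.
    $basePath    = rtrim((string) parse_url($liveSite, PHP_URL_PATH), '/');
    $requestPath = (string) parse_url($requestUrl, PHP_URL_PATH);

    if ($basePath === '') {
        return $requestPath === '' ? '/' : $requestPath;
    }

    // The request must be the base path itself or something below it.
    if ($requestPath !== $basePath && strpos($requestPath, $basePath . '/') !== 0) {
        return null;
    }

    $route = substr($requestPath, strlen($basePath));

    // substr() yields '' (PHP 8) or false (PHP 7) when nothing is left.
    return ($route === '' || $route === false) ? '/' : $route;
}

$request = 'https://www.example.com/foo/bar/baz';

// The same request yields a different route for each possible site root.
foreach (['https://www.example.com', 'https://www.example.com/foo', 'https://www.example.com/foo/bar'] as $liveSite) {
    echo $liveSite, ' => ', routeFromBaseUrl($liveSite, $request), PHP_EOL;
}
```

Nothing in the request itself says which of these site roots is the right one, which is why the base URL has to come from configuration or from the user.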

You also need the actual domain name for other reasons, including a security feature we have not (yet) implemented in WebApplication: filtering access only to allowed Host headers. This is important in self-hosted servers because Apache's default is to accept any random Host header and serve it with the default virtual host!
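As a rough illustration of what such Host filtering could look like (this is not implemented in WebApplication; the function name and the allow-list format are assumptions made up for this sketch):

```php
<?php
/**
 * Hypothetical check: is the Host header one of the site's known names?
 *
 * @param string   $hostHeader   raw Host header, possibly with a :port suffix
 * @param string[] $allowedHosts host names the site is expected to answer to
 */
function isAllowedHost(string $hostHeader, array $allowedHosts): bool
{
    // Drop an optional :port suffix and normalise case before comparing.
    $host = strtolower(preg_replace('/:\d+$/', '', trim($hostHeader)));

    return in_array($host, array_map('strtolower', $allowedHosts), true);
}

$allowed = ['www.example.com', 'example.com'];

var_dump(isAllowedHost('WWW.Example.com:443', $allowed));
var_dump(isAllowedHost('random.invalid', $allowed));
```

A request failing this check would be rejected before any routing happens, instead of being served by the default virtual host.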

Moreover, when Joomla 5 (or beyond...) implements multisite, how exactly would you know which sub-site's configuration set to use if not by inspecting the Host header?

$_SERVER is environment, it's got the same architectural place as $_ENV.

If we'd like to split architectural hairs I would say that what the CMS is missing is two services: an EnvironmentRegistry and an ApplicationFactory.

An EnvironmentRegistry would be created very early by the entry point files (index.php, administrator/index.php, api/index.php and cli/joomla.php) and populated with the environment variables and any overrides from a .env file in JPATH_ROOT. THIS would be the ONLY point of truth for constructing Input objects with the keys server and env (in fact the latter should be an alias of the former given the way PHP works). IMHO that would also make everything more testable on the basis that you can now completely isolate the tested code from its execution environment.
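A thought-experiment sketch of such a registry follows. No such class exists in the CMS; the constructor signature, the method names and the precedence order (.env overrides over $_ENV over $_SERVER) are all assumptions. An entry point would pass in $_SERVER and $_ENV, while a test passes plain arrays.

```php
<?php
// Hypothetical sketch of the EnvironmentRegistry idea; not an existing CMS class.
final class EnvironmentRegistry
{
    /** @var array<string, mixed> */
    private array $values;

    /**
     * @param array<string, mixed> $server    typically $_SERVER
     * @param array<string, mixed> $env       typically $_ENV
     * @param array<string, mixed> $overrides e.g. values read from a .env file
     */
    public function __construct(array $server, array $env = [], array $overrides = [])
    {
        // Assumed precedence: later sources win.
        $this->values = array_merge($server, $env, $overrides);
    }

    public function has(string $key): bool
    {
        return array_key_exists($key, $this->values);
    }

    /**
     * @param mixed $default returned when the key is not set
     * @return mixed
     */
    public function get(string $key, $default = null)
    {
        return $this->has($key) ? $this->values[$key] : $default;
    }
}

// In a test the "environment" is just data, fully isolated from the real one.
$environment = new EnvironmentRegistry(
    ['HTTP_HOST' => 'cli/joomla.php', 'REQUEST_METHOD' => 'GET'],
    [],
    ['HTTP_HOST' => 'www.example.com']
);

echo $environment->get('HTTP_HOST'), PHP_EOL;
```

Because the registry is the only thing that touches the superglobals, the code under test never has to.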

An ApplicationFactory would create the concrete object of the CMS applications: site, administrator, api, cli, and installation. I would also say that this factory should actually be a singleton service, returning the same object every time we call it with the same key (the key would of course be the application class).

Maybe it's just me, but I'd actually store the execution environment type (cli or web) in the DI container to let the ApplicationFactory instantiate the WebApplication descendants and the console Application descendants in an appropriate manner if the environment is the opposite of what they are meant to run in. For example, instantiating a WebApplication descendant in a CLI context would tell ApplicationFactory to look for an X_JOOMLA_LIVE_SITE key in the EnvironmentRegistry service which will be used to set the live_site key in the configuration service of the service locator / DIC we are passing to the application when creating it. This way the console application only really has to set one key in a predefined service and everything is beautifully isolated and testable. No magic global bullshit.
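Roughly, and purely as a sketch of the thought experiment: every class name below is hypothetical, the configuration is a plain array standing in for the config service, and the environment is a plain array standing in for the EnvironmentRegistry.

```php
<?php
// Hypothetical stand-ins; none of these are real CMS classes.
class SketchApp
{
    /** @var array<string, mixed> */
    public array $config;

    public function __construct(array $config)
    {
        $this->config = $config;
    }
}

class SketchWebApp extends SketchApp {}
class SketchSiteApp extends SketchWebApp {}
class SketchCliApp extends SketchApp {}

final class ApplicationFactory
{
    private string $environmentType; // 'cli' or 'web'
    private array $environment;
    private array $config;

    /** @var array<string, SketchApp> one instance per application class */
    private array $instances = [];

    public function __construct(string $environmentType, array $environment, array $config)
    {
        $this->environmentType = $environmentType;
        $this->environment     = $environment;
        $this->config          = $config;
    }

    public function get(string $class): SketchApp
    {
        if (!isset($this->instances[$class])) {
            $config = $this->config;

            // A web application created under CLI has no usable Host header,
            // so the site URL is taken from the environment instead.
            if ($this->environmentType === 'cli'
                && is_a($class, SketchWebApp::class, true)
                && isset($this->environment['X_JOOMLA_LIVE_SITE'])) {
                $config['live_site'] = $this->environment['X_JOOMLA_LIVE_SITE'];
            }

            $this->instances[$class] = new $class($config);
        }

        return $this->instances[$class];
    }
}

$factory = new ApplicationFactory(
    'cli',
    ['X_JOOMLA_LIVE_SITE' => 'https://www.example.com'],
    ['live_site' => '']
);

$site = $factory->get(SketchSiteApp::class);
echo $site->config['live_site'], PHP_EOL;
```

The console application's only job here is to put X_JOOMLA_LIVE_SITE into the environment; nothing overwrites $_SERVER.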

Maybe I am thinking too grand. I honestly have not pursued that path in any meaningful way beyond the thought experiment I shared. I think we can implement it without breaking b/c hard in the CMS.

If you are interested, we could probably set up a meeting to discuss my crazy ideas.

nibra commented 2 years ago

I really like your approach!

> If you are interested, we could probably set up a meeting to discuss my crazy ideas.

Yes, let's schedule something for the beginning of next week; maybe Benjamin, Harald and Allon might be interested to join.

nikosdion commented 2 years ago

I am unavailable on Monday because Crystal's got meetings and the Internet in the summer house we are in now can't support both of us doing online meetings. How about Tuesday? I am generally available after noon UTC. Pinging @bembelimen @HLeithner and @laoneo to see if you guys are interested / available.

laoneo commented 2 years ago

An EnvironmentRegistry would then also help to revert https://github.com/joomla/joomla-cms/blob/4.2-dev/libraries/src/Service/Provider/Config.php. Not sure about the application factory yet, as we have a service provider which always delivers the same app when requested.

Meeting sounds great!

nibra commented 2 years ago

Tuesday sounds good, but we should be finished by 17:00 UTC as we have a production meeting then.

nikosdion commented 2 years ago

@laoneo That service would not be removed, that's the application configuration. An EnvironmentService would read environment settings which could be used by the ApplicationFactory to override (or enhance) the config service's settings.

Very realistically speaking, Joomla 5 could have the application pre-initialisation code check the EnvironmentService for environment variables named something like (I am making this up, not necessarily how we'd do it!) JOOMLA_DBNAME to override the dbname key in the config service. Since the config service is a Registry object we can easily get its keys and go through the EnvironmentService.
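A sketch of that loop, with the same caveat that the naming is made up: the JOOMLA_ prefix and the function name are assumptions, and plain arrays stand in for the config Registry and the EnvironmentService.

```php
<?php
/**
 * Hypothetical helper: for every existing config key, look for an
 * environment variable named PREFIX + UPPERCASED key and let it win.
 *
 * @param array<string, mixed> $config      flat configuration keys
 * @param array<string, mixed> $environment environment variables
 * @return array<string, mixed>
 */
function applyEnvironmentOverrides(array $config, array $environment, string $prefix = 'JOOMLA_'): array
{
    foreach (array_keys($config) as $key) {
        $variable = $prefix . strtoupper($key);

        if (array_key_exists($variable, $environment)) {
            $config[$key] = $environment[$variable];
        }
    }

    return $config;
}

$config = ['dbname' => 'joomla', 'dbprefix' => 'jos_'];
$environment = ['JOOMLA_DBNAME' => 'joomla_staging', 'JOOMLA_UNKNOWN' => 'ignored'];

print_r(applyEnvironmentOverrides($config, $environment));
```

Only keys that already exist in the configuration can be overridden in this sketch, so a stray environment variable cannot introduce new settings.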

Here's the cool bit. The EnvironmentService would not only read the PHP superglobals $_SERVER and $_ENV but would also read .env files using DotEnv.

How does that help? Oh, that's easy! I can have the exact same configuration.php file and two different .env files, one for my local development and one for my live site.
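To illustrate with something runnable: the tiny parser below is a deliberately naive stand-in and not the DotEnv library (it handles only KEY=value lines, # comments and surrounding quotes), and the variable names are the same made-up ones as before.

```php
<?php
/**
 * Naive stand-in for a .env reader; NOT the DotEnv library.
 *
 * @return array<string, string>
 */
function parseDotEnv(string $contents): array
{
    $values = [];

    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim($line);

        // Skip blank lines, comments and anything that is not KEY=value.
        if ($line === '' || $line[0] === '#' || strpos($line, '=') === false) {
            continue;
        }

        [$name, $value] = explode('=', $line, 2);
        $values[trim($name)] = trim(trim($value), "\"'");
    }

    return $values;
}

// Same configuration.php, two different .env files.
$local = parseDotEnv("# local development\nJOOMLA_DBNAME=joomla_dev\nJOOMLA_LIVE_SITE=\"http://localhost:8080\"\n");
$live  = parseDotEnv("JOOMLA_DBNAME=joomla_prod\nJOOMLA_LIVE_SITE=https://www.example.com\n");

echo $local['JOOMLA_DBNAME'], ' / ', $live['JOOMLA_DBNAME'], PHP_EOL;
```

The file that happens to sit next to the installation decides which database and which site URL are used; the configuration file itself never changes.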

How would that help us going towards Joomla 6 and later? Well, the configuration.php file could become optional in 6, then deprecated in 7, then removed altogether in 8. Or something like that. Obviously we'd need to have an official way to write to the Global Configuration instead of making every 3PD (like me!) reinvent the wheel and write directly to configuration.php with reckless abandon.

Having an official interface for managing configuration modifications would also mean that we can have events to run before AND after a configuration change, which opens a host of possibilities. From being able to notify Super Users on Global Configuration changes (like I do in Admin Tools) to giving site owners an option to “lock down” the Global Configuration against modification from the web interface (meaning taking over a Super User account is no longer such a massive threat as it was before), and from performing sanity checks (e.g. does the DB connection info still result in a working DB connection) to auto-publishing any plugins necessary for a Global Configuration option to be meaningful (which MIGHT become relevant as Joomla is moving towards a middleware future, as we had discussed about 7 years ago in Denmark).

As I said, I have ideas. A bit crazy considering where Joomla's architecture currently is but not unrealistic. As the old French aphorism goes “si c’est possible, c’est fait; si c’est impossible, cela se fera” (“if it's possible, it's done; if it's impossible, it shall be done”).

@nibra So let's set a meeting at around 1pm UTC? We can keep it relatively short, between 1 and 2 hours. I'm sure we are not going to solve the world's problems or even Joomla's architectural problems in one meeting but we can at least kick around some ideas and discuss how they can help improve the architecture and how they can be implemented gradually over time without breaking b/c in a way which would screw all of us: core maintainers, 3PDs, integrators and end users. That is the hard part.

nibra commented 2 years ago

13:00 UTC is perfect. Looking forward to it!

bembelimen commented 2 years ago

Sorry, I'm super busy this week and will not make it, but thanks for the invite; looking forward to the results.

nikosdion commented 2 years ago

Hello, everybody!

To make sure we're all on the same page, I have created a time and date link with the meeting time. We will host it over Skype because that's something that runs on all of our computers.

Please ping me on Skype, my handle is live:sledge81. Otherwise I don't know how to add you to the meeting when the time comes later today :)