pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0
307 stars 447 forks source link

Consider implementing the phpdotenv usage to handle configuration #7068

Closed henriqueramos closed 2 years ago

henriqueramos commented 3 years ago

Currently our configuration file it is a static file, based at the INI file concept.

This concept works well, but did not allow us to change and mock things easily for testing, or used it with environment variables.

We could use the bullet-proof package vlucas/phpdotenv to handle the .env parsing and setting.

One valid use case for this it is the configuration of OJS dockerized environments.

henriqueramos commented 3 years ago

As mentioned by @marcbria on Slack:

I suggested only take the config.inc.php variable of the env variable does not exist... (from Dockers side this would be great, but I don't know if this could cause any collateral effect).

We could try to implement the phpdotenv at PKP\config\Config::reloadData or at PKP\config\ConfigParser::readConfig.

henriqueramos commented 3 years ago

Also, currently our phpunit tests aren't running in one separated "disposable" environment. Every test uses the configuration data (including database config) from the main config.inc.php file.

NateWr commented 3 years ago

Can you give some code examples showing the limitations you're hitting with the current config.inc.php file?

Vitaliy-1 commented 3 years ago

The one that Laravel uses? It looks to me that PKP\config\Config is the functional analogue of Laravel's Illuminate\Config\Repository class, which we initialize in our PKPContainer and PKP\config\ConfigParser has similarities with Illuminate\Foundation\Bootstrap\LoadConfiguration. Unifying those looks like another move towards the current Laravel integration approach.

jonasraoni commented 3 years ago

One good thing about .ini files is that we can leave comments there =]

I think .ini files are ok, but I personally would store configurations using a JSON format, since it supports hierarchies and arrays out-of-the-box :)

About adding support to extend/overwrite configurations using environment variables:

In general, I don't have anything against replacing local code by external libraries, given it will have free community/support/documentation/tests.

NateWr commented 3 years ago

I would be inclined to draw on Laravel's configuration pattern, in part because it aligns more of our code base with the same underlying framework. They do support environment variables as well, and we can write PHP code into the config area if we really needed to.

Am I right in thinking the main issue with the current approach is that you'd liike to be able to define some environment variables on the command line, for Docker or PHPUnit?

My one hesitation with going fully with phpdotenv is that everything is an environment variable. Should every configuration variable we use be an environment variable?

marcbria commented 3 years ago

In the docker channel in slack we had a nice conversation a month ago about config.inc.php with Andrew and @asmecher

Summarizing, we said:

There are 4 things other well-known projects are doing to have a better configuration files that I think we can learn about:

  1. Config as environment vars: Andrew pointed Drupal, but Nextcloud (see "Auto configuration via environment variables") goes even further and let you set more variables than the ones related with the DB.
  2. Config folder (with multiple configs): Drupal and nextcloud (between many) keep a folder in app web-root with all config stuf (in Drupal is called "/var/www/html/sites" and in NC is "/var/www/html/config"). It let you (for instance) keep different configs based on domain-names or profiles (production/dev) environments, etc.... and it also fits better with docker (you can easily create a volume of this config folder to keep config separate from your non-changing code).
  3. Move out from the old INI syntax: Most of those projects don't use an INI files any more, they keep all config in an array (nextcloud) or in some arrays (Drupal).
  4. Move configuration to the DB: Nothing to say here, as far as PKP is going in this direction too. I found a nice post that explains all those approaches (and a few more) better than me: https://stackoverflow.com/a/45086900

phpdotenv project covers 1, 2 and 3 and in their site they explain why .env vars are a good idea: https://github.com/vlucas/phpdotenv

Andrew (I need to find his github user) extended this saying: 1 is critical for this to be an image that is appropriately established for use in anything other than demo and local environments. If you cannot configure with environment variables, you are then required to maintain stateful data for config, and public, and private data… and config … simply put… it’s in the wrong place as stateful data. Using environment variables for configuration allows for dynamic creation and tear-down of preview, testing, and development environments. This is the reason for “Anything that is likely to change between deployment environments [should be an environment variable]” that is tenant #3 of https://www.12factor.net/ … store config in the environment.

2 is a step up from 3 and 4. Setting your configuration in your database doesn’t make your code a separate entity from your data as it causes the configuration to bleed into the data. This is one thing that the ini file approach has going right for it. I really don’t think that putting the configuration into the database is a good plan (you’ll then, as the SO answer mentions, have two sets of configuration too… some in the database, some … wherever the database credentials are stored).

One good thing about .ini files is that we can leave comments there =] Well... if we move to .env files, you can also comment and you get the plus of don't need to be as worried to share credentials with your git repo.

I see huge benefits in testing, docker and cloud environment but...

Even the potential you release using environment variables is really big... (you can also change variables that are not from OJS) my lack of knowledge about OJS code makes me feel insecure about the side-effects.

Should every configuration variable we use be an environment variable?

I got the same question, but Andrew made his point. Not everything. Only the ones in config.inc.php.

NateWr commented 3 years ago

Thanks for bringing all that in @marcbria, that's really helpful!

I really don’t think that putting the configuration into the database is a good plan (you’ll then, as the SO answer mentions, have two sets of configuration too… some in the database, some … wherever the database credentials are stored).

I agree with this. WordPress stores a lot of settings in the database and this makes it really hard to deploy staging/production environments.

Even the potential you release using environment variables is really big... (you can also change variables that are not from OJS) my lack of knowledge about OJS code makes me feel insecure about the side-effects.

Yeah, I'm curious about this side of it. It may be that some shared hosts prevent or limit the ability to modify the environment. It would be interesting to explore what impact this might have.

Laravel's configuration pattern does support choosing whether to take a config option from the environment. And has built-in support for defining things differently based on common deployments (locale, staging, production).

marcbria commented 3 years ago

WordPress stores a lot of settings in the database and this makes it really hard to deploy staging/production environments.

Same happened in Drupal 10 years ago, and they moved out from DB to config files again. :-)

Laravel's configuration pattern does support choosing whether to take a config option from the environment.

Nate, if I understand well, you like Henrique's proposal to integrate phpdotenv but you propose to do it in the Laravel's way? If yes, I also think is a good idea to keep OJS tight to Laravel's patterns and best practices.

The only BUT is we are concerned about potential security issues and brand new conflicts... but at same time this is becoming the new standard so we are not the first ones to walk this path.

Some articles talking about those same security concerns:

Guys, thanks you all for your work.

henriqueramos commented 3 years ago

Thanks for bringing all that in @marcbria, that's really helpful!

I really don’t think that putting the configuration into the database is a good plan (you’ll then, as the SO answer mentions, have two sets of configuration too… some in the database, some … wherever the database credentials are stored).

I agree with this. WordPress stores a lot of settings in the database and this makes it really hard to deploy staging/production environments.

Even the potential you release using environment variables is really big... (you can also change variables that are not from OJS) my lack of knowledge about OJS code makes me feel insecure about the side-effects.

Yeah, I'm curious about this side of it. It may be that some shared hosts prevent or limit the ability to modify the environment. It would be interesting to explore what impact this might have.

Laravel's configuration pattern does support choosing whether to take a config option from the environment. And has built-in support for defining things differently based on common deployments (locale, staging, production).

Using the env function it's great, but there's one small issue with it, he got the data from $_SERVER, not from environment variables (using the getenv native method), this changed at Laravel 5.8. There's even a polyfill for it.

In one of my previous companies, we've developed a support class to handle data from getenv, instead to rely on env method or $_SERVER stuff.

edit

And I don't even want to mention the code smell related to the env guts.

NateWr commented 3 years ago

It looks like Laravel 8 uses phpdotenv and also now reads from the $_ENV super global. Maybe there's not a distinction to be made, now, and we can experiment with using phpdotenv but configuring it similar to Laravel?

jonasraoni commented 2 years ago

Another option to consider, which would keep the INI style, but would at least allows us to forward the parser code to a 3rd-party is the TOML format.

marcbria commented 2 years ago

Not sure about this Jonas.

It's true that TOML (or YAML) are becoming the new config-file standards, but OJS admins are familiar with ini syntax and the config upgrade will be easier.

I'm afraid a change like this could generate an avalanche of requests in the forum.

jrichardsz commented 2 years ago

If you keep both flavors?

Inspired on spring java framework, your config could be like this:

If the system admin are from old school, they will happy to connect with ssh and hardcode the values:

[database]
driver = mysql
host = 10.20.30.40
username = root
password = Idon'tmindhidingthisvalue
port = 3306

But if the sysadmin is a generation z devops engineer and wants to deploy in several nodes/environments the same ojs with a puff using docker or some crazy technology

[database]
driver = mysql
host = "${OJS_DATABASE_HOST}"
username = ${OJS_DATABASE_USER}
password = ${OJS_DATABASE_PASSWORD}
port = ${OJS_DATABASE_PORT}

Then at the moment of read the ini file: https://github.com/pkp/pkp-lib/blob/a2db11e6ab4779b6b3b97b9a1a9006ed00e01971/classes/config/Config.inc.php#L71

or in the parser https://github.com/pkp/pkp-lib/blob/a2db11e6ab4779b6b3b97b9a1a9006ed00e01971/classes/config/ConfigParser.inc.php

the algorithm should be like this:

if value has ENV format : \\$\\{([^}\\s]*)\\} 
  get value from env with https://www.php.net/manual/en/function.getenv.php;
else
  read from static ini value

I implemented that algorithm on java and nodejs. Some php developer should be able to do it.

Finally if you add this feature, would be fulfilling the third commandment of 12factor https://12factor.net/config and the use of docker and modern technologies will be easy.

NateWr commented 2 years ago

A comment from @ctgraham on the benefits of moving (some) config settings to the database:

Short answer on storing configuration in the database rather than config.inc.php: anything configured in config.inc.php requires filesystem and thus PHP access and could/should be restricted system administrators. Generally, we should make anything which rationally can be managed by the user available to be managed by the user. That is: a Journal Manager or Site Administrator in the software should be able to configure settings self-service, not depend on a System Administrator who has filesystem access.

Examples of settings which logically live in a System Admin's responsibility (config.inc.php is OK!): database settings, external system commands.

Pretty much everything else should logically live in a Journal Manager or Site Admin responsibility (get these out of config.inc.php!). Yes, this adds challenges in automating deployments, but we need to do right by the user first.

NateWr commented 2 years ago

My preferred approach is to use a layered system that allows a sys admin to override user-configurable settings. This would allow a sys admin to "lock down" settings when they want and open them up when they want. Example:

$timezone = Config::get('time_zone', 'UTC');

This would return the first value found in the following order:

  1. The environment variables.
  2. The configuration file (or dot env, etc).
  3. The site settings.
  4. The context settings.
  5. The default fallback when one is provided (UTC in this example)
NateWr commented 2 years ago

@ctgraham I'd also be interested to know which of the config values you think ought to be site or context settings. Moving these into settings would probably not be too difficult in most cases, and our config file is long overdue for a clean-up.