php-pds / skeleton

Standard PHP package skeleton.
Creative Commons Attribution Share Alike 4.0 International

Resource vs "other files" and run-time generated artifacts #32

Closed mindplay-dk closed 7 years ago

mindplay-dk commented 7 years ago

I took a closer look at this standard and found one aspect of it to be lacking.

I read the discussion about resources in #12 - I understand that it's common to have a folder named "resources", but I don't feel like the definition of it is meaningful.

The term "resource" is currently very broadly defined as:

If the package provides a root-level directory for other resource files, it MUST be named resources/

This is so broad it's almost conflicting with the description of "other directories":

The package MAY contain other root-level directories for purposes not described by this publication.

All I can surmise from these two descriptions is that "other files" may go either into "resources" or into "other directories".

In other words, the description appears to confirm what I gather is probably true: even if many projects have a folder named resources, the use of this folder is completely arbitrary - just the same as the use of "other directories".

So it's kind of already covered by "other directories", and there is no clear definition of what (if anything) should go into "resources" - I think because the term "resource" is essentially every bit as broad as the term "file". Since the whole point of this standard is to help figure out how to structure the project's files/resources/assets/whatever, it seems more confusing than helpful to reserve a specific folder for "stuff", and at the same time have a clause that allows any other root folder-name for "other stuff".

Which brings me to the thing I didn't find as part of this specification.

In our own internal folder-names specification for our projects, we have three reserved folders for "other files", but with a very important distinction, not regarding the type of files/resources that get put into those folders, but with a designated definition of the nature or origins of those "other files".

In particular, I find that this proposed standard defines nothing with regards to dynamic files, so I will try to address that with the following suggestion.

Not minding the folder-names we happen to use, and with attention to the folder-names and definitions already defined by this standard, I'd suggest the following:

  1. Change the definition of resources, such that this folder is reserved for permanent, static files - this could be scripts or other data-files under source-control, containing data-dependencies, which are loaded/parsed or otherwise required by the project.

  2. Add a reserved folder for run-time files, e.g. named runtime - this would be the designated root-folder for run-time generated artifacts, typically cache-files for computed data, images, log-files, temporary uploaded files, etc. This folder must be writable by the application, and all files should be considered volatile or transient, meaning you can erase the contents of this folder at any time without affecting the functionality of the application. (This is different from a system-defined "temp" folder, since e.g. log-files etc. could be important for diagnostic purposes, and since e.g. clearing the cache of a specific application may briefly affect the performance of that application.)

  3. Add a reserved folder for permanent assets, perhaps named assets (though this could be somewhat ambiguous with "web assets" such as CSS/JS files, so I'm definitely open to other ideas) - this would be the designated default root-folder for user-generated, permanent assets, such as uploaded images or other documents, and any other non-volatile data; resources that are critical to the functioning of the application and must not be erased.

I'm merely spit-balling as far as the actual folder names here, just for the sake of discussion.
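To make the distinction between the three designations concrete, here is a minimal Python sketch. The folder names (resources, runtime, assets) are only the placeholders from the suggestion above, and clear_runtime() and demo() are hypothetical helpers, not part of any standard or framework. It illustrates the one property that defines (2): its contents can be erased at any time, while (1) and (3) stay untouched.

```python
import shutil
import tempfile
from pathlib import Path

# Placeholder folder names from the suggestion above; none are standardized.
STATIC_DIR = "resources"   # (1) static files under source control
RUNTIME_DIR = "runtime"    # (2) volatile, application-writable artifacts
PERMANENT_DIR = "assets"   # (3) user-generated data that must not be erased


def clear_runtime(project_root: Path) -> int:
    """Delete every entry inside runtime/ and return how many were removed."""
    runtime = project_root / RUNTIME_DIR
    removed = 0
    if not runtime.is_dir():
        return removed
    for entry in runtime.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


def demo() -> dict:
    """Build a throwaway project, clear runtime/, and report what survived."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / STATIC_DIR).mkdir()
        (root / STATIC_DIR / "countries.json").write_text("[]")
        (root / RUNTIME_DIR / "cache").mkdir(parents=True)
        (root / RUNTIME_DIR / "cache" / "page.html").write_text("<p>cached</p>")
        (root / RUNTIME_DIR / "app.log").write_text("started\n")
        (root / PERMANENT_DIR).mkdir()
        (root / PERMANENT_DIR / "avatar.png").write_bytes(b"\x89PNG")

        removed = clear_runtime(root)
        return {
            "removed": removed,
            "runtime_empty": not any((root / RUNTIME_DIR).iterdir()),
            "static_kept": (root / STATIC_DIR / "countries.json").exists(),
            "permanent_kept": (root / PERMANENT_DIR / "avatar.png").exists(),
        }
```

Calling demo() clears the cache folder and the log file from runtime/ and leaves the other two folders alone, which is the whole point of giving them separate roots.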

I believe (2) and (3) are important because of things like:

I haven't looked at existing applications or frameworks to see how these folders are named, but I believe it's a fact that most (if not all) web-applications (at least) have designated folders for permanent and run-time data, as most applications do have both some kind of cache-folder and a folder for user-generated files.

I'm guessing it would take more than a statistical survey to propose specific folder-names for (2) and (3) but at this stage merely wanted to explain why I think these two folder-designations are important - even if there is no statistical common average for the naming of these folders, in my experience, most projects have them, and I think it would be useful to define these.

I'm hoping the purpose of this specification is to establish more than a formalization of an existing de-facto standard for project structure, and so I feel that this may be an extremely important missing piece of the puzzle.

Let me know what you think?

pmjones commented 7 years ago

I'm hoping the purpose of this specification is to establish more than a formalization of an existing de-facto standard for project structure

This specification establishes a formalization of an existing de-facto standard for project structure.

mindplay-dk commented 7 years ago

Hmm, yeah, okay, so you're only interested in formalizing the things that most projects do agree on?

My point was that there's clearly something most projects and frameworks have in common that isn't going to surface by merely comparing folder names - because you're only comparing the names, so your analysis doesn't actually take into account what the folders are used for.

I think that most projects and frameworks have folder designations that they do have in common, but they all name them differently, and these won't show up by looking at folder-names.

Like for example, Yii has a runtime folder for run-time generated files like cache and log files, in Symfony this is named var, in Zend it's data, in Laravel it's storage, and so on - in OpenCart it's system/storage, in Drupal it's files, in Magento it's var, and those were just a few samples.

Point is, almost any project/framework is going to have these folder designations, but you won't notice this by looking at a statistic.
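As a rough illustration of why a name-only comparison misses this, here is a small Python sketch. The mapping simply restates the examples listed above (it is not verified against any particular version of these projects), and candidate_runtime_dirs() is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Run-time folder per project, exactly as listed in the comment above.
# These are the commenter's examples, not verified against specific releases.
RUNTIME_DIR_BY_PROJECT = {
    "Yii": "runtime",
    "Symfony": "var",
    "Zend": "data",
    "Laravel": "storage",
    "OpenCart": "system/storage",
    "Drupal": "files",
    "Magento": "var",
}


def distinct_names() -> List[str]:
    """Return the distinct folder names used for the same purpose."""
    return sorted(set(RUNTIME_DIR_BY_PROJECT.values()))


def candidate_runtime_dirs(project_root: Path) -> List[str]:
    """List which of the known run-time folder names exist under project_root."""
    return [name for name in distinct_names() if (project_root / name).is_dir()]
```

Seven projects share one designation but spread it over six different names, so a frequency count per name splits what is conceptually a single folder into six small, unrelated-looking buckets.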

And in many frameworks like Cake, CodeIgniter or Slim there are no designated folders, it's all up to the developer, so they put these things wherever they feel like it. And some projects are completely unstructured, they'll have cache and log and upload folders in the root, for example, and figuring out the permissions and ownership etc. becomes a hassle.

I don't imagine most projects will change what they're doing based on a standard like this? But I do think it would be helpful to have these designations for new projects, so they come out more uniform, easier to understand and install, etc.

Well, if you're not interested, just close this I guess ;-)

pmjones commented 7 years ago

Yii has a runtime folder for run-time generated files like cache and log files, in Symfony this is named var, in Zend it's data, in Laravel it's storage, and so on - in OpenCart it's system/storage, in Drupal it's files, in Magento it's var, and those were just a few samples.

Since we have the data on root-level folder names, it should be relatively straightforward to count the number of times those names (in collective) show up. For example, var/ appears 379 times across 110k packages: https://github.com/php-pds/skeleton_research/blob/1.x/results/addendum-dirs.txt#L134

So, if you want to put a count together, even an "optimistic" one that assumes your noted folder names above indicate the intent of a "runtime" folder, I'd be interested to see the results.

mindplay-dk commented 7 years ago

So, if you want to put a count together, even an "optimistic" one that assumes your noted folder names above indicate the intent of a "runtime" folder, I'd be interested to see the results.

Even that might not indicate or prove anything - a lot of projects don't have these folders checked into source-control at all; they get manually created and configured by the user at installation-time, or get created/configured by some kind of "deployment" tool or installation script the user needs to run.

It's common enough, in production scenarios, to configure projects to store (potentially large) uploaded files on a mounted external file-system, usually in a folder outside the public web-root, to prevent direct downloads, and to scale to large volumes of data - as well as to simplify upgrades, e.g. being able to swap out the entire project folder during deployments, without affecting run-time or permanent storage.
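A minimal Python sketch of that kind of setup, under stated assumptions: the environment variable names APP_RUNTIME_DIR and APP_PERMANENT_DIR, the default folder names, and both helper functions are hypothetical and chosen only for illustration.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_storage_dirs(
    project_root: Path, env: Optional[Mapping[str, str]] = None
) -> Dict[str, Path]:
    """Resolve run-time and permanent storage, letting deployment override both.

    APP_RUNTIME_DIR and APP_PERMANENT_DIR are made-up variable names; the
    defaults fall back to folders inside the project itself.
    """
    env = os.environ if env is None else env
    runtime = Path(env.get("APP_RUNTIME_DIR", project_root / "runtime"))
    permanent = Path(env.get("APP_PERMANENT_DIR", project_root / "assets"))
    return {"runtime": runtime, "permanent": permanent}


def is_inside(parent: Path, child: Path) -> bool:
    """Purely lexical check: does child live somewhere under parent?"""
    try:
        child.relative_to(parent)
        return True
    except ValueError:
        return False
```

With permanent storage pointed at a mounted volume, for example resolve_storage_dirs(Path("/srv/app"), {"APP_PERMANENT_DIR": "/mnt/uploads"}), the project folder can be swapped out during a deployment without touching uploaded files, and none of this is visible in the checked-in folder names.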

If we're looking only at folders that do exist in the project (e.g. are checked into source-control) then any analysis is going to be a "best guess" as far as the intended purpose of folders with those names, so it might be something like this...

Likely temporary storage folders:

Directory     Packages    Share
storage/           650    0.59%
var/               379    0.344%
data/             1066    0.967%
asset/              52    0.047%
assets/           2957    2.683%

As for asset and assets, who knows - in projects I've worked on, that's been a common enough name for run-time assets. Depends on where the folder is located - if it's in the public web-root, it's more likely static (JS, CSS) assets.

Temporary storage folders with a designated purpose:

Directory     Packages    Share
temp/              117    0.106%
runtime/           196    0.178%
cache/             282    0.256%
log/               207    0.188%
logs/              206    0.187%

How you count those depends on whether you think projects should designate root-folders for permanent and run-time storage - a lot of projects don't, they designate several specific folders (cache, images, upload, logs, etc.) rather than grouping them under a parent folder.
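For what it's worth, the "optimistic" count can be sketched by summing the figures from the two lists above. This Python snippet uses only those numbers plus the roughly 110k package total mentioned earlier in the thread, which is an approximation. The result is an upper bound at best: a package containing both log/ and cache/ is counted twice, and a matching name proves nothing about intent.

```python
# Counts copied from the two lists above.
GENERIC = {"storage/": 650, "var/": 379, "data/": 1066, "asset/": 52, "assets/": 2957}
DESIGNATED = {"temp/": 117, "runtime/": 196, "cache/": 282, "log/": 207, "logs/": 206}

# Approximate survey size ("110k packages") as quoted earlier in the thread.
APPROX_TOTAL_PACKAGES = 110_000


def optimistic_count() -> int:
    """Sum every occurrence, ignoring that one package may match several names."""
    return sum(GENERIC.values()) + sum(DESIGNATED.values())


def optimistic_share(total_packages: int = APPROX_TOTAL_PACKAGES) -> float:
    """Upper-bound fraction of packages that might have a run-time folder."""
    return optimistic_count() / total_packages
```

That gives 5104 occurrences for the generic names and 1008 for the purpose-specific ones, 6112 in total, of which assets/ alone accounts for almost half.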

We also don't know from the folder names if certain folders are permanent or temporary in the first place:

Directory     Packages    Share
images/            826    0.749%
img/               260    0.236%
media/             106    0.096%
imgs/                9    0.008%
upload/             16    0.015%
uploads/            73    0.066%

And finally, names like resources or files literally don't indicate anything at all - they're just synonyms for the same thing, they could contain anything, whether temporary or permanent storage or static run-time dependencies:

Directory     Packages    Share
image/              23    0.021%
files/             106    0.096%
Resources/        9207    8.354%
resources/        2274    2.063%
resource/           50    0.045%
Resource/           27    0.024%

I include image here, because that could be for cached images versions, for permanent image storage, or even for both - it completely depends on the project.

I think this problem requires reasoning more than analysis of artifacts like filenames. It would make more sense to look at descriptions of projects and frameworks than looking at file-system footprints. The descriptions of almost any project or framework will include some kind of folder designations either for specific or non-specific types of either permanent or run-time storage.

I think if you look at the actual usage of "resource" folders, for one, you would find that this is about as generic as "files", "assets" or "documents", etc. That's my only criticism of the existing proposal: by the current descriptions, just about anything could be considered "resources", but that's currently a meaningless designation, since "other folders" cover the same thing. Just because many projects have a "resources" folder doesn't mean they have more in common than the folder-name.

So I'd suggest to either designate the resources folder for something more specific than "other stuff", since that's already covered by the allowance to create other folders for other stuff; either that, or omit it entirely, I'm not sure which makes more sense.

The missing designations for run-time and permanent storage are sort of a separate issue from that. That's not a criticism of the existing proposal, since it doesn't cover them, but it's something I would very much like to see added. As said, almost any (web-) application has folders that fall into one of these two categories, whether or not they group these folders under a pair of parent folders. I believe those that don't group them under a parent folder should, as it makes it easier to deploy and manage permissions etc. in practice, though maybe that's based more on opinion than fact...

SamMousa commented 7 years ago

Add a reserved folder for run-time files, e.g. named runtime - this would be the designated root-folder for run-time generated artifacts, typically cache-files for computed data, images, log-files, temporary uploaded files, etc. This folder must be writable by the application, and all files should be considered volatile or transient, meaning you can erase the contents of this folder at any time without affecting the functionality of the application. (This is different from a system-defined "temp" folder, since e.g. log-files etc. could be important for diagnostic purposes, and since e.g. clearing the cache of a specific application may briefly affect the performance of that application.)

Definitely this. Even just specifying a recommendation for new projects would help a lot.

pmjones commented 7 years ago

Hi @SamMousa -- I think I noted this above, but if not: PDS publications are derived from and supported by common practices existing in real packages, as adopted by existing authors. As it stands, the research reveals no currently-existing common name for that kind of directory.