symphonycms / symphonycms

This is the official Symphony CMS repository.
https://www.getsymphony.com
MIT License
546 stars, 212 forks

Improve Extension Metadata #767

Closed: brendo closed this issue 12 years ago

brendo commented 13 years ago

Largely from this thread, using this definition as an example. The long-term goal is to allow for more metadata-rich information about extensions, including compatibility among versions, dependencies and stronger ties to the Symphony website.

At the very least Symphony 2.3 should include the ability for an extension to mark compatibility, type and dependencies (whether it's done with XML or through the existing about() functionality).

designermonkey commented 12 years ago

This was discussed again at Symposium, and we were discussing doing this in PHP, as it is currently done.

That being said, this is the method we are going to follow. As the new lead of the Symphonists (extension cleanup and maintenance), this will probably be my only rigid-decision-without-further-discussion.

We need to decide on a timeline to implement this behaviour. Implementing it will also help us with extension deprecation, and with deciding whether to take over certain neglected extensions: because this change will break backwards compatibility, it will force extension developers into activity to apply it.

Once a game plan has been decided, the team and I will update the Symphony/Symphonists extensions and propagate this through the community.

One note on backwards compatibility, though: we need to introduce this across two versions so that both behaviours are supported (we know this, just wanted to get it written down).

brendo commented 12 years ago

Ok I've been thinking about this a lot lately as it's a bit of a blocker for a number of things.

The first step is easy: just add some data to the about() array and slowly people will catch on. But I think what we are missing is why extension developers should do this. This definition is fine as a starting point. The big thing we'll have to do is document it and publish it so it's a standard that people can refer to.

Some initial ideas/thoughts/questions for 'what we can do with this data':

Compatibility

Dependencies

Types

Git Repo

designermonkey commented 12 years ago

Thanks for your input @brendo.

I will just elaborate a little more on the discussion (I should have done so really)

Compatibility

Dependencies

Extension dependency was discussed, with some against it completely, and some in favour. I don't think it's a bad thing, as long as it is done correctly, and with that, we need to get the 'best practices' written up and agreed (my todo).

Types

We should start doing this now, though, to prepare for the new site. However, I don't think they should be called types; tags is more fitting for what they actually are.

If we don't start doing things now, then we will have to go through everything again, and I for one would like to get as much as possible out of the way now.

Git Repo

In the long run, yes, this functionality will be in the core, but it is a way off (I'd like to push to help @brendo get the issue tracker list down a little). As with my previous statement, though, it will help to get into the habit of adding the repo location and version history now rather than later.

Ok I've been thinking about this a lot lately as it's a bit of a blocker for a number of things.

  • I think the quickest way to introduce this is through the current about() method (yes, I know I've previously said otherwise).
  • I think we just need to 'do it' instead of talking about it. We won't think of all the right types the first time, so let's get them 'about right' and then refine.

Is there no way to reinforce the current methods while I get the new instructions written up? There are still some blockers that haven't been addressed in all extensions, namely the $this->Parent deprecation. I see this as a way to find out which extensions are being actively maintained and which ones we need to take control of. The extension list is excessive, and as we don't have any usage stats, we need some 'breakage' to find out the usage/updating status of them all.

I'm all in favour of the 'just get it done' attitude; we need pro-action rather than more discussion. But I think we should 'just do it' in XML, with a backup of the current method for at least one 2.x version to allow time for adoption.

brendo commented 12 years ago
designermonkey commented 12 years ago

Ok then. This will go on hold until I can write up some docs/best-practice. I'm just worried that the data is less accessible in PHP than XML, which was raised before.

Let's see what $this->Parent breaks first ;o)

nickdunn commented 12 years ago

I'd dearly love to make this XML. This ties in nicely with rebuilding the Symphony site, since it means updating an extension's meta data is a case of parsing an XML file from Github, rather than the slightly dodgy task of tokenising or evaling PHP.

There's a pretty good set of examples on the forum thread. We could omit dependencies and types for now if it will make life easier — there's no point in adding it until we actually use it. But I think compatibility information is important, since this is how the XML idea came about in the first place.

The biggest benefit of XML is that we can validate it. To retain quality metadata we can say "your extension must include at least a name, description, release date, version and repository link" and we can enforce it too.
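A rough illustration of the kind of rule Nick describes, written in Python for brevity (Symphony's core is PHP, so this only shows the logic). The flat element names used here (name, description, version, date, repo) are placeholders and not the layout from the gist; an XSD would express the same rule declaratively. Returning the list of missing fields instead of a bare pass/fail is what lets a tool tell the developer exactly what to fix.

```python
import xml.etree.ElementTree as ET

# Placeholder names for the required metadata; the real element names
# are defined by the agreed XML/XSD, not by this sketch.
REQUIRED = ["name", "description", "version", "date", "repo"]


def missing_fields(xml_text):
    """Return the required child elements that are absent or empty.

    An empty list means the document passes this (minimal) check.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for tag in REQUIRED:
        element = root.find(tag)
        if element is None or not (element.text or "").strip():
            problems.append(tag)
    return problems


sample = """<extension id="search_index">
  <name>Search Index</name>
  <description>Full-text search</description>
  <version>0.5</version>
  <repo>https://github.com/example/search_index</repo>
</extension>"""

print(missing_fields(sample))  # ['date']
```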

To this end, my suggestion would be to go with the XML route. Could Symphony's core look for and parse this, and fall back to an about() array if the XML does not exist? Compatibility would then be implemented from the XML, but not from the about() method. We can then say that for Symphony 2.4 or 2.5 the XML file is mandatory. Extensions that do not provide one therefore become deprecated/incompatible by the absence of the file. We'd either show a warning to the user when they try to install ("This extension might not be compatible, it was last updated 2 years ago") or even prevent it from being installed altogether (depending on how far down the line 2.4 or 2.5 actually is).

To get this ball rolling we should re-read the XML metadata discussion on the forum, look at Craig's and my sample gists, then start adding the file to our own extensions to try it.

(As an aside, to speed up the process, I had the idea of an XML generator we can get developers using to make the transition easier. Give it a repo URL and it will do its best to generate a meta file.)

nickdunn commented 12 years ago

I've updated the gist with the comments from the forum thread, as well as comments left on the gists themselves.

https://gist.github.com/1114078

designermonkey commented 12 years ago

I'm happy with that XML, and I think we should continue. If there are any 'semantics' of the layout that need discussion, they can be discussed after we get this working. They will only be tweaks at that point, and we need to stop holding this up any longer.

Questions to ask ourselves now:

Any more questions from anyone else?

brendo commented 12 years ago

Cool, thanks Nick!

So, I've been thinking about XML metadata, and one of the first things we'll need is a class similar to Configuration that helps read XML files and allows access via a similar API. The current Configuration class could probably be used to a degree, perhaps just by adding another method that does the initial read and parse of the XML file?

It wouldn't be accessed via Symphony::Configuration() though; instead a new Configuration object would be created as needed, e.g. new Configuration('path/to/extensionmetadata').
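To make the idea concrete, here is a Python sketch of a read-only, XML-backed configuration object that is constructed per file rather than fetched from a global. The class name XMLConfiguration and the "children of the root are groups, grandchildren are settings" layout are assumptions for illustration; the get(name, group) accessor is only loosely modelled on a Configuration-style API, not copied from Symphony's PHP class.

```python
import xml.etree.ElementTree as ET


class XMLConfiguration:
    """Read-only settings parsed from XML (hypothetical sketch).

    Each child of the root element is treated as a group and each
    grandchild as a setting whose text content is the value.
    """

    def __init__(self, xml_text):
        root = ET.fromstring(xml_text)
        self._properties = {
            group.tag: {item.tag: (item.text or "").strip() for item in group}
            for group in root
        }

    @classmethod
    def from_file(cls, path):
        # e.g. XMLConfiguration.from_file("path/to/extensionmetadata")
        with open(path, encoding="utf-8") as handle:
            return cls(handle.read())

    def get(self, name=None, group=None):
        # No arguments: everything. Group given: one value (or None).
        # Name only: the whole group with that name (or None).
        if name is None and group is None:
            return self._properties
        if group is not None:
            return self._properties.get(group, {}).get(name)
        return self._properties.get(name)


meta = XMLConfiguration("""<extension>
  <general><name>Search Index</name><version>0.5</version></general>
</extension>""")

print(meta.get("version", "general"))  # 0.5
print(meta.get("general"))             # {'name': 'Search Index', 'version': '0.5'}
```

There is deliberately no setter: extension metadata is read, never written back, which is the point Nick raises further down.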

Whatcha think?

designermonkey commented 12 years ago

I was just about to mention that the Configuration class doesn't read XML, but then re-read what you said. Excellent idea. How do we want to parse the XML, SimpleXML or DOMDocument?

Is there any reason we'd need to initialise it that way? Is that because Symphony::Configuration is pre-loaded with core stuff? I'm still learning all this.

nickdunn commented 12 years ago

When should the deadline be?

I'd like to get the most basic part of this into 2.3 if we can. That is, allowing a developer to remove the about() method from their extension driver and Symphony uses the XML instead. This means the XML will need to be agreed upon sufficiently for extension name, description, author details and release details (replicating about()).

What do we expect the workload to be to implement this?

Relatively minimal, I think. At present the extension manager lets you call about() on an extension, so we need to find where this occurs and write a different accessor for this metadata. It's kind of beyond me, so it might be on Brendan's task list, I'm afraid.

one of the first things we'll need is a similar class to Configuration that helps read XML files and allows access via a similar API

It's not really core Configuration, and it is read-only too, so I don't think the same configuration accessor principles apply.

Wouldn't this just be another internal function inside the ExtensionManager or Extension class? If I call the extension's about() method, if it has one it will return it (backwards compatibility), otherwise it would parse the XML file and return that object instead.

Eventually it will need methods for things like:

But right now I think we should replicate the about() method and go from there. This lays the groundwork for a lot of other great things (potentially updating in the core, and maintaining meta data on the Symphony site are the two big things), but in the short term I don't think we should be changing the way Symphony core deals with extension meta data. In my opinion this should be a like-for-like swap.

Quite how you decide to do this... I'm not too fussed!

brendo commented 12 years ago

Wouldn't this just be another internal function inside the ExtensionManager or Extension class? If I call the extension's about() method, if it has one it will return it (backwards compatibility), otherwise it would parse the XML file and return that object instead.

It definitely could be, I was just thinking more low level that a Configuration class that reads XML may prove useful in the future rather than keeping it tied into ExtensionManager.

But right now I think we should replicate the about() method and go from there.

Agree, I'll start this weekend.

nickdunn commented 12 years ago

Oh, and a couple of things discussed above that I haven't addressed:

nickdunn commented 12 years ago

I was just thinking more low level that a Configuration class that reads XML may prove useful in the future rather than keeping it tied into ExtensionManager

You are one step ahead of me. Sounds like a fair idea.

designermonkey commented 12 years ago

I was going to suggest tags actually, as they should be tags on the website rather than categories... An extension developer could feasibly tag his/her extension for what it does and affects. This Git/Github confusion also backs up this thought too.

The XML schema should be changed to reflect this now, so we can keep the xml-to-aim-for consistent from the start.

nickdunn commented 12 years ago

This Git/Github confusion also backs up this thought too

Sorry dude, I don't follow, doesn't it do the opposite? I thought that having tags (for types/categories) in the XML might be confusing, since a developer is used to assigning a git tag when marking a release. Tags implies a sense of freedom, whereas in #715 we've discussed a controlled vocabulary.

The controlled vocab gives us an element of control as to how extensions are described and categorised.

designermonkey commented 12 years ago

Sorry, yes. I'm in a difficult place right now personally, and getting confused. Ignore me.

nickdunn commented 12 years ago

No probs! So tags, types or categories?

allen commented 12 years ago

Tags is unfortunately an existing term used for Git, so as Nick pointed out, it would be best to stay clear of it. Also as Nick pointed out, tags and types imply different things.

Personally, types and categories are equal in terms of definition. However I'd vote to keep types since it is already an existing term used within Symphony.

Regarding XML for extension metadata, I'm all for it. The Symphony website could offer an API to look up any extension given certain filter conditions and return the extension's XML data. This would be most useful for an extension that lists extensions or checks compatibility.

nickdunn commented 12 years ago

In lieu of this, an extension's ID would be its folder name. Symphony (well, file systems) require these to be unique, so they can be considered the unique reference. It means we can use the folder name for dependency management e.g.

<dependency version="0.5">search_index</dependency>

And also for looking up extensions on the website e.g.

http://symphony-cms.com/downloads/extensions/search_index

(No more entry IDs in URLs, yay!)

The downside here is that you can't release two extensions of the same name. So if we had a released extension, another developer could not release a fork without changing the name. This is probably a good thing!

Therefore the XML now includes the extension ID on the root element:

https://gist.github.com/1114078/334135f2858d5b35b64cd5a6c8793e3b290106f8
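Because the folder name doubles as the extension ID, resolving dependencies reduces to a lookup by folder name. The Python sketch below (illustration only; the core is PHP) reads the root id and each dependency element. Two things are assumptions rather than part of the agreed schema: that the version attribute means "at least this version", and that dependency elements sit inside a dependencies wrapper (the code uses iter(), so it finds them at any depth either way).

```python
import xml.etree.ElementTree as ET


def version_tuple(version, width=3):
    # "0.5" -> (0, 5, 0); assumes plain dotted integers, zero-padded so
    # that "0.5" and "0.5.0" compare as equal.
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def unmet_dependencies(xml_text, installed):
    """Return (extension id, unmet deps).

    `installed` maps installed extension folder names to version strings.
    """
    root = ET.fromstring(xml_text)
    unmet = []
    for dep in root.iter("dependency"):
        folder = (dep.text or "").strip()  # folder name == unique extension ID
        wanted = dep.get("version")        # assumed minimum version
        have = installed.get(folder)
        if have is None or (
            wanted is not None and version_tuple(have) < version_tuple(wanted)
        ):
            unmet.append((folder, wanted))
    return root.get("id"), unmet


meta = """<extension id="search_reports">
  <dependencies>
    <dependency version="0.5">search_index</dependency>
    <dependency>jit_image_manipulation</dependency>
  </dependencies>
</extension>"""

print(unmet_dependencies(meta, {"search_index": "0.4"}))
# ('search_reports', [('search_index', '0.5'), ('jit_image_manipulation', None)])
```

The hypothetical search_reports extension depends on the real search_index folder name, which is exactly the kind of reference that would break if two extensions could share a name.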

designermonkey commented 12 years ago

It's a go for me.

nickdunn commented 12 years ago

Woop! In which case I'll assign a new task to Mr Chang to write an awesomesauce XSD schema to validate the XML, and I'll start work on a converter to aid transition and testing.

designermonkey commented 12 years ago

If Allen is too busy, I can do the XSD.

nickdunn commented 12 years ago

Ooh, another XSD ninja in the house. Fight?

On a serious note... yes please. Let me know when you might get a chance and we can quickly talk it over. I don't know how powerful XSD is so not sure what level of conformance we can stipulate.

remie commented 12 years ago

Does this improvement also include the possibility of having an online extension repository where this metadata is stored? Currently, the Symphony website does not really allow you to programmatically list all available extensions, dependencies, versions etc.

I'm making this comment because I'm working on my developer network application, and it would be great if I can provide developers with a list of available extensions and automatically install them with a single click.

nickdunn commented 12 years ago

Currently, the Symphony website does not really allow you to programmatically list all available extensions, dependencies, versions etc.

The operative word being currently. We haven't begun to flesh out the details, but the proposition is to make this more accessible. I started writing Symphony fields that interface with the Github API. They manifest themselves as simple text input fields into which you add a Github user profile URL or repo URL. It does the rest (caching all sorts of tags, contributors etc).

The idea is to expose both extensions and Symphony releases via an XML (and/or JSON?) API, so that other applications, and indeed extensions, can consume it. This includes applications such as yours, plus the possibility of an extension that allows in-Symphony extension discovery, updating and installation.

So the idea is for this to happen, but when is another thing entirely. Not this year. We're still laying the groundwork.

designermonkey commented 12 years ago

By the way, the XSD is nearly done.

nickdunn commented 12 years ago

Some kind of awesome.

designermonkey commented 12 years ago

The xml and xsd to validate said xml is finalised here: https://gist.github.com/1268374

Yes. This is some kind of awesome.

designermonkey commented 12 years ago

Oh, I forgot to mention. The regexes for email and URL validation will need thorough testing. XML Schema regex is a limited little puppy, nowhere near as capable as POSIX regex, so I have had to wangle it a little. It currently validates the URLs provided in the example XML, but can be updated as and when needed.

brendo commented 12 years ago

Gonna start on the XML reading tonight :)

designermonkey commented 12 years ago

Great! Going to just point you to this comment from @nickdunn on the forum: http://symphony-cms.com/discuss/thread/34727/5/#position-86. As there's been no response, I'm going to suggest that we just make the change. It definitely makes more sense to do it this way.

nickdunn commented 12 years ago

Which change?

simoneeconomo commented 12 years ago

Sorry for asking, but I'm kind of confused by the different ways we are going to treat extension metadata and... logs, as an example.

What are the reasons for choosing XML instead of JSON, and vice versa, apart from XML supporting namespaces (and being verbose) and all the useless "XML vs. JSON" debates? I mean, is it merely a matter of personal taste?

I don't want to sound critical, but it looks like there's no longer a standard way to describe data in Symphony, as there was some time ago (e.g. Symphony 3 using XML for storing section data). I mean, why don't we just pick one format and stick to it, so as to set a standard? (I'm not talking about data sources and events being converted into XML; in that case, using XSLT as the template system is the reason.)

Sorry again if this sounds provoking or harsh, I can assure you it's just a genuine question.

simoneeconomo commented 12 years ago

Note: I'm not saying that I prefer $a over $b, 'cause I like both formats. I just want to understand which is the Symphony way for describing data. For instance, jQuery and MooTools are both great JS libraries, but we decided to use jQuery. What about XML and JSON?

designermonkey commented 12 years ago

@nickdunn, the change you proposed in the comment I linked to. The current Schema is very different when it comes to release compatibility nodes.

@eKoeS I think that for something like this it is easier to write the data by hand in XML than in JSON; it is human-readable for developers and users alike, and it has an easy structure. Log data is different: IMO there will be so much of it that you wouldn't want to read it manually, and JSON has a smaller footprint than the equivalent XML. It would mostly be parsed and displayed by another system. It makes sense IMO to use each format where its benefits matter.

simoneeconomo commented 12 years ago

@designermonkey: Thanks for your explanation. So it's a matter of readability and data volume, right? I mean: we stick to XML for human-readable data (that is, when we expect humans to read the data in that format) or for small amounts of data, and we stick to JSON for machines (that is, when we expect a machine to read and parse the data) or for huge blocks of data. Correct?

nickdunn commented 12 years ago

@nickdunn, the change you proposed in the comment I linked to. The current Schema is very different when it comes to release compatibility nodes.

Oh I see. Ok. Yes, I've just assumed that no news is good news and people are accepting of the idea. I've already implemented this in a load of extension.meta.xml files ;-)

What are the reasons for choosing XML instead of JSON, and vice versa

I don't think there needs to be a standard for "all Symphony". If the logs want to store as JSON, they should. Just like the JIT whitelist stores as plain text. We chose XML because it makes the most sense for developers: we write in XML all day long (HTML and XSLT), and it's human readable. JSON doesn't make sense here as it's quite a "fiddly" syntax, both to write by hand and to read by eye.

It should be viewed on a case by case basis I think.

allen commented 12 years ago

What are the reasons for choosing XML instead of JSON, and vice versa

Plus, the XML format can be validated against XSD, which John has skilfully written. This helps with ensuring the extension's meta format is always valid and compatible before it's released. It's possible to do assertions with JSON data too, but it's an area that is beyond the field of an XML-centric system like Symphony. Naturally in this particular case, XML makes sense.

designermonkey commented 12 years ago

Ah, ok, didn't realise.

I will have to change the XSD then. @brendo, if you want to start, I'll get the XSD and the final XML schema written out tonight. It makes sense to keep it all in the one Gist until we're published.

@eKoeS it's just my experience that large data chunks work better in JSON, especially if they're only going to be parsed into an app and aren't needed for people to read. Although it wasn't me who decided ;)

nickdunn commented 12 years ago

@designermonkey, I have something cooking that isn't ready for public consumption yet, which concerns the XSD. Will email you and Brendan off-list.

designermonkey commented 12 years ago

Aw, shucks, thanks Guru @allen

@nickdunn, no probs

simoneeconomo commented 12 years ago

Many thanks @allen, @nickdunn, @designermonkey.

As long as there's an objective reason that goes beyond personal tastes, I think it's reasonable to use different formats. Otherwise we would end up having data stored in a number of different formats just because one "likes" $format-1 over $other-formats, which in my opinion would be confusing. That's why I used the word "standard": at least for the core, I like reading and knowing that we use XML for extension metadata because we want them to be validated, and JSON for logs because it's kind of easier to parse in this context. That's it, I was just confused. Sorry for going OT.

nils-werner commented 12 years ago

In one of the first comments, Brendo mentioned that the metadata schema might change if we see the need for a new feature. Thus, to make things easier to maintain I'd suggest versioning the metadata schema as well.

Something like https://gist.github.com/1303729 maybe?

Since I didn't immediately find the right place to put the version info, I simply created a new root node, metadata. Who knows, maybe one day we'll want to save more metadata besides just extensions.

What do you think?

designermonkey commented 12 years ago

That's a good idea. +1

brendo commented 12 years ago

Heh, guess I shouldn't have buried my head in code for the last hour.

Anyway, loading the extension.meta.xml is done.

If it exists, it'll be loaded by ExtensionManager::about(), which will pull the minimum into the return array (name, version, release date, authors). If the file doesn't exist, it'll fall back to the current behaviour.

There is an additional parameter that can be passed to the about() function, $rawXML, which (if the file is found) will return the extension.meta.xml file as a DOMDocument object. This is open to change as we explore what we actually want to do with our newfound data, or if we choose to do nothing until 2.4, when it is required.
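The lookup order described above (metadata file first, legacy about() array as fallback, optional raw document) can be sketched in Python as follows. This only mirrors the behaviour in prose; it is not ExtensionManager's PHP code. The element layout (releases/release with version and date attributes, authors/author/name), the assumption that the first release listed is the newest, and the returned keys are all illustrative choices, and raw_xml hands back an ElementTree object where the PHP version returns a DOMDocument.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

META_FILE = "extension.meta.xml"


def about(extension_dir, legacy_about=None, raw_xml=False):
    """Prefer extension.meta.xml; otherwise fall back to the legacy about()."""
    path = os.path.join(extension_dir, META_FILE)
    if not os.path.isfile(path):
        # No metadata file: keep the current behaviour.
        return legacy_about() if legacy_about is not None else None
    tree = ET.parse(path)
    if raw_xml:
        # Rough analogue of $rawXML: return the parsed document itself.
        return tree
    root = tree.getroot()
    release = root.find("releases/release")  # assumed to be the newest release
    return {
        "name": root.findtext("name"),
        "version": None if release is None else release.get("version"),
        "release-date": None if release is None else release.get("date"),
        "author": [author.findtext("name") for author in root.iter("author")],
    }


SAMPLE = """<extension id="search_index">
  <name>Search Index</name>
  <authors><author><name>Nick Dunn</name></author></authors>
  <releases><release version="0.5" date="2011-10-20" min="2.2"/></releases>
</extension>"""


def legacy_about():
    return {"name": "Search Index", "version": "0.4"}


with tempfile.TemporaryDirectory() as ext_dir:
    before = about(ext_dir, legacy_about)  # no XML yet -> legacy array
    with open(os.path.join(ext_dir, META_FILE), "w", encoding="utf-8") as fh:
        fh.write(SAMPLE)
    after = about(ext_dir, legacy_about)   # XML present -> parsed metadata

print(before)
print(after)
```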

Edit: Yep, as @nils-werner mentioned, there'll need to be some sort of versioned schema so that we can say things like 'Symphony 8 will require all extensions to conform to the Extensions 3.0 Schema'. It'd be really cool (read: required) to create a page on the Symphony site that describes the schema and each of its options (even if it just embeds the gist as a guide).

Also, the current implementation doesn't validate the XML against an XSD; from memory, that's fairly trivial to add at any time.

brendo commented 12 years ago

Just added a quick check that looks at the @min attribute and compares it to the current version of Symphony. If it's greater than the installed version, the table row will show 'Requires Symphony @min'. This won't prevent Symphony from installing the extension, but at least the developer will have a heads-up that they are venturing into uncharted waters. We can definitely improve on this or change it up as we see fit; I was just keen to do something :D
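The comparison itself is small. The Python sketch below assumes versions are plain dotted integers; pre-release strings such as "2.3beta1" are not handled. It zero-pads so that "2.3" and "2.3.0" compare as equal. In the PHP core, the built-in version_compare() would be the natural tool for the same job.

```python
def parse_version(version, width=3):
    # "2.2.5" -> (2, 2, 5); "2.3" -> (2, 3, 0)
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def compatibility_note(min_version, installed_version):
    """Return the warning for the extensions table, or None if compatible."""
    if min_version and parse_version(min_version) > parse_version(installed_version):
        return f"Requires Symphony {min_version}"
    return None


print(compatibility_note("2.3", "2.2.5"))  # Requires Symphony 2.3
print(compatibility_note("2.2", "2.2.5"))  # None
```

As in the description above, the result is only a warning string; nothing stops the installation.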

What next for this issue?

nickdunn commented 12 years ago

XML namespaces make sense here, to disambiguate between elements if they change.

https://gist.github.com/1304004

<extension id="search_index" xmlns="http://symphony-cms.com/extensions/1.0">
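With a default namespace, every child element is namespace-qualified too, so readers have to look elements up by namespace. The Python sketch below shows that, and also treats the trailing part of the namespace URI ("1.0") as the schema version, which would be one way to cover the versioned-schema idea from earlier. Reading the version out of the URI is an assumption for illustration, not something the thread agreed on. Note that the unprefixed id attribute is not in the namespace and is read as-is.

```python
import xml.etree.ElementTree as ET

NS_PREFIX = "http://symphony-cms.com/extensions/"


def read_namespaced(xml_text):
    root = ET.fromstring(xml_text)
    # ElementTree reports qualified names as "{namespace-uri}localname".
    if not root.tag.startswith("{"):
        raise ValueError("metadata is not namespaced")
    uri, _, local = root.tag[1:].partition("}")
    if local != "extension" or not uri.startswith(NS_PREFIX):
        raise ValueError(f"unexpected root element {root.tag}")
    ns = {"ext": uri}
    return {
        "id": root.get("id"),              # plain attributes carry no namespace
        "schema": uri[len(NS_PREFIX):],    # assumed: version is the URI suffix
        "name": root.findtext("ext:name", namespaces=ns),
    }


doc = """<extension id="search_index" xmlns="http://symphony-cms.com/extensions/1.0">
  <name>Search Index</name>
</extension>"""

print(read_namespaced(doc))
# {'id': 'search_index', 'schema': '1.0', 'name': 'Search Index'}
```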

I've got a blog article already written, just needs a bit of tweaking for the current discussion.

Once we're done, I'd like to work through the XSD with @designermonkey because I want to understand it better :-)

designermonkey commented 12 years ago

Happy to finalise the XSD. Schema validation is doable in PHP, but the function only returns true or false. There are methods to get around this, and display which line has failed etc, which I will have to discuss with you @brendo, as I won't know how to implement this validation stuff in the core.

@nickdunn, uh, namespaces :( I haven't tested how they interact with validation, will have to do that too.

brendo commented 12 years ago

For the moment we can just use schemaValidate and simply return a boolean. I like the idea of creating a lint tool that would be more descriptive (think JSONLint or CSSLint).