inveniosoftware / invenio

Invenio digital library framework
https://invenio.readthedocs.io
MIT License

RFC Invenio "no strings attached" #1806

Closed greut closed 10 years ago

greut commented 10 years ago

tl;dr: invenio should be a library (with optional and external modules)

Currently invenio is rather complicated software to install, into which an overlay is then installed. As long as the overlay is minimalist that's just fine, but you quickly start hitting walls (and duplication). Here is an attempt to make the overlay the first-class citizen and let invenio be a simple library.

I'll describe what the documentation could look like in this new paradigm.

How to install invenio 2.0

All you need is pip install invenio. That's nice, but it doesn’t do much as long as no application configures or uses it. It's time to create your first application using invenio (we also call them overlays).

Creating an application using invenio

Jump into a new directory, ideally within a new virtualenv, but that's up to you, really. Then create a setup.py file, which is the description file for a Python package (the same kind you find on the cheeseshop, aka PyPI).

from setuptools import setup

setup(
    name='My Overlay',
    version='0.1a0',
    url='http://www.example.org',
    author='John Doe',
    author_email='john.doe@example.org',
    description='Sample application for invenio',
    install_requires=[
        'Invenio>=2',
    ],
    entry_points={
        'invenio.config': [
            'myapp = myapp.config'
        ]
    }
)

Then install it:

$ pip install -e .

To use a more specific version of Invenio, it's recommended to specify it via a requirements.txt file, like this. Here, we want to use the pu branch.

-e git://github.com/inveniosoftware/invenio@pu#egg=Invenio

-e .

And then install it:

$ pip install -r requirements.txt

Both are identical as far as the overlay is concerned. The first one will pick invenio from PyPI while the second one will use the pu branch from GitHub. The bleeding edge, as they say.

Configuration

As you’ve seen above, we defined an entry_point for invenio.config. It points to a module that will contain our configuration. So create your application.

/myapp/
|     |- __init__.py
|     `- config.py
|
|- requirements.txt
`- setup.py

Put the required configuration into config.py.

CFG_SITE_LANGS = ["en"]

CFG_SITE_NAME = "My Overlay"
CFG_SITE_NAME_INTL = {
    "en": "My Overlay"
}

PACKAGES = [
    "myapp.base",
    "myapp.modules.*",
    "invenio.modules.*",
]
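
How invenio would consume the invenio.config entry point is not spelled out in this proposal. A minimal sketch, assuming the library simply loads every registered module and keeps its upper-case names. `collect_config` and `FakeEntryPoint` are hypothetical names used for illustration only; a real lookup would go through something like `importlib.metadata.entry_points` for the `invenio.config` group.

```python
import types


def collect_config(entry_points):
    """Merge the upper-case settings of every module the entry points load.

    `entry_points` is any iterable of objects with a ``load()`` method.
    Assumption: Invenio would look them up in the ``invenio.config`` group.
    """
    config = {}
    for entry_point in entry_points:
        module = entry_point.load()
        for name in dir(module):
            if name.isupper():
                config[name] = getattr(module, name)
    return config


class FakeEntryPoint:
    """Stand-in for an installed entry point, so the sketch runs anywhere."""

    def __init__(self, module):
        self._module = module

    def load(self):
        return self._module


# Build an in-memory equivalent of myapp/config.py.
myapp_config = types.ModuleType("myapp.config")
myapp_config.CFG_SITE_NAME = "My Overlay"
myapp_config.CFG_SITE_LANGS = ["en"]

config = collect_config([FakeEntryPoint(myapp_config)])
```

With several overlays or plugged modules registered in the same group, later entries would override earlier ones in this sketch; a real implementation would need an explicit ordering rule.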

Sensitive configuration

Other configuration elements, like the database username and password or the website URL, should not be put here, as this file is not specific to the installation and may be put under a version control system such as Git or Subversion.

The configuration can be handled via the inveniomanage command line interface (or by editing the invenio.cfg file in the instance folder and reloading the application).

# boilerplate
$ inveniomanage config create secret-key
# MySQL configuration
$ inveniomanage config set CFG_DATABASE_NAME mysql-database
$ inveniomanage config set CFG_DATABASE_USER mysql-user
$ inveniomanage config set CFG_DATABASE_PASS mysql-password
# HOST configuration (for redirects, etc.)
$ inveniomanage config set CFG_SITE_URL http://0.0.0.0:4000/
$ inveniomanage config set CFG_SITE_SECURE_URL https://0.0.0.0:4000/
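
The split between the versioned config.py and the instance-specific invenio.cfg can be pictured as two layers, the second overriding the first. The sketch below is an illustration under that assumption, similar in spirit to Flask's `Config.from_pyfile`; `load_instance_config` is a hypothetical helper and the default values are placeholders.

```python
import os
import tempfile


def load_instance_config(defaults, cfg_path):
    """Return `defaults` overridden by the upper-case names in `cfg_path`.

    A missing file is not an error: the defaults are used as they are.
    """
    config = dict(defaults)
    if os.path.exists(cfg_path):
        namespace = {}
        with open(cfg_path) as cfg_file:
            # The instance file is plain Python, like a Flask config file.
            exec(compile(cfg_file.read(), cfg_path, "exec"), namespace)
        config.update(
            (key, value) for key, value in namespace.items() if key.isupper()
        )
    return config


# Placeholder defaults, as they could come from the overlay's config.py.
defaults = {"CFG_SITE_NAME": "My Overlay", "CFG_DATABASE_USER": "invenio"}

with tempfile.TemporaryDirectory() as instance_dir:
    cfg_path = os.path.join(instance_dir, "invenio.cfg")
    with open(cfg_path, "w") as cfg_file:
        cfg_file.write('CFG_DATABASE_USER = "mysql-user"\n')
    config = load_instance_config(defaults, cfg_path)
```

Only the instance file carries secrets here, so the overlay package itself stays safe to commit.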

The base module

The myapp.base module is the module in which you’ll be able to override the invenio layout (templates), add views, etc. Let’s create a minimal one.

/myapp/base/
|     |    |- __init__.py
|     |    |- static
|     |    |- templates
|     |    `- views.py
|     |
|     |- __init__.py
|     `- config.py
|
|- requirements.txt
`- setup.py

static and templates are empty directories for now. Let’s register our views with a new Blueprint.

from flask import Blueprint

# Blueprint for the overlay's own views, templates and static files.
blueprint = Blueprint('myapp', __name__, url_prefix="/",
                      template_folder="templates", static_folder="static")

Et voilà!

JavaScript and CSS

Invenio comes with HTML templates, JavaScript and CSS files that are ready to use. Some of them use open source libraries that aren’t yet installed. For that we recommend using bower, starting with the bower.json file provided by Invenio.

NOTE the bower.json file from invenio is the minimum set of requirements to use the default user interface of Invenio. We recommend that you build on top of it.

First things first, bower is needed. It’s a package management tool that can be downloaded using npm, another package management tool, which can in turn be downloaded using your favourite package management tool such as apt or yum. So meta, such package management.

$ # Recommended way for Ubuntu LTS
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:chris-lea/node.js
$ sudo apt-get update
$ sudo apt-get install nodejs
$ sudo su -c "npm install -g bower"

Invenio uses a little tool called Flask-Collect to grab all the static files used by every module and copy them into one safe place. That is important because we want to put the downloaded assets into such a place, to make things easy.

By convention, we are using /vendors for all the external libraries. By default bower downloads into a folder called bower_components. What we want here is to download into myapp/base/static/vendors instead. This way, when the collection is done, those files will get copied over and be accessible from the /vendors prefix. To do so, create a file called .bowerrc at the top level of the project (next to setup.py):

{
    "directory": "myapp/base/static/vendors"
}

Now, let's download all the assets from bower and collect them into the instance directory.

$ bower install
$ inveniomanage collect

And magically, the vendors directory gets updated.
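
Conceptually, the collect step copies each module's static folder into the instance's single static directory. The sketch below illustrates only that idea and is not Flask-Collect's actual implementation; `collect_static` and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def collect_static(static_dirs, destination):
    """Copy the content of every static directory into `destination`.

    Later directories win when two of them ship the same relative path.
    """
    for static_dir in static_dirs:
        # dirs_exist_ok requires Python 3.8 or later.
        shutil.copytree(static_dir, destination, dirs_exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    # Pretend bower has downloaded one asset into the overlay's vendors folder.
    base_static = os.path.join(root, "myapp", "base", "static")
    vendor_dir = os.path.join(base_static, "vendors", "jquery")
    os.makedirs(vendor_dir)
    with open(os.path.join(vendor_dir, "jquery.js"), "w") as asset:
        asset.write("// placeholder asset\n")

    instance_static = os.path.join(root, "instance", "static")
    collect_static([base_static], instance_static)
    collected = os.path.exists(
        os.path.join(instance_static, "vendors", "jquery", "jquery.js")
    )
```

Because the vendors folder sits inside a static folder, it ends up under the /vendors prefix without any extra configuration.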

TIP: don't put the vendors directory into version control.

The assets will be preprocessed (for LESS or SASS), merged and even minified (clean-css or uglifyjs) by webassets which requires those external tools.

$ sudo su -c "npm install -g clean-css less uglifyjs requirejs"

All set!

Installing the database

$ inveniomanage database --init --user=root --password=<ROOT> --yes-i-know
$ inveniomanage database create

Go for it.

$ inveniomanage runserver

Why is it better?

As soon as you start making something big with your overlay, you end up redoing a Gruntfile and bower.json (see 1. and 2.) or something else if it floats your boat. The CDS overlay (see 3.) is using this alternative way.

  1. https://github.com/inspirehep/inspire-next/tree/new-templates
  2. https://github.com/zenodo/zenodo
  3. https://github.com/greut/cds-demosite/tree/prototype

This way invites you to use Bower, but you can remove it totally if you want to tweak the web assets used by the bundles (currently not easy to do).

As an invenio user, all that matters should be your overlay, and deploying it should be as easy as this (see Deploying with Fabric):

$ fab pack deploy

How modular can Invenio be?

Everything in invenio.modules is (potentially) a module. Most of the modules could (and probably should) live in their own repository and be part of the dependencies (install_requires) of the overlay.

lnielsen commented 10 years ago

Thanks for the write-up. In general I agree, so just a few specific comments:

Sensitive configuration:

The inveniomanage route will work for development. As soon as you are in a load-balanced environment, I suggest creating an extra package with the instance-specific configuration (passwords, hosts, etc.). This allows you to deploy invenio, app and conf by just installing three Python packages.

Deployment:

Is a topic of its own and likely needs its own ticket. I have a solution for Zenodo that we can build on. Problems/experiences that I've encountered so far are:

Also, on a related note, we are looking into running Invenio on an OpenShift cluster. Here, the main blocker is Invenio's use of the local file system, which requires refactoring to scale well.

greut commented 10 years ago

pip install is too slow? static file problems on deploy

Don't you create a new virtualenv and change a symlink once everything is done? (aka the yolo-no-rollback mode)
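
The symlink switch mentioned here can be sketched as follows, assuming a POSIX file system where renaming over an existing link is atomic. `switch_release` and the directory names are hypothetical.

```python
import os
import tempfile


def switch_release(release_dir, current_link):
    """Point `current_link` at `release_dir` without a gap where it is missing."""
    temporary_link = current_link + ".new"
    if os.path.lexists(temporary_link):
        os.remove(temporary_link)
    os.symlink(release_dir, temporary_link)
    # On POSIX, the rename replaces the old link in one step.
    os.replace(temporary_link, current_link)


with tempfile.TemporaryDirectory() as root:
    old_env = os.path.join(root, "env-1")
    new_env = os.path.join(root, "env-2")
    os.mkdir(old_env)
    os.mkdir(new_env)
    current = os.path.join(root, "current")

    switch_release(old_env, current)  # first deployment
    switch_release(new_env, current)  # next deployment, built on the side
    target = os.readlink(current)
    expected = new_env
```

Keeping the previous virtualenv directory around means a rollback is just another call with the old path.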

BTW I don't want to focus on the deployment with this RFC, I'm simply stating it should be as easy as possibly described in some standard WSGI application documentation.

lnielsen commented 10 years ago

Don't you create a new virtualenv and change a symlink once everything is done? (aka the yolo-no-rollback mode)

That would solve host downtime, but build time would be very long, plus it would be prone to PyPI downtime/changes. Anyway, new ticket needed :-)

greut commented 10 years ago

Enable some cache for the downtime; redundant downloads and changes should be handled by your CI server.

pip offers a --download-cache option for installs to prevent redundant downloads of archives from PyPI. https://pip.pypa.io/en/latest/reference/pip_install.html#download-cache

Also YOLO-mode when you're deploying stuff is usually bad, especially on a Friday evening.

greut commented 10 years ago

One step in this path, Invenio-Communities as a standalone module.

And it's working (but the exthook is still hackish)

jirikuncar commented 10 years ago
  • The downside is that the documentation will be fragmented.

We can add git submodules for "external" modules or just link their docs in Sphinx from http://invenio.rtfd.org.

tiborsimko commented 10 years ago

@greut Further clearer separation of Invenio components, going up to using standalone repositories for Invenio modules, is something we discussed more than a year ago when autotools in "next" was definitely abandoned. It would be a very logical step along the Invenio-as-a-framework path.

However, there are things to ponder, such as: are all modules first-class citizens? Are some kept as core? If so, which ones? Would a digital repository software still make sense if there is no, say, upload facility provided by default? Aren't ranking and indexing and searching mutually interdependent? How would further factorisation into, say, fifty standalone GitHub projects enlarge ticket workflow and integration matrix complexities? Last but not least, aren't users using Invenio mostly as an integrated platform anyway? These kinds of thoughts.

We decided to pause the musings at the time in order to concentrate on finishing up Invenio 2, to take technology stack revamp in manageable chunks. Now that we are approaching the stable release of Invenio 2, maybe the time has come to revisit this topic once more. However, we still don't have Invenio 2 stack fully stabilised yet, all our major sites are still running on Invenio 1, etc. So I'd say let's re-discuss but let's keep any major effort still paused, rather concentrating on finishing up the first things first.

Personally, I see further outward "componentification" as a natural process that will organically happen after we have Invenio 2 out and running. For each module, we could start by creating "provides" and "requires" catalogue of versioned APIs that this or that concrete module offers to, or consumes from, these or those other modules. Such an effort would be very useful to do regardless of whether modules happen to live in the same central repository or in numerous separate ones. Once component dependencies are clearly separated in such a way, the disentangling process could advance further organically, up to creating multiple separate repositories if we want; and we'd call the final result Invenio 3.
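
A "provides" and "requires" catalogue of this kind could be as small as one record per module plus a consistency check. The sketch below is one possible reading, not an agreed design: `ModuleSpec`, the API names and the version tuples are all hypothetical, and a requirement counts as met whenever some module offers the API at the wanted version or higher.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleSpec:
    name: str
    provides: dict = field(default_factory=dict)  # API name -> version offered
    requires: dict = field(default_factory=dict)  # API name -> minimum version


def unmet_requirements(modules):
    """List (module, api, wanted, available) for every unsatisfied requirement."""
    offered = {}
    for module in modules:
        for api, version in module.provides.items():
            # Keep the highest version any module offers for this API.
            offered[api] = max(version, offered.get(api, version))
    problems = []
    for module in modules:
        for api, wanted in module.requires.items():
            available = offered.get(api)
            if available is None or available < wanted:
                problems.append((module.name, api, wanted, available))
    return problems


catalogue = [
    ModuleSpec("indexer", provides={"index": (1, 0)}, requires={"records": (1, 0)}),
    ModuleSpec("search", requires={"index": (1, 1)}),
]
problems = unmet_requirements(catalogue)
```

The check works the same way whether the modules live in one central repository or in many separate ones.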

greut commented 10 years ago

Currently, from my point of view, it's not clear what prevents people (Tind, CDS, ...) from using next. What remains to be done before it's good enough for public usage should be accessible somewhere (wiki, whatever... GitHub issues may not be the best way to do that).

I may be wrong, but I have the feeling that the only way you'll quickly (aka, in a reasonable time) achieve a stable invenio core is by descoping. Move some modules away and say explicitly: we are currently not supporting those modules (experimental, deprecated, buggy, ...), feel free to use them but you may have to fix them first.

This way, you can focus on invenio core bugs and features and leave the cool stuff (like trackbacks :wink:) to the people that are using them.

greut commented 10 years ago

Yesterday evening during my way home, this made me think of how I deal with my “stuff”.

I have too much stuff and I'm very bad at getting rid of most of it. So I'm going through a strange process that has been ongoing for years. When I'm cleaning up my room, I put some stuff into bags or boxes. When I've got too many bags or boxes in here, I move them to the attic. Eventually the attic gets full or I move to a new flat. Then I've got two choices: move all that stuff to the new attic, or realize that I haven't used it in the last decade and that I should get rid of it. (When you have boxes of boxes of bags of stuff, it can take weeks to figure out what is what.)

I know, by the time I put something in a bag or a box, that it'll eventually move to the attic and get thrown away as garbage. Out of sight, out of mind.

But I cannot help it: I feel like I own all that stuff, that I may use it in the future (usually not), and that it probably defines who I am. Instead of putting my stuff into the attic, I should put it on the street with a big “help yourself!” label. Mark Pilgrim is right in his “Pursuit of Happiness”, and I know it.

Mark Pilgrim: the pursuit of happiness

Code is not the same as the usual stuff: it doesn't weigh anything or need a room. It's even worse than that. Code is a living thing that will byte you if you don't take care of it.

Each time you say yes to a feature, you're adopting a child. You have to take your baby through a whole chain of events (e.g. design, implementation, testing, etc.). And once that feature's out there, you're stuck with it. Just try to take a released feature away from customers and see how pissed off they get. https://gettingreal.37signals.com/ch05_Start_With_No.php

How to deal with code?

jirikuncar commented 10 years ago

https://github.com/inveniosoftware/invenio/issues/1806#issuecomment-46951703 +1 @jirikuncar

kaplun commented 10 years ago

@greut Don't you buy the same thing twice or thrice because you forgot it was packed in a box and, worse, you can't find that box? :-)

How to best apply what you say given the particular nature of our community, where requests come and go, as do the students who implement them? Sometimes you can't really commit to maintaining something in the short term, but plan to do so in the long term.

tiborsimko commented 10 years ago

@greut Concerning your earlier remark about descoping in order to release Invenio 2 quickly, I'd say you don't have to separate components away in order to put out stable Invenio 2 release. The old modules work via legacy compatibility layer relatively well already. (And, provided the old modules' templates are switched from home-grown CSS to Twitter Bootstrap CSS elements, the old modules could even integrate smoothly to the new look and feel.)

We wrote the compatibility layer precisely to enable Invenio sites to upgrade to "next" rapidly. By rapidly I mean two things: (1) without having to wait until maintainers of modules M1, M3 and M7 (that site S3 desperately needs) to find time to refactor the code properly to upgrade to the "next" technology stack; (2) without forcing S3 to rewrite their own custom service code to use new modules (think WebSubmit to Deposit).

In my eyes, to separate modules away at this time would most probably lead to slower Invenio 2 release date, simply because it takes a lot of time to separate components away and to refactor them properly. Note that many of these old-style components are really core, such as uploader, indexer, you name it, so that we cannot release Invenio 2 without them.

Let me mention an analogy from the Android world. If you use Android, you may have noticed that Google's Email app has always been bundled together with the Android OS release itself. It was not possible to upgrade Email app without upgrading the Android version itself. It is only very recently (IIRC last month) that Google separated away Email app from Android OS and released it on Google Play store as a standalone app (com.google.android.email). One could wonder why it took so long (4+ years) to do so? Was it not obvious from the start that it would be better for the app to live standalone? Well, I'd say this is probably yet another example of real life in action. "So much to do, so little time".

tiborsimko commented 10 years ago

@greut @jirikuncar Concerning your later remark about (a) removing code rather than disabling it and about (b) not accepting features if not willing to maintain them.

WRT the former, all Invenio features F1, F2, F3, etc came to Invenio because some big services S1, S2, S3, etc needed them at some point. The maintainership behind Fi usually changes as manpower behind Si changes as a function of time. Features rarely get removed, but it does happen via the standard on-demand RFC process, i.e. sending an email about the intent of altering F7 and waiting for reactions to see if some service Si still relies on F7. If nobody needs linkbacks anymore, then it can get deprecated and later removed completely, no problem.

WRT the latter, well this is a delicate matter of finding a good balance of when to say "yes" and when to say "no". In theory, only code that is well written, following recommended practices, and accompanied by a plethora of unit and functional tests gets committed. In practice, the theory is rarely followed. Feature Fi is usually developed by students working either in the core team or on service Sj that has its own service-driven deadlines and needs. The MMM estimates the price to pay for writing reusable code at about threefold. Is service Sj willing to pay this price for generalising feature Fi if nobody else uses it yet?

Say a new feature request comes, and the code is not very nicely generalised. There are two extreme reactions possible: (0%) "yes please commit this new cool code even though it seems broken thank you very much"; (100%) "nope it cannot go in like that so just go back and generalise it from the get go". Our goal is to find a well-balanced barrier, depending on common vision and consensus and wishes and possibilities and resources and deadlines between various code contributing teams. Leaving the threshold too low would lead to S7 breaking S1 and S3 installations, which is clearly not acceptable. (And it is our core integrators' job to prevent this from happening via QA inherent in the pull-on-demand collaboration model; a kind of service paid back to Si contributors.) Raising the threshold too high would lead to S3 turning away and fully developing F11 on their own, not bothering to share with others, and perhaps S8 would end up doing something similar, each in their own corner. It makes sense to "nurture" commonly needed features via a common code base rather than saying "no" from the get go.

We can commonly decide on where the barrier should be, and progressively raising it higher and higher; but the theory and practice will always differ, as we are composed of loosely coupled, diverse, heterogeneous distributed teams.

greut commented 10 years ago

If we take the trackback example (#1707): it's disabled by default (on the backend, but strangely enabled on the frontend), it has bugs (the empty response is one of them; Zenodo bursts into flames when you try to send one), and it has no documentation. It looks like a box from my attic gathering dust.

[...] well this is a delicate matter of finding good balance of when to say "yes" and when to say "no". In theory, only code that is well written, following recommended practices, and accompanied by a plethora of unit and functional tests gets committed. In practice, the theory is rarely followed.

It's because you have no choice, no middle ground in which to put experimental modules, so they end up polluting your dependencies, test suites, ... Invenio is a big block; you're either in or out. As a customer, the only configuration choice you have is to not include the modules you don't want.

Pardon my pessimism, but I'll start to believe that Invenio 2.0 is around the corner once a working admin interface is present (or maybe /admin is not the right place to look).

Regarding Android, Google is moving all the open source apps (Android's) to a closed source model. Nothing good there. The AOSP application com.android.email is still part of Android itself but receives close to no updates.

...the company's main method here is to bring more and more apps under the closed source "Google" umbrella. http://arstechnica.com/gadgets/2013/10/googles-iron-grip-on-android-controlling-open-source-by-any-means-necessary/

tiborsimko commented 10 years ago

WRT admin interfaces and Invenio/next, yes /admin/bibindex/bibindexadmin.py appears to work but clicking on something breaks things... Which is even more a reason for working on improving compatibility layer right now (Invenio 2), rather than starting to muse on factoring modules away (Invenio 3). To use go's terminology, urgent points before big points.

greut commented 10 years ago

WRT admin interfaces and Invenio/next, yes /admin/bibindex/bibindexadmin.py appears to work but clicking on something breaks things...

At first, I was like: not bad, and then I scrolled down. EDIT the cds demosite wasn't doing the proper pagefooteradd | safe

Factoring modules away is only about not having to do any work on them as of now. That's a pragmatic approach asking the question, what is the business value/cost of: fixing a module for 2.0 vs factoring it away.

The least costly approach is to disable a module and put a warning in it. That way, though, you're probably lowering the chances that someone will come along and fix it for you.
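
Putting a warning on an unsupported module could look like the sketch below, which uses the standard `warnings` machinery. `UnmaintainedModuleWarning` and `enable_module` are hypothetical names, and the idea of a registry of unmaintained modules is an assumption made for the example.

```python
import warnings


class UnmaintainedModuleWarning(UserWarning):
    """Issued when a module that nobody currently maintains gets enabled."""


def enable_module(name, unmaintained=()):
    """Return True after enabling `name`, warning first if it is unmaintained."""
    if name in unmaintained:
        warnings.warn(
            "%s is currently not supported; you may have to fix it first" % name,
            UnmaintainedModuleWarning,
            stacklevel=2,
        )
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    enable_module("linkbacks", unmaintained={"linkbacks"})
    enable_module("indexer", unmaintained={"linkbacks"})
```

Only the first call records a warning; core modules stay silent.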

tiborsimko commented 10 years ago

I think I gave my answer to the pragmatic question above already. You cannot disable the indexer, since Invenio would not make sense without it. You can rewrite its admin interfaces in Flask and Jinja and friends, but that would take time. You can think of factoring out the module away together with search and amend all its clients, but that would take even more time. Or you can simply improve the legacy compatibility layer and the bibindex admin UI pages would work again the way they work in Invenio 1. Which is OK because the module itself is largely the same still. This would permit to release Invenio 2 the quickest.

greut commented 10 years ago

bibindex does fall into the category of the modules that must be part of invenio core features, at any costs. There is nothing to gain.

kaplun commented 10 years ago

@tiborsimko I guess @greut is talking about leaving out all the non-core and not-yet-ported modules of Invenio. I don't know exactly what their status is, but we can take e.g. linkback, elmsubmit, etc.; surely not submit, bibindex, bibupload...

@greut, @tiborsimko is trying to tell you that by fixing the legacy layer, there is no longer a need to move away or rather port core and non-core modules for Invenio 2.0

greut commented 10 years ago

@kaplun thanks, we got a bit lost here.

I'm talking about, and am only interested in, enabling a layer between invenio core and the many overlays to come. Invenio core should be a PyPI library and let the overlay do all the complex stuff.

E.g. Zenodo could release its github module as a standalone one. The key goal (behind all that very interesting noise) is enabling module plugging, not removing everything from the entire invenio.modules directory. In that scenario, with that feature, you could consider removing some modules from the core set of modules of Invenio.

Then the possibilities are endless for Invenio customers: CDS, Inspire, Rero, EPFL, Tind, ...

In practice

Let's look at the CDS overlay dependency tree. They decided to use communities, annotations, cloud and the elastic search.

It will be different from INSPIRE's, which needs neither communities nor annotations, but does need workflows and linkbacks.
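
In configuration terms, the difference between two overlays would come down to the PACKAGES list each one ships. A minimal sketch, assuming plugged modules are importable packages listed between the overlay and the core; `overlay_packages` and the package names are hypothetical, and only the choice of modules comes from this thread.

```python
CORE_PACKAGES = ["invenio.modules.*"]


def overlay_packages(overlay, external_modules):
    """Build a PACKAGES list: overlay first, then plugged modules, then core."""
    packages = ["%s.base" % overlay, "%s.modules.*" % overlay]
    packages.extend(external_modules)
    packages.extend(CORE_PACKAGES)
    return packages


# Hypothetical package names for the standalone modules.
cds = overlay_packages("cds", ["invenio_communities", "invenio_annotations"])
inspire = overlay_packages("inspire", ["invenio_workflows", "invenio_linkbacks"])
```

Each overlay would then declare the same standalone modules in its install_requires.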

Conclusion

At the end of the day, you, @tiborsimko, will decide if, when and how you want to enable a way to plug modules in Invenio or not. If you do so, you'll have to show the way.

Thanks for the discussion, I'll not add anything.

jirikuncar commented 10 years ago

Thank you @greut, @tiborsimko, @kaplun and @lnielsen-cern for the valuable input.

I would like to summarize my thoughts from this discussion. Once #1863 is ready we can release 2.0rc0 and test the core features and legacy layer. The next step would be to move currently disabled modules to separate packages, as they are already clearly separated from the rest of Invenio (2.X). Later we can muse module by module or extension by extension and decide if it should live inside the core invenio package or follow the Flask way of approved modules/extensions.

Idea: Overlay generator

tiborsimko commented 10 years ago

@kaplun Your summary of what I was trying to say could be actually read in a misleading way, so let me home in on that. If one wants a concise summary of my thoughts on this topic, then I mentioned it already in my very first reply in this thread:

Personally, I see further outward "componentification" as a natural process that will organically happen after we have Invenio 2 out and running. For each module, we could start by creating "provides" and "requires" catalogue of versioned APIs that this or that concrete module offers to, or consumes from, these or those other modules. Such an effort would be very useful to do regardless of whether modules happen to live in the same central repository or in numerous separate ones. Once component dependencies are clearly separated in such a way, the disentangling process could advance further organically, up to creating multiple separate repositories if we want; and we'd call the final result Invenio 3.

IOW, I largely agree with @greut on the general direction where we are going, even though we may disagree on scale, scope, or timeline. The factorisation is a time consuming process, we have only limited resources, and we should carefully balance our priorities in these Invenio 2 pre-release times. My stance was not to delay release-oriented matters any further by working on factoring away invenio-linkbacks and friends. This can happen, and as I mentioned above I think it will happen very naturally and very organically, as we move on. Starting possibly in autumn/winter already.

@greut I was happy to reply further on the various theoretical "don't accept feature X" and "what is business value of Y" and "/admin interfaces are not yet ready" side topics that you brought up, which might perhaps have obfuscated the picture. If this is the case, just reread my very first reply in this thread.

@jirikuncar Yes, such a timeline expresses well my feelings about urgent points before big points. Once we have record API and document API and JSON store and asset generation and Jinja-template-blocks and other should-not-be-changing-too-much-anymore things nailed down, we can move on with the Invenio 2.0.0 release that is long overdue. We can always advance with further "componentification" along the way as we go down the 2.x line towards 3.0, taking the work in manageable chunks.

@jirikuncar +1 for the overlay generator helper. As discussed on Monday inveniomanage could be enriched with various wizards aiding people managing their sites, perhaps up to pre-creating overlay file schema. E.g. for multi-server set up and sharding we thought of creating helpers like inveniocfg --clone-node in the Invenio 1 past.

jirikuncar commented 10 years ago