inveniosoftware / invenio

Invenio digital library framework
https://invenio.readthedocs.io
MIT License
622 stars 291 forks source link

RFC JSON Schema-based record editor (replacement for BibEdit) #2854

Closed kaplun closed 8 years ago

kaplun commented 9 years ago

Problem

BibEdit served well in Invenio 1.x but is MARC-centric and is not designed for extensibility. Since Invenio is moving towards using jsonalchemy as its record representation (thus opening the ports for infinitely flexible record models), an hardcoded table-based web editor is no longer sufficient to natively edit Invenio records.

Proposal

Since jsonalchemy is about (inveniosoftware/jsonalchemy#9) to support JSON Schemas which is a proposed standard to describe JSON models (including validation), we could automatically generate a form that allows the cataloger to edit a given record according to the schema. This would allow the system to leverage on data models, for more and more consistent records. It happens that this is already implemented thanks to json-editor. E.g. see: http://jeremydorn.com/json-editor/.

Starting from this pre-existing editor one can build around/on top of it additional functionalities such as:

image

lnielsen commented 9 years ago

+1

Ideally, it would great if the editor could work for not only records by also documents, authors, and all the other jsonalchemy namespaces we may have.

kaplun commented 9 years ago

Yes, I guess this will come transparently if the corresponding JSONs are available through the same APIs. On the other hand I would like to foreseen into this nice previewers (for PDF/detailed record view) as this is really what is needed by a cataloguer. Not sure how to implement this in a generic way.

kaplun commented 9 years ago

Guess we might have sorts of default widgets that can be configured to be enabled upon given models.

lnielsen commented 9 years ago

I think you can simply solve it by having editors per module, as long as they can rely mostly on common code.

I agree you need the features such as detail view, pdf preview etc as you mention.

egabancho commented 9 years ago

Looks promising!

PXke commented 9 years ago

There is some really good idea, that we will perhaps at TIND implements for master like the PDF preview, record preview.

But will we not loose all of the specificity of BibEdit ? I mean what it seems to be will be like in django the administration interface that can manipulate everything.

Currently we are adding librarian specific functionnalities to BibEdit, but if the editor become multi-roles I don't see how it will be possible to adds all this functionnalities.

Perhaps did I misunderstand, any hint ?

kaplun commented 9 years ago

What are the specificities of BibEdit you think are going to miss? The proposed editor is going to adapt to the given schema, thus allowing the cataloguer to only enter valid metadata (up to the provided constraints).

Additional niceties such as shortcuts for reference extractions, PDF previewer etc. could be added in the form of simple widgets or block that are conditional on the type of document being opened...

PXke commented 9 years ago

For example we have a check which print a flash message if someone entered an ISBN which is already existing. We have hint which popup depending of the tag of fields, if autocompletion is active for this field etc... shortcut indicators, what does the MARC tags means, we have complex autocompletion on authority, if you select and authority record for autocompletion we can copy some information from the authority record etc ...

And we would like to add more things.

So it is why I would like to be sure it will not be a problem in the future if we want to add features.

kaplun commented 9 years ago

So this editor is JSON based so it will not support MARC tags and indicators. These will have sense only as part as their importing into Invenio and the corresponding exporting.

On the other hand, yes, I plan to have fancy validator being possible. This JSON-Editor is fully pluggable, so one can add local tricks such as the proposed ISBN one. Hints for sure! Autocompletion for sure.

Have a look at the demo at http://jeremydorn.com/json-editor/ .

All the missing parts that we would have to add locally are listed as subtasks of this ticket.

kaplun commented 9 years ago

Note that on the other hand, if you decide to stay fully on MARC as internal format, BibEdit is still there. AFAIK @jmartinm managed to fix the small glitches needed to make it work on Invenio 2

PXke commented 9 years ago

Okay so that is good news.

In another hand, are we agree that from the title of the RFC we are discussing of BibEdit, that will be replaced by JSON Schema-based record editor. (That I start to doubt if @jmartinm kept it alive).

And that BibEdit is manipulating records, that are composed of MARC fields, so it is a bit strange to say that it will not support them.

I looked at the demo and even if I have faith in you I am a bit worried. I don't really see how it will be able to replace BibEdit.

jirikuncar commented 9 years ago

And that BibEdit is manipulating records, that are composed of MARC fields, so it is a bit strange to say that it will not support them.

We are going to provide (almost) 1:1 mapping between MARC and JSON in https://github.com/inveniosoftware/jsonalchemy/pull/9.

I looked at the demo and even if I have faith in you I am a bit worried. I don't really see how it will be able to replace BibEdit.

Once you have a both directional mapping it's pretty straight-forward to support (almost) full MARC21 standard.

PXke commented 9 years ago

And that BibEdit is manipulating records, that are composed of MARC fields, so it is a bit strange to say that it will not support them.

We are going to provide (almost) 1:1 mapping between MARC and JSON in inveniosoftware/jsonalchemy#9.

Okay so it makes sense ! The architecture is interesting.

But from the demo I still don't really see how will be the user-case "modify a record" (let say 100__a [Marc author field]), from the moment the guy click on the edit this record to the action completed.

I mean how will the modification interfaces will be, what will be the different steps etc.

Thanks for you answers !

tiborsimko commented 9 years ago

@PXke Besides an option of relying on 1:1 mapping between MARC and JSON, let me mention another option. If cataloguers of an Invenio instance are fully living in the MARC world, editing only MARC files, and would like to keep it that way, then they wouldn't have to use this new JSON editor at all. (It would be an "internal editor" in this case.) The cataloguers could presumably use (refreshed) BibEdit or some say MS Windows editing tool that speaks MARC-in, MARC-out, and talks to Invenio via REST API, so that they would perform and upload changes that way. Just to mention this other theoretical possibility.

tiborsimko commented 9 years ago

I think you can simply solve it by having editors per module, as long as they can rely mostly on common code.

:+1: for editing each JSON namespace in this way

kaplun commented 9 years ago

But from the demo I still don't really see how will be the user-case "modify a record" (let say 100__a [Marc author field]), from the moment the guy click on the edit this record to the action completed.

So: say the cataloger reaches /record/123/jsonedit, this would open up the editor which would receive the record representation in JSON and the corresponding JSON Schema matching the model of the record (that depends on JSONAlchemy and how it will evolve). Based on JSON Schema an editor is automatically built that will allow the cataloger to manipulate the defined properties according to the defined constraints. If the cataloger wants to add a new author (s)he will have to reach for the corresponding field in the editor and click on a plus button. Of course this can be optimized and we can introduce shortcuts such as automatically jumping fields (according to the provided JSON-Schema) using some search query (e.g. as you would do in your shell with Ctrl+R).

Anyway, after having edited the JSON using the editor (and having everyfield validated client side thanks to JSON-Schema), the cataloger submits it and it's then validated sever side and uploaded.

crepererum commented 9 years ago

I start to using this approach for deposit forms in the analysis preservation project. It won't cover all the mentioned features, but the first results look very very promising. My current plan is to create a special JsonField field for the deposit, which uses the json-editor as a frontend and validates the schema in the editor (for convenience) and on the server side (for security). So if someone has questions about this technology or wants to see code in very unready+unstable state, feel free to ask.

If someone wonders: This will NOT replace normal deposit forms. They are always better than auto-generated ones. But we expect to have many different forms for many different data catalogues.

kaplun commented 9 years ago

Hi @crepererum, will you attack at the same time some of the points listed in the beginning of this RFC? (e.g. autocompletion from KBs)

crepererum commented 9 years ago

@kaplun that's low priority, because I don't need that for the deposition part.

crepererum commented 9 years ago

Some updates for you guys: I started to stress-test the json-editor using a record with >3000 authors.

Let's start with the bad news: Chrome on my desktop machine (proper i7 + 8GB + SSD) needs ~35seconds to generate the form. During this time, the site is unusable. On Firefox the situation is even worse. After I went through the json-editor source code I think that it is totally unusable for this task. The data and update handling is way to inefficient (e.g. O(n^2) when processing arrays).

The good news: I managed to render the form in <3sec using good old plain JavaScript. And there is room for improvement (e.g. use elements p instead of input and generate the input fields only on click/access, because input fields seem to be very heavy food for the browser). There are many features missing (e.g. add/sort array elements, change types, collapsing) but I'm already able to retrieve JSON from the form. And the data handling looks good so far. For validation I would try https://github.com/geraintluff/tv4 + lightweight local validation (so you get errors directly when editing fields).

Btw: I won't use clusterize.js because on my browserS it is slower than a rendered form and is very unsmooth when scrolling. I think up-to-date browers are will work here very well when we are smart enough to provide them the right (KISS-)input.

CC @kaplun @egabancho

kaplun commented 9 years ago

Uh oh! I was fearing that. Regarding clusterize I think there is an other similar javascript trick (@eamonnmag I never remember its name...) Is it really impossible to improve the situation? Because basically it would mean giving up on generating forms from schemas...

kaplun commented 9 years ago

Nevermind, what @eamonnmag was talking me about was precisely clusterize.

kaplun commented 9 years ago

One side solution, is simply that we exclude author fields when they are larger than N, as anyway a JS editor is probably not the right tool to manipulate them.

eamonnmag commented 9 years ago

Indeed. @crepererum, I was looking at the performance levels for input as well recently. One possible change, if you wanted to keep input like features would be to use the editable content feature of HTML5 (but this requires an up to date browser) - this is pretty fast. I wrote a test here with 3000 generated elements http://jsfiddle.net/xL7c8ke7/ - but using a p tag is probably best for support of older browsers.

crepererum commented 9 years ago

@kaplun when the form IS generated, scrolling feels very smooth. So you don't get anything of this hack (which is basically the try to be more efficient in JS than your browser when it comes to viewport calculation, which sounds ridiculous to me). Also I think clusterize only works with an all-eqal-height-row table layout. The generation itself could be improved (as I mentioned).

Getting the form from schemas (=json-editor) instead of the JSON (=status now) should not be a problem but I need some time to implement this. The problem with json-editor is not the schema-handling, but the coding style in general, a suboptimal "just always update+signal+check everything" (works fine for small records, sure) and a not-that-cooperative maintainer (just have a look at all the open and uncommented PRs).

crepererum commented 9 years ago

@eamonnmag I thought about that one as well. I will evaluate both. My current idea is: use p plus click-listener (and some others) to detect when the user wants to write into the specific field. If so, we generate an input field which gets removed when the focus (of the field, not the window) is lost. That assumes that the browser handles the amount of listeners well (which I think is the case, at least better than all the input fields).

eamonnmag commented 9 years ago

@crepererum you only need one listener on the class of the element, then you can assert the id and build the input from that. e.g. http://jsfiddle.net/hs27btp6/

crepererum commented 9 years ago

@eamonnmag This is the reason why I do NOT use jquery, because it hides a very bad fact here:

$(".author").bind("click", function () {
    alert(this.id);
})

equals (under the hood)

var _f = function () {
    alert(this.id);
};
var _all = document.getElementsByClassName('author');
for (var _i = 0; _i < _all.length; ++_i) {
  _all[_i].addEventListener('click', _f.bind(_all[_i]));
}

Event listeners have to be registered on specific DOM elements. You could in fact register one listener on something like form and then use the mouse coords to figure out which element was clicked. As far as I remember their is a nice and performant and browser-driven way to get elements under the mouse cursor.

eamonnmag commented 9 years ago

Ah, sorry, what I meant to do was attach that event to the parent, so http://jsfiddle.net/hs27btp6/1/ - That way you have one listener and perform the logic. screen shot 2015-06-24 at 22 42 22 screen shot 2015-06-24 at 22 42 49

Also what is interesting, if you test in the JSfiddles, is that the addition of CSS really slows things down a lot on rendering, much more than attaching events etc. So adding 3000 items to the DOM is quick, but the time taken for the browser to also consider the CSS and render that is fairly large. I guess in your tests so far, you've only been adding the p elements without CSS styling?

crepererum commented 9 years ago

You could in fact register one listener on something like form and then use the mouse coords to figure out which element was clicked.

==

Ah, sorry, what I meant to do was attach that event to the parent, [...] That way you have one listener and perform the logic.

:wink:

Also what is interesting, if you test in the JSfiddles, is that the addition of CSS really slows things down a lot on rendering, much more than attaching events etc. So adding 3000 items to the DOM is quick, but the time taken for the browser to also consider the CSS and render that is fairly large. I guess in your tests so far, you've only been adding the p elements without CSS styling?

I only add classes to the elements and let the CSS doing its job. I don't know you specific implementation, but:

  1. Never add style-data to the elements directly (only if it is really required, which is very very rare)
  2. Never alter an element after you've attached it to the parent.
  3. Do not build the DOM (here the form stuff) while your target element (e.g. <div id="jsonform">) is attached to the document/main DOM. Instead use a detached target and attach the final assembled result to the main DOM. Otherwise you're just stresstesting your brower's reflow performance. (btw: this is the reason why mathjax performs so badly on bigger pages, esp. compared to KaTeX).
  4. Do use jQuery to parse all that data. I mean, why spending additional resources when you know which browser APIs you want to call in the end? And especially do no concatenate strings just to get them processed by another engine (jQuery) which then parses them and sets the final results to the browser. No offence, but this style is exactly what makes browser apps slow.

I'll do another perf test later. Btw: jsfiddle turns out to be very bad when it comes to this task, because they're including Facebook, Google, Kickstarter?! and other code in their page which heavily interferes with the submitted code.

eamonnmag commented 9 years ago

jsFiddle does load a lot of crap, but I'm performing the same task on the same platform. Just adding one CSS class (in a CSS file) has a big effect on the time taken to paint the element. Bear in mind that each set of input boxes for each author is styled by many classes in the CSS...when you add your p elements, the CSS is likely to be much less complicated. I've never spent much time before checking the performance of all these elements, it's a very informative process!

crepererum commented 9 years ago

Using p instead of input cuts my time down to 1.5secs. And surprise: editable does not influence the time that much :open_mouth: BUT: We need to parse the result (because of all the div and br elements that represent newlines in editable elements) to get the final result back. So that might speed up initial render performance but might kill the time we need to calculate the result JSON. 'element.innerText' might be a solution but is said to be slow :disappointed:

Update: mmh, seems to work. I get the JSON result in less then 300msec. So no problem. :smile:

PS: everything is shown at the moment, so collapsing will speed up this even more. So I think we are good to go.

eamonnmag commented 9 years ago

Cool :)

jirikuncar commented 8 years ago

Follow https://github.com/inveniosoftware/invenio-records-js

kaplun commented 8 years ago

I realize it was only mentioned IRL but never formally on Github, that at INSPIRE we are also working on a Jsonschema editor as we promised in the past. The aim of this editor is to replace BibEdit from Invenio 1, hence cataloguer oriented.

Internal project description is available at: https://docs.google.com/document/d/1y_v4bqACdMCpd8XWwTw8UswX278hSP0Gbj8CVwZhpno/edit?usp=sharing and prototypes: https://github.com/inspirehep/record-editor