inveniosoftware-attic / invenio-circulation-legacy

https://invenio-circulation.readthedocs.io
GNU General Public License v2.0

RFC: data structure #4

Open mvesper opened 9 years ago

mvesper commented 9 years ago

The circulation module is going to be rewritten from scratch in Invenio 2, making it more flexible and covering more library use cases. The purpose of this RFC is to discuss the current development stage of the data structure.

The functionality currently required seems to be covered by the following entities:

    +-----------------+                                                   
+-->|Record           |                                                   
|   +-----------------+                                                   
|   |Won't change this|                                                   
|   +-----------------+                                                   
|                                                                         
|                                                                         
|                                                                         
|   +--------------+          +--------------+            +--------------+
|   |Item          |   +----->|User/Librarian|<-+    +--->|Library       |
|   +--------------+   |      +--------------+  |    |    +--------------+
+---+record        |   |      |id            |  |    |    |id            |
    |id            |   |      |name          |  |    |    |name          |
    |barcode       |   |      |rights        |  |    |    |location      |
    |location      |   |      |loan_condition|  |    |    |loan_condition|
    |loan_condition|   |      +--------------+  |    |    +--------------+
    |status        |   |                        |    |                    
    |description   |   |                        |    |                    
    |notes         |   |                        |    |                    
    +--------------+   |                        |    |                    
               ^ ^     |                        |    |                    
               | |     |                        |    |                    
               | |     |                        |    |                    
    +-------+  | |     |      +-------+         |    |                    
    |Request|  | |     |      |Event  |         |    |                    
    +-------+  | |     |      +-------+         |    |                    
    |item   +--+ +------------+item   |         |    |                    
    |user   +----------+      |user   +---------+    |                    
    |status |<----------------+request|              |                    
    +-------+                 |library+--------------+                    
                              |date   |                                  
                              |action |                                  
                              +-------+                                  

The class members are dummies and just there to provide a general idea about what kind of information each entity should carry. The precise database tables definitions will evolve over time.
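As a rough reading aid, the diagram above can be sketched as plain Python dataclasses. This is only an illustration of the relationships, not a proposed schema: the field types, the optional `request` on `Event`, and all sample values are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Item:
    id: int
    record: int  # id of the bibliographic record, which stays unchanged
    barcode: str
    location: str
    loan_condition: str
    status: str
    description: str = ""
    notes: str = ""


@dataclass
class User:  # "User/Librarian" in the diagram
    id: int
    name: str
    rights: str
    loan_condition: str


@dataclass
class Library:
    id: int
    name: str
    location: str
    loan_condition: str


@dataclass
class Request:
    item: Item
    user: User
    status: str


@dataclass
class Event:
    item: Item
    user: User
    library: Library
    date: datetime
    action: str
    request: Optional[Request] = None  # assumption: not every event has a request


# Illustrative values only.
library = Library(1, "Main Library", "Ground floor", "standard")
user = User(1, "Jane Doe", "patron", "standard")
item = Item(1, record=42, barcode="B-0001", location="Shelf A",
            loan_condition="standard", status="on_shelf")
request = Request(item=item, user=user, status="pending")
event = Event(item=item, user=user, library=library,
              date=datetime(2015, 1, 1), action="request_created",
              request=request)
```

Only `Event` carries a date and an action, so in this sketch it is the one entity that records history.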

Reasoning:

The general approach did not change, but now there is an idea how to store the data. After a discussion with @jalavik there is the following idea:

Instead of defining a database schema or class definition that tries to handle all different requirements (for example different locations, item details, barcodes) and naming conventions, the models basically carry two attributes: id and data (naming is open to discussion). The data attribute contains a string version of the object's attributes (something like the __dict__ attribute). In order to make the individual values searchable, Elasticsearch will be used for indexing.
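A minimal sketch of the id/data idea, under some loud assumptions: SQLite (from the Python standard library) stands in for MySQL, JSON is used as the string format, and the `Item` class, its attributes, and the `save`/`load` helpers are hypothetical names made up for the example. The Elasticsearch indexing step is not shown.

```python
import json
import sqlite3


class Item:
    """Hypothetical domain object; the attribute names are illustrative."""

    def __init__(self, barcode, location, status):
        self.barcode = barcode
        self.location = location
        self.status = status


def save(conn, obj):
    """Store all attributes of obj as one JSON string and return the new id."""
    cur = conn.execute(
        "INSERT INTO item (data) VALUES (?)",
        (json.dumps(obj.__dict__),),
    )
    return cur.lastrowid


def load(conn, item_id):
    """Return the stored attributes of one item as a dict."""
    row = conn.execute(
        "SELECT data FROM item WHERE id = ?", (item_id,)
    ).fetchone()
    return json.loads(row[0])


conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")

item_id = save(conn, Item("B-0001", "Shelf A", "on_shelf"))
restored = load(conn, item_id)
```

With this layout the database can only look rows up by id. Finding, say, all items with a given status means parsing every data string, which is why the proposal pairs it with a separate search index.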

This basically means that both MySQL and Elasticsearch are required for this approach to work. However, since the upcoming modular design of Invenio would allow one to simply not install the circulation module when it is not needed, this should be ok.

Criticism and ideas are highly encouraged :)

tbasaglia commented 9 years ago

Sorry, I know that the class members are dummies, however:

"no other entity needs to keep track of their history": I think so. However, what about records of paper books to which we add the link to the ebook? In a sense we add an item, so this should also be recorded as a status change in the history, even if it is basically a modification of a MARC field. So, if we decide that MARC should generate item information and not vice versa (decision to be taken!), modifications of the 876 and 852 fields should also trigger an event and a status change (='item added').

tiborsimko commented 9 years ago

"Instead of defining a database schema or class definition that tries to handle all different requirements (for example different locations, item details, barcodes) and naming conventions, the models basically carry two attributes: id and data (naming is open to discussion)."

Architecturally, I'd say there are two extreme approaches: (1) introduce a separate table column for each new property; (2) introduce only one "data" column and store every attribute there, e.g. as serialised JSON. The former seems close to the original proposal, the latter seems close to the updated proposal.

As often happens in life, what about finding a middle road? By studying all the various use cases for circulation, we could extract common attributes (e.g. status) and create columns for them, all the while maintaining additional attributes (e.g. front page colour, or whatever an installation may want to store) in a free blob. Advantage: one could easily profit from SQL relational constraints (and SQL queries) for non-ID attributes, too. (Example: an item's status column value would always be valid, as ensured by a foreign key constraint.)
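The middle road could look roughly like the sketch below. It is an illustration under assumptions, not a schema proposal: SQLite from the Python standard library is used so the example runs anywhere, and the table names (`item`, `item_status`), the `extra` column, the status values, and the `add_item` helper are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Common attribute: status gets its own column, backed by a lookup table.
conn.execute("CREATE TABLE item_status (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO item_status (name) VALUES (?)",
    [("on_shelf",), ("on_loan",)],
)

# Everything installation-specific goes into the free-form 'extra' blob.
conn.execute(
    """
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL REFERENCES item_status(name),
        extra TEXT NOT NULL DEFAULT '{}'
    )
    """
)


def add_item(status, **extra):
    """Insert an item; the extra keyword arguments are stored as JSON."""
    cur = conn.execute(
        "INSERT INTO item (status, extra) VALUES (?, ?)",
        (status, json.dumps(extra)),
    )
    return cur.lastrowid


item_id = add_item("on_shelf", front_page_colour="blue")

# An unknown status is rejected by the foreign key constraint.
try:
    add_item("lost_in_space")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Plain SQL works for the extracted attribute.
on_shelf = conn.execute(
    "SELECT id FROM item WHERE status = ?", ("on_shelf",)
).fetchall()
```

The status column can be constrained and queried with ordinary SQL, while the blob still accepts whatever a given installation wants to store.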

"This basically means that MySQL and Elasticsearch are required in order to make this approach work"

Not necessarily; PostgreSQL allows combining relational and JSON data very efficiently. In this way we could profit from both worlds, having (structured) RDBMS data for some attributes, and additional (free) JSON data for other attributes, all in one system.