inveniosoftware / invenio-records

Invenio-Records is a metadata storage module.
https://invenio-records.readthedocs.io
MIT License

cli: removal of `input_type` argument for `create` command #50

Closed: jirikuncar closed this issue 9 years ago

jirikuncar commented 9 years ago

Proposal

Use UNIX philosophy and combine utilities as follows:

$ wget "http://demo.invenio-software.org/search?of=xm" -O - | \
dojson -t dojson.contrib.marc21:marc21 | inveniomanage records create

cc @lnielsen @nharraud

lnielsen commented 9 years ago

:+1: I guess both work:

inveniomanage records create and inveniomanage records create <file>

(i.e. if no file is specified, read from stdin)

egabancho commented 9 years ago

Does this mean that the convert_marcxml function would go somewhere else? DoJSON? An overlay?

jirikuncar commented 9 years ago

@lnielsen yes, but no conversion should be allowed in the records module.

inveniomanage records create [-f/--file [STDIN]]
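
A minimal sketch of the "file or STDIN" behaviour proposed above, using only the standard library. It is not the actual inveniomanage implementation: the names build_parser, parse_records and main are hypothetical, and the use of "-" to mean standard input is an assumed convention.

```python
import argparse
import json
import sys


def build_parser():
    """Build a parser with a single optional -f/--file argument."""
    parser = argparse.ArgumentParser(prog="records-create-sketch")
    parser.add_argument(
        "-f", "--file",
        default="-",  # assumption: "-" stands for standard input
        help="path to a JSON file; standard input is read when omitted",
    )
    return parser


def parse_records(text):
    """Return a list of record dicts from a JSON object or a JSON array."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def main(argv=None):
    """Read JSON from the chosen source and return the parsed records."""
    args = build_parser().parse_args(argv)
    if args.file == "-":
        text = sys.stdin.read()
    else:
        with open(args.file, encoding="utf-8") as handle:
            text = handle.read()
    # Only JSON is accepted here; any format conversion happens upstream.
    return parse_records(text)
```

Calling main([]) reads from standard input, so the command can sit at the end of a pipeline, while main(["-f", "records.json"]) reads the named file instead.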

jirikuncar commented 9 years ago

@egabancho I would add it to DoJSON package.

dojson/setup.py

'entry_points': {
    'dojson.converter': [
        'marc21xml = dojson.contrib.marc21:marc21',
        'marc21_authorityxml = dojson.contrib.marc21:marc21_authority',
    ]
}
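
The entry point values above use the "module:attribute" form. As an illustration only, here is a sketch of how such a string can be resolved to a Python object. The resolve helper is hypothetical and is not DoJSON's own loading code; the usage note relies on the standard library's json module as a stand-in so that it runs without DoJSON installed.

```python
import importlib


def resolve(spec):
    """Import the object named by a 'package.module:attribute' string."""
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Walk dotted attribute paths such as 'Class.method'.
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj
```

For example, resolve("json:dumps") returns the json.dumps function; with DoJSON installed, the same call on 'dojson.contrib.marc21:marc21' would be expected to return the object that the entry point names.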

nharraud commented 9 years ago

I took a look at the command's code. If I understand correctly, the change mainly forces input_type to json. I'm OK with that.

Regarding the command example, I'm not sure I understand how it works. What does wget "http://demo.invenio-software.org/search?of=xm" -O - output? Just the metadata, or also other information such as the database ID and the schema reference? If it includes the extra information, it seems that dojson will have to search for the metadata inside a JSON object before transforming it. If not, how do you keep the same ID when exporting and reimporting? It would be great if that were possible, so that the links remain valid.

nharraud commented 9 years ago

Sorry, I didn't think of trying it myself.

nharraud commented 9 years ago

For MARC21 it works because there is a field for the "control number". I'm wondering what will happen in future versions, once records are no longer bound to MARC21. How do we interpret them? Do we configure somewhere which JSON path contains the ID? Or will there always be a way to convert them to MARC21?
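
One way to picture the "configured JSON path" option is the sketch below. Everything in it is hypothetical: the RECORD_ID_PATH setting and the get_by_path helper are not part of Invenio, and control_number is only an assumed key name for the MARC21 control number field.

```python
# Hypothetical setting: the dotted path where the record ID is stored.
RECORD_ID_PATH = "control_number"


def get_by_path(record, path, default=None):
    """Follow a dotted path such as 'a.b.c' through nested dicts."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

With this helper, get_by_path(record, RECORD_ID_PATH) returns the ID when the path exists and None otherwise, so a non-MARC21 schema would only need a different path value.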

nharraud commented 9 years ago

The solution I see is to separate Invenio's metadata from the record's metadata and use a format similar to the Elasticsearch bulk API (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html). There would always be two JSON objects, one after the other: the first contains Invenio's metadata and the second contains the record's metadata.
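
A rough sketch of reading such a stream, assuming one JSON object per line as in the Elasticsearch bulk format. The iter_pairs function and the keys in the usage note are illustrative assumptions, not an existing Invenio format.

```python
import json


def iter_pairs(lines):
    """Yield (invenio_meta, record_meta) tuples from alternating JSON lines."""
    pending = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between pairs
        obj = json.loads(line)
        if pending is None:
            pending = obj  # first line of a pair: Invenio's metadata
        else:
            yield pending, obj  # second line: the record's metadata
            pending = None
    if pending is not None:
        raise ValueError("stream ended after a metadata line with no record")
```

Given the two lines {"id": 1} and {"title": "A"}, the generator yields a single pair, so the importer can restore the ID without looking inside the record itself.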

lnielsen commented 9 years ago

Fixed AFAIK?