opencultureconsulting / openrefine-client

The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
GNU General Public License v3.0

Is it possible to import rows to an existing OpenRefine project with this library? #12

Closed by axfelix 4 years ago

axfelix commented 4 years ago

Sorry if this is an obvious question, but I can't quite see it from the examples. This works very well for exporting CSV from an existing project, but I can't see whether the reverse is possible, e.g., adding additional rows to an existing project via CLI. I've reviewed https://github.com/opencultureconsulting/openrefine-batch and I can't quite see it there, either...

Alternatively, if this goes against OpenRefine's data model, it seems like it could be possible to automatically load a CSV into a new project following a template, and then merge it into the existing project?

felixlohmeier commented 4 years ago

Thanks for your question @axfelix, since the answer is really not obvious.

OpenRefine does not support this feature, but there were recent discussions in https://github.com/OpenRefine/OpenRefine/issues/715. There is a proposal for implementation in a comment from 2018 and an open question about the use case in a more recent comment. Maybe you can help and describe your use case there?

With the openrefine-client you can script a workaround:

  1. export existing project as csv
  2. put old and new data into a zip archive
  3. create new project by importing the zip archive

Here is an example that replaces the existing project:

  1. download example data and create project myproject

    [felix@tux Desktop]$ openrefine-client --download "https://git.io/fj5hF" --output original.csv
    Download to file original.csv complete
    [felix@tux Desktop]$ openrefine-client --download "https://git.io/fj5hF" --output new.csv
    Download to file new.csv complete
    [felix@tux Desktop]$ openrefine-client --create original.csv --projectName myproject
    id: 1733286564342
    rows: 10

  2. append rows from new.csv to project myproject

    [felix@tux Desktop]$ openrefine-client --export myproject --output old.csv
    Export to file old.csv complete
    [felix@tux Desktop]$ openrefine-client --delete myproject
    Project 1733286564342 has been successfully deleted
    [felix@tux Desktop]$ zip combined.zip old.csv new.csv
    adding: old.csv (deflated 52%)
    adding: new.csv (deflated 52%)
    [felix@tux Desktop]$ openrefine-client --create combined.zip --format csv --projectName myproject
    id: 2231810029129
    rows: 20
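As a sanity check, the row count reported after the import can be compared with the number of data rows in the two CSV files. The following is only a rough sketch: `count_rows` is a hypothetical helper name, and it assumes one record per line (no line breaks inside quoted cells) and exactly one header line per file.

```shell
# Sketch: sum the data rows (all lines except the first) of the given CSV files.
# Assumption: one record per line and one header line per file.
# count_rows is a hypothetical helper, not part of openrefine-client.
count_rows() {
    # FNR restarts at 1 for every file, so FNR > 1 skips each header line
    awk 'FNR > 1 { n++ } END { print n + 0 }' "$@"
}
```

Calling `count_rows old.csv new.csv` before creating the project gives a number to compare against the `rows:` line printed by `--create`.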

Note that the project id will change. I don't know a way to set the id manually.
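The steps above could be wrapped in a small shell function for repeated use. This is a minimal sketch, not a tested tool: `append_rows` and the `OPENREFINE_CLIENT` variable are hypothetical names, it uses only the flags shown in the example above, and it overwrites `old.csv` and `combined.zip` in the current directory.

```shell
# Sketch: append rows from a CSV file to an existing project by
# exporting, zipping, deleting and re-creating the project.
# append_rows and OPENREFINE_CLIENT are hypothetical names.
append_rows() {
    project="$1"   # project name, e.g. myproject
    new_csv="$2"   # file with the rows to append, e.g. new.csv
    client="${OPENREFINE_CLIENT:-openrefine-client}"

    # 1. export the existing project as CSV
    "$client" --export "$project" --output old.csv || return 1

    # 2. put old and new data into a fresh zip archive
    rm -f combined.zip
    zip combined.zip old.csv "$new_csv" || return 1

    # 3. delete the project only after export and zip succeeded,
    #    then create it again from the archive (the project id will change)
    "$client" --delete "$project" || return 1
    "$client" --create combined.zip --format csv --projectName "$project"
}
```

With the function loaded in the shell, `append_rows myproject new.csv` would run the same four commands as the manual session. Deleting only after the export and the archive have succeeded is a deliberate choice, so that a failed step does not leave you without the original project.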

If you want to distinguish between old and new data, you can add the flag --includeFileSources true:

    [felix@tux Desktop]$ openrefine-client --create combined.zip --format csv --projectName myproject --includeFileSources true
    id: 1615195201038
    rows: 20

[Screenshot, 2020-08-19: project myproject in OpenRefine]

axfelix commented 4 years ago

Thank you for the detailed response! I'll close this issue and comment upstream, but I wonder if this could be added to the readme as well, as it's quite useful.

felixlohmeier commented 4 years ago

Thanks for your suggestion! I have added a chapter to the README: https://github.com/opencultureconsulting/openrefine-client#append-data-to-an-existing-project