k-int / gokb-phase1

Original GOKb repo - Moving to https://github.com/openlibraryenvironment/gokb
http://www.gokb.org

Ingest: Reuse of captured rules #448

Closed kristenwilson closed 8 years ago

kristenwilson commented 9 years ago

Needs to cover both user-selected reuse in Refine and auto-ingest reuse (perhaps via TSV).

Will likely overlap with #333. Spec is probably required.

sosguthorpe commented 8 years ago

@kristenwilson / @ostephens Can either or both of you clarify what is expected here as part of 6.0? Issue #333 suggests that the automation and suggestion would be part of 6.x, and I'm looking for some clarification of what this issue means with respect to this specific release.

ostephens commented 8 years ago

@sosguthorpe @kristenwilson @ianibbo Apologies for the lengthy reply:

In the original proposal around macros and reuse I made 12 proposals. Out of these I think the following have been implemented:

We will not implement P5-P8 as these were an alternative approach to what we implemented under P3 and P4.

This leaves:

P12 was always an optional extra, so I think we can forget this for now (and if it is desired raise it as a separate work stream).

This leaves P2, P9 and P10. In the original specification I had the idea that the way this would work would be:

Having discussed this with @sosguthorpe, and given some of the comments from @ianibbo regarding the way automating uploads would work, I think now....

Which means the work for 6.0 looks something like:

  1. Develop/implement mechanism to automatically apply OpenRefine transformations where this can be deduced based on existing rules/well known scenarios
    • Automated application of transformations should happen when a file is checked in for the first time in OpenRefine, and could also be applied in the automated processing
  2. Develop/implement a way of applying the automatic OpenRefine transformations, and any manually specified Macros, to a file that is going through the automated ingest process
  3. Develop/implement mechanism to handle situations where the automated ingest process does not manage to ingest the file and further work needs to be done before the file can be successfully ingested
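To make the three steps above a bit more concrete, here is a rough sketch of how they might fit together. It is only an illustration under my own assumptions, not how GOKb or OpenRefine actually do it: `Rule`, `IngestResult`, `ingest`, the column names and the "required columns" check are all hypothetical, and rows are modelled as plain dictionaries rather than an OpenRefine project.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class Rule:
    """A captured transformation plus a test for when it can be reused (hypothetical)."""
    name: str
    applies_to: Callable[[List[str]], bool]  # decided from the file's column headers
    transform: Callable[[Row], Row]


@dataclass
class IngestResult:
    rows: List[Row]
    applied: List[str] = field(default_factory=list)
    problems: List[str] = field(default_factory=list)
    needs_manual_work: bool = False


def ingest(rows: List[Row], rules: List[Rule], macros: List[Rule],
           required_columns: List[str]) -> IngestResult:
    headers = list(rows[0].keys()) if rows else []

    # Step 1: deduce which existing rules fit this file.
    selected = [rule for rule in rules if rule.applies_to(headers)]

    # Step 2: apply the automatic rules, then any manually specified macros.
    result = IngestResult(rows=[dict(row) for row in rows])
    for rule in selected + list(macros):
        result.rows = [rule.transform(row) for row in result.rows]
        result.applied.append(rule.name)

    # Step 3: if the file is still not ingestable, flag it for further work
    # instead of loading it. The check here (required columns must be
    # non-empty) is an assumed stand-in for real validation.
    for index, row in enumerate(result.rows):
        missing = [col for col in required_columns if not row.get(col)]
        if missing:
            result.problems.append(f"row {index}: missing {', '.join(missing)}")
    result.needs_manual_work = bool(result.problems)
    return result


def rename_title(row: Row) -> Row:
    # Illustrative rule: rename a "Title" column to "publication_title".
    row = dict(row)
    row["publication_title"] = row.pop("Title")
    return row


rename = Rule("rename Title", lambda headers: "Title" in headers, rename_title)
trim = Rule("trim whitespace", lambda headers: True,
            lambda row: {key: value.strip() for key, value in row.items()})

rows = [
    {"Title": " Nature ", "print_identifier": "0028-0836"},
    {"Title": "Science", "print_identifier": ""},
]
result = ingest(rows, rules=[rename], macros=[trim],
                required_columns=["publication_title", "print_identifier"])
print(result.applied, result.needs_manual_work, result.problems)
```

In this sketch the second row has no identifier, so the file would be flagged for manual work rather than ingested. The same `ingest` function could in principle be called both at first check-in and from the automated process, which is the reuse point (1) is after.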

However - it may be that I'm off base here depending on what @sosguthorpe and @ianibbo are hoping to do regarding the automated ingest process. But that's my best shot :)

sosguthorpe commented 8 years ago

Thanks for this Owen. This is definitely not a small undertaking and I think this should be moved to a future release under the 6 major version number. I believe we should focus on the other issues under the 6.0 milestone to get as many of those in as we can for the next release.

kristenwilson commented 8 years ago

Thanks, Owen and Steve. I think Owen's proposal sounds good in general. I will put this on our agenda for Friday to see if there's any additional discussion needed.

Steve, we had agreed to break the 6.X work down into two releases -- 6.0 and 6.1. I took an initial stab at sorting issues into the 6.0 release, but Ian had said that you two would look at it and move things around based on time, dependencies, etc. So feel free to keep doing that if it makes sense to you. We will also check in on this on Friday.

sosguthorpe commented 8 years ago

OK. No problem.


kristenwilson commented 8 years ago

Closing this and adding a new issue for the rules work for the bulk data loader.