rufuspollock-okfn / reconcile-csv

A simple OpenRefine reconciliation service that runs on top of a CSV file
BSD 2-Clause "Simplified" License
117 stars, 28 forks

Can't run reconcile-csv.jar file #9

Open: santanu4 opened this issue 10 years ago

santanu4 commented 10 years ago

I am a newbie in data reconciliation with OpenRefine and am still learning. I am using Windows XP SP3 and have the Java Runtime Environment installed on my machine. My question is: where do I run the command java -jar reconcile-csv-1.0.1-SNAPSHOT-standalone.jar <column with id's>?

I have tried running it in the Windows command prompt, using different permutations and combinations to see if any of them work. I have also tried double-clicking the reconcile-csv.jar file. It threw a Java Virtual Machine error and didn't proceed. How do I get it to run?

mihi-tr commented 10 years ago

Sorry for the late reply; I've been on an extended offline vacation. The command prompt is the right place to run it. Don't forget to pass the right parameters to reconcile-csv. What was the result of your command-prompt experiment?

meriteadrupal commented 9 years ago

Hello!

My apologies in advance for my newbie-ness. The reconcile-csv instructions breeze over things that are likely obvious to seasoned data wranglers, but they aren't (yet) obvious to me. Like the member above, I am also having trouble starting the reconcile-csv server from the .jar file.

A few questions:

I am on a MacBook and entering commands in Terminal. Do I need to navigate to the directory containing the .jar file before running the command that launches the server?

Where should I put the .jar file so that it works? Should I put it in a "Reconcile-CSV" folder alongside the other reconciliation services in the Extensions folder? (I'm referring to the directory you see when you click the "Browse workspace directory" link at the bottom of the OpenRefine home page while the app is running.)

[Screenshot: where to put reconcile-csv]

Where should I put the .csv file to which it refers? Does it belong in the same directory as the .jar file?

Do I need to navigate to that folder in Terminal to execute the command to start?

When I enter the command in Terminal to start the server (java -jar reconcile-csv-0.1.1.jar ), how should the CSV file name be entered? I have a file called InventorsList.csv. Does it need the ".csv" suffix? Do the values need quotes around them? I've tried with and without quotes, and with and without .csv, and I still get the following error:

[Screenshot: error message]

Last, a big-picture question so that I can wrap my brain around this: how exactly does this work?

As an example: I have a spreadsheet of new patent inventor records (Spreadsheet A) that contains a column with each inventor's full name. I have an export of contacts from my database (Spreadsheet B). I want to find out which inventors we already have in our database as Contacts. I then need to append Contact IDs, etc., to those records to link the inventor records to the contact records in my database (two different objects). Our matching key is the person's full name, the closest thing we could get to a unique identifier.

My plan is to bring Spreadsheet A (the inventors) into OpenRefine and reconcile its Name column against Spreadsheet B. My understanding is that Spreadsheet B would be the .csv file that I list in the .jar command. In this scenario, the "Search Column" would be the "Name" column in Spreadsheet B and the "ID column" would be the "Name" column in the project created from Spreadsheet A. Am I understanding this correctly? In that example, would I enter the following to start the server:

java -jar reconcile-csv-0.1.1.jar

I welcome your insights and corrections. Again, my apologies if any of this seems obvious. I'm relatively technical, but I'm still coming up to speed on command-line tasks.

Many thanks! Krista

mihi-tr commented 9 years ago

@meriteadrupal,

Thanks for the long description of what you're trying to do.

To run it, Java needs to know exactly where the .jar file you are trying to run is. So you can either run

java -jar reconcile-csv-0.1.1.jar if the jar is in the current folder, or

java -jar /path/to/reconcile-csv.jar if the jar is in a different folder. The same applies to the CSV file.

Don't type the angle brackets; run, for example:

java -jar reconcile-csv.jar spreadsheetB.csv "name" "Contact ID"
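For anyone who would rather script the launch than type it, here is a minimal Python sketch. It assumes `java` is on the PATH and builds the same invocation as an argument list rather than as a shell string. Because a list is passed to the program directly, with no shell in between, a column name with a space such as "Contact ID" stays a single argument. The helper names `build_command` and `start_service` are hypothetical and not part of reconcile-csv. The optional `heap` setting mirrors the `-Xmx1g` flag that appears at the end of this thread.

```python
import subprocess
from typing import List, Optional


def build_command(jar: str, csv_file: str, search_column: str,
                  id_column: str, heap: Optional[str] = None) -> List[str]:
    """Return the argument list that starts reconcile-csv (hypothetical helper).

    Order follows the example above: CSV file, search column, ID column.
    """
    cmd = ["java"]
    if heap:
        # heap="1g" becomes -Xmx1g, the JVM's maximum-heap flag
        cmd.append(f"-Xmx{heap}")
    cmd += ["-jar", jar, csv_file, search_column, id_column]
    return cmd


def start_service(cmd: List[str]) -> subprocess.Popen:
    # A list is handed to the program without a shell, so "Contact ID"
    # arrives as one argument. The service keeps running until stopped,
    # so we start it without waiting for it to exit.
    return subprocess.Popen(cmd)
```

Usage: `proc = start_service(build_command("reconcile-csv.jar", "spreadsheetB.csv", "name", "Contact ID"))`, and later `proc.terminate()` to stop the service.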

reconcile-csv only works with the spreadsheet you give it and exposes a reconciliation API to Refine; the reconciliation itself happens in Refine.

This means the CSV you give it is your database export: the "Search Column" would be the name column and the "ID column" would be the column with the database ID.

Hope that helps.
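To make that division of labour concrete: Refine sends batches of queries to the service and gets back ranked candidates for each one. The sketch below assumes the standard OpenRefine reconciliation protocol. Under that protocol, the request carries a form field named `queries` that holds JSON keyed by IDs such as `q0`, and the answer has the shape `{"q0": {"result": [...]}}`. It is an assumption that reconcile-csv returns exactly these fields (`id`, `score`). The functions `build_queries`, `reconcile` and `best_match` are hypothetical helpers. In Refine itself, `cell.recon.match.id` gives the ID of the match you accept, and `best_match` is only a rough analogue of that.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SERVICE_URL = "http://localhost:8000/reconcile"  # assumed: service started as above


def build_queries(names: List[str]) -> Dict[str, dict]:
    """One query per name, keyed q0, q1, ... as in the reconciliation protocol."""
    return {f"q{i}": {"query": name} for i, name in enumerate(names)}


def reconcile(names: List[str], url: str = SERVICE_URL) -> dict:
    # The batch is sent as a form-encoded POST field called "queries".
    body = urlencode({"queries": json.dumps(build_queries(names))}).encode()
    with urlopen(Request(url, data=body), timeout=30) as resp:
        return json.load(resp)


def best_match(response: dict, key: str) -> Optional[str]:
    """Return the ID of the highest-scoring candidate for one query, if any."""
    candidates = response.get(key, {}).get("result", [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.get("score", 0))["id"]
```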

santanu4 commented 9 years ago

Hello Mr. Bauer, I have been able to run reconcile-csv and add it as a service in OpenRefine. However, I still have a few questions. My understanding is that reconcile-csv compares two CSV datasets. Should the file I load in OpenRefine be the same as the file I pass on the command line when running the reconcile-csv jar? I got slightly lost on how reconcile-csv references another file from inside OpenRefine to do the fuzzy matching between the two files. I tried googling it, but there is very little documentation online, and what there is isn't very explanatory. Thank you. Yours sincerely, Santanu Chatterjee


mihi-tr commented 9 years ago

Hi there,

You will generally open one file (the one with the unique IDs) in reconcile-csv and the other (the one where you want to introduce the IDs) in Refine.

Hope that helps.
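For anyone wondering how the matching itself works: the stack trace later in this thread shows reconcile-csv scoring candidates through `fuzzy_string.core$dice` and `fuzzy_string.core$bigrams`. It then sorts the scores (`sort_by` inside `reconcile_csv.core$scores`). The sketch below is a rough Python reconstruction of that idea, assuming a plain Dice coefficient over character bigrams. It is not the project's code, and details such as lowercasing are guesses. `top_candidate` is a hypothetical helper.

```python
from typing import Iterable, List, Optional, Tuple


def bigrams(s: str) -> List[str]:
    """Adjacent character pairs, e.g. "ada" -> ["ad", "da"] (lowercased here)."""
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a: str, b: str) -> float:
    """Dice coefficient: 2 * shared bigrams / (bigrams in a + bigrams in b)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Strings of 0 or 1 characters have no bigrams; compare them directly.
        return 1.0 if a.lower() == b.lower() else 0.0
    remaining = list(y)
    shared = 0
    for g in x:
        if g in remaining:  # count each bigram of b at most once
            remaining.remove(g)
            shared += 1
    return 2.0 * shared / (len(x) + len(y))


def top_candidate(query: str,
                  rows: Iterable[Tuple[str, str]]) -> Optional[Tuple[str, float]]:
    """Score each (search value, ID) row of the served CSV; return (ID, score) of the best."""
    scored = [(row_id, dice(query, value)) for value, row_id in rows]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[1])
```

In terms of the answer above, the `rows` come from the file served by reconcile-csv (its search column and ID column), and each name in the Refine project is a `query`.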

intellerati commented 9 years ago

Hello Michael,

I got the .jar file running, thanks to your clarifications. Thank you so much. So we've made progress. However, I am getting a new error that I hope you can help me resolve:

Krista-Bradfords-MacBook-Pro-2:openrefine kristabradford$ java -jar reconcile-csv-0.1.1.jar inventorslist.csv “name” “Contact ID”
Exception in thread "main" clojure.lang.ArityException: Wrong number of args (4) passed to: core$-main at clojure.lang.AFn.throwArity(AFn.java:437) at clojure.lang.AFn.invoke(AFn.java:51) at clojure.lang.AFn.applyToHelper(AFn.java:172) at clojure.lang.AFn.applyTo(AFn.java:151) at reconcile_csv.core.main(Unknown Source)

While troubleshooting, I noticed an instruction saying I needed to add the reconciliation service, and thought that might have something to do with the issue above. So I went to Start Reconciling -> Add Standard Service and entered "http://localhost:8000/reconcile" (without the quotes). It returned the error:

Error contacting recon service: timeout : timeout - http://localhost:8000/reconcile

I look forward to your thoughts.

Thank you! Krista

mihi-tr commented 9 years ago

Hi there,


This looks weird. Are the double quotes exactly as you posted them (curly “ ” quotes)? If so, try replacing them with ordinary straight double quotes (or single quotes).
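A plausible explanation for the `Wrong number of args (4)` error, offered as a sketch rather than a diagnosis: curly quotes (“ ”) are not quote characters to the shell. So `“Contact ID”` is split at the space into two arguments, and reconcile-csv receives four arguments instead of three. The column names also keep the curly characters. Python's `shlex.split`, which follows POSIX shell splitting rules, reproduces this. `straighten_quotes` is a hypothetical helper that maps typographic quotes back to ASCII ones.

```python
import shlex

# Typographic quotes often sneak in when commands are copied from
# word processors, email or web pages.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"',
                                   "\u2018": "'", "\u2019": "'"})


def straighten_quotes(command: str) -> str:
    return command.translate(CURLY_TO_STRAIGHT)


pasted = ("java -jar reconcile-csv-0.1.1.jar inventorslist.csv "
          "\u201cname\u201d \u201cContact ID\u201d")

# Everything after the jar name is what reconcile-csv receives.
broken = shlex.split(pasted)[3:]
fixed = shlex.split(straighten_quotes(pasted))[3:]

print(len(broken), broken)  # 4 args: curly quotes kept, "Contact ID" split in two
print(len(fixed), fixed)    # 3 args: inventorslist.csv, name, Contact ID
```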

intellerati commented 9 years ago

OK. The following produces no error. So I think we're making progress . . .

java -jar reconcile-csv-0.1.1.jar inventorslist.csv ‘name’ ‘Contact ID’

Only the cursor is returned, which I presume means it is working. At this point I'm unclear on what to do next.

Your instructions say:

Then add http://localhost:8000/reconcile as a reconciliation service to refine. You can add more columns through the reconcile-interface in Refine.

Then use cell.recon.match.id to get the ID from the match.

What I did:

I went to the Name column I want to reconcile in OpenRefine. In its dropdown I selected Start Reconciling -> Add Standard Service and entered "http://localhost:8000/reconcile" (without the quotes). It doesn't seem to save or show up in the list of services. It returned the error:

Error contacting recon service: timeout : timeout - http://localhost:8000/reconcile

Am I doing something wrong?
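One quick way to tell whether the timeout means nothing is listening on port 8000, or whether the problem lies inside Refine, is to request the service URL directly. Under the standard reconciliation protocol, a plain GET on the endpoint (with no queries) returns a JSON service description. It is an assumption that reconcile-csv answers this way at `/reconcile`, and `service_is_up` is a hypothetical helper. Opening the same URL in a browser is the manual equivalent.

```python
import json
from urllib.request import urlopen


def service_is_up(url: str = "http://localhost:8000/reconcile",
                  timeout: float = 5.0) -> bool:
    """True if the URL answers with JSON; False if it can't be reached or isn't JSON."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            json.load(resp)  # a service description is expected to be JSON
        return True
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (refused connection, HTTP errors)
        # and socket timeouts; ValueError covers a body that isn't valid JSON.
        return False
```

If this returns False while the jar is supposedly running, the service is not reachable at that address, and adding it in Refine will time out too.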

mihi-tr commented 9 years ago


The cursor shouldn't be returned from the call; the program should stay running. This is strange.

Data Diva | skype: mihi_tr | @mihi_tr Open Knowledge | School of Data http://okfn.org | http://schoolofdata.org GPG/PGP key: http://tentacleriot.eu/mihi.asc

intellerati commented 9 years ago

I tried again, and I misspoke: the $ prompt was not returned, just the rectangular block cursor. So I presume it loaded. For what it's worth, I still get the timeout error.

intellerati commented 9 years ago

So I have managed to get reconcile-csv to run in Java. (Yes!) I went into OpenRefine and selected "Start Reconciling" on the column "linkedin" (I am using the URL as a unique identifier). I went to Add Standard Service and entered http://localhost:8000/reconcile.

It took me to this screen, which showed "working", but I never got beyond that point.

[Screenshot: reconcile working]

When I checked Terminal, it showed an exception:

ExecutionException: java.lang.NullPointerException (see the full text below).

Also, as I reviewed the instructions, there seem to be two sets, and I don't understand the difference between them. One set involves the reconcile-csv-0.1.0-SNAPSHOT-standalone.jar file and the other features reconcile-csv-0.1.1.jar. If it isn't an imposition, could you fill me in on the difference? (I presume the first "stands alone", but it isn't entirely obvious to me what that means, and I don't see it explained in the documentation.)

Also, I've seen instructions to point Refine to http://localhost:8000/reconcile and to just http://localhost:8000/. I've tried both; http://localhost:8000/reconcile is the one that gets me to the "working" screen.

(I hope you find my fumblings a wee bit entertaining . . .)

Krista-Bradfords-MacBook-Pro-2:openrefine kristabradford$ java -jar reconcile-csv-0.1.1.jar LinkedInCheck-standardizedURLs.csv ‘LinkedInStandardized’ ‘linkedin’
Starting CSV Reconciliation service
Point refine to http://localhost:8000 as reconciliation service
2014-10-09 15:57:48.550:INFO:oejs.Server:jetty-7.x.y-SNAPSHOT
2014-10-09 15:57:48.579:INFO:oejs.AbstractConnector:Started SelectChannelConnector@0.0.0.0:8000
2014-10-09 15:58:08.619:WARN:oejs.AbstractHttpConnection:/reconcile
java.util.concurrent.ExecutionException: java.lang.NullPointerException at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at clojure.core$deref_future.invoke(core.clj:2108) at clojure.core$future_call$reify6267.deref(core.clj:6308) at clojure.core$deref.invoke(core.clj:2128) at clojure.core$map$fn4207.invoke(core.clj:2487) at clojure.lang.LazySeq.sval(LazySeq.java:42) at clojure.lang.LazySeq.seq(LazySeq.java:67) at clojure.lang.RT.seq(RT.java:484) at clojure.core$seq.invoke(core.clj:133) at clojure.core$zipmap.invoke(core.clj:2713) at reconcile_csv.core$reconcile_params.invoke(core.clj:131) at reconcile_csv.core$reconcile.invoke(core.clj:141) at reconcile_csv.core$fn66.invoke(core.clj:212) at compojure.core$make_route$fn__528.invoke(core.clj:94) at compojure.core$if_route$fn516.invoke(core.clj:40) at compojure.core$if_method$fn509.invoke(core.clj:25) at compojure.core$routing$fn534.invoke(core.clj:107) at clojure.core$some.invoke(core.clj:2443) at compojure.core$routing.doInvoke(core.clj:107) at clojure.lang.RestFn.applyTo(RestFn.java:139) at clojure.core$apply.invoke(core.clj:619) at compojure.core$routes$fn538.invoke(core.clj:112) at ring.middleware.keyword_params$wrap_keyword_params$fn1335.invoke(keyword_params.clj:32) at ring.middleware.nested_params$wrap_nested_params$fn1377.invoke(nested_params.clj:70) at ring.middleware.params$wrap_params$fn__199.invoke(params.clj:58) at ring.adapter.jetty$proxy_handler$fn75.invoke(jetty.clj:18) at ring.adapter.jetty.proxy$org.eclipse.jetty.server.handler.AbstractHandler$0.handle(Unknown Source) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:363) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:483) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:931) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:992) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.NullPointerException at fuzzy_string.core$bigrams.invoke(core.clj:9) at clojure.lang.AFn.applyToHelper(AFn.java:161) at clojure.lang.AFn.applyTo(AFn.java:151) at clojure.core$apply.invoke(core.clj:617) at clojure.core$memoize$fn5049.doInvoke(core.clj:5735) at clojure.lang.RestFn.invoke(RestFn.java:408) at fuzzy_string.core$dice.invoke(core.clj:26) at reconcile_csv.core$score$fuzzy_match31.invoke(core.clj:78) at clojure.core$map$fn4207.invoke(core.clj:2487) at clojure.lang.LazySeq.sval(LazySeq.java:42) at clojure.lang.LazySeq.seq(LazySeq.java:60) at clojure.lang.RT.seq(RT.java:484) at clojure.core$seq.invoke(core.clj:133) at clojure.core.protocols$seq_reduce.invoke(protocols.clj:26) at clojure.core.protocols$fn6026.invoke(protocols.clj:53) at clojure.core.protocols$fn5979$G59745992.invoke(protocols.clj:13) at clojure.core$reduce.invoke(core.clj:6175) at reconcile_csv.core$score.invoke(core.clj:80) at clojure.lang.AFn.applyToHelper(AFn.java:163) at clojure.lang.AFn.applyTo(AFn.java:151) at clojure.core$apply.invoke(core.clj:619) at clojure.core$partial$fn4190.doInvoke(core.clj:2396) at clojure.lang.RestFn.invoke(RestFn.java:408) at clojure.core$map$fn4207.invoke(core.clj:2487) at clojure.lang.LazySeq.sval(LazySeq.java:42) at clojure.lang.LazySeq.seq(LazySeq.java:60) at clojure.lang.RT.seq(RT.java:484) at clojure.core$seq.invoke(core.clj:133) at clojure.core$sort.invoke(core.clj:2752) at clojure.core$sort_by.invoke(core.clj:2769) at clojure.core$sort_by.invoke(core.clj:2767) at reconcile_csv.core$scores.invoke(core.clj:112) at reconcile_csv.core$reconcile_param.invoke(core.clj:124) at clojure.core$pmap$fn6275$fn6276.invoke(core.clj:6354) at clojure.core$binding_conveyor_fn$fn4107.invoke(core.clj:1836) at clojure.lang.AFn.call(AFn.java:18) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695)

My apologies in advance for the multiple posts. My hope is that there will be information in our discussion that will help other users.

Regards, Krista

mihi-tr commented 9 years ago

Krista,

First, the two different names are simply two different versions of reconcile-csv.

As for the error, it's hard to say what is causing it. Something in the data seems to make it barf.

Could you share your .csv file? Did you make sure there are no empty cells?
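The "Caused by" part of the trace above ends in `fuzzy_string.core$bigrams`. That fits the suspicion about empty cells: a blank value in the search column could leave the scoring step with nothing to split into bigrams. This connection is an inference from the trace, not something confirmed in the thread. The sketch below uses Python's standard `csv` module and a hypothetical `find_blank_cells` helper to list blank cells in one column. It also raises an error if the column name doesn't exactly match the header. A mismatch like that can happen, for instance, if typographic quotes such as ‘LinkedInStandardized’ end up as part of the name.

```python
import csv
from typing import Iterable, List


def find_blank_cells(lines: Iterable[str], column: str) -> List[int]:
    """Return 1-based data-row numbers (header excluded) where `column` is blank.

    `lines` can be an open file or any iterable of CSV text lines.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        # A name that doesn't match the header exactly (case, spaces,
        # stray curly quotes) would leave every lookup empty.
        raise ValueError(f"column {column!r} not in header {reader.fieldnames}")
    blanks = []
    for row_number, row in enumerate(reader, start=1):
        value = row.get(column)
        # DictReader fills short rows with None; whitespace-only counts as blank too.
        if value is None or not value.strip():
            blanks.append(row_number)
    return blanks
```

Usage with a file: `with open("LinkedInCheck-standardizedURLs.csv", newline="", encoding="utf-8") as fh: print(find_blank_cells(fh, "LinkedInStandardized"))`.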

sjg-transparency commented 9 years ago

I'm very much in the same position as the original poster. I have very little experience with the command prompt, so I'm probably doing something obviously wrong, but I have absolutely no clue what.

I've downloaded the file and stored it in FolderX. The data I want to use as my reconciliation file (i.e. the one with the IDs) is in FolderY. I've tried entering the command below at the command prompt, but it keeps saying "Unable to access jarfile P:\FolderX\reconcile-csv-0.1.2.jar".

java -jar "P:\FolderX\reconcile-csv-0.1.2.jar" P\FolderY\HouseOfLords-Donations-Reconciliation.csv "Name" "PersonID"

Am I missing something or is my computer being buggy? If I can get this going ASAP it'll save me a load of time with what I'm trying to do, so any swift assistance on this would be greatly appreciated.

mihi-tr commented 9 years ago

Interesting issue. The command line seems OK, provided your path is correct. Can you run dir P:\FolderX\reconcile-csv-0.1.2.jar? Also, I don't think the double quotes should be needed...
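Java reports "Unable to access jarfile" when it cannot open the path given after `-jar`, so checking every path before launching narrows things down. The sketch below does this with a hypothetical `missing_paths` helper. One thing such a check would flag in the command above is the CSV path `P\FolderY\...`: it has no colon after the drive letter, so Windows treats it as a folder named `P` relative to the current directory.

```python
from pathlib import Path
from typing import List


def missing_paths(*paths: str) -> List[str]:
    """Return a message for each given path that is not an existing file."""
    problems = []
    for p in paths:
        path = Path(p)
        if not path.is_file():
            # resolve() shows where a relative path actually ends up
            problems.append(f"{p} -> not found (resolved to {path.resolve()})")
    return problems
```

Usage: `print(missing_paths(r"P:\FolderX\reconcile-csv-0.1.2.jar", r"P\FolderY\HouseOfLords-Donations-Reconciliation.csv"))`. An empty list means both files were found.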

sjg-transparency commented 9 years ago

Think I'm getting somewhere. It now seems to be having problems finding the file I want to reconcile against. You might have to spell things out very simply for me if I'm doing something wrong, because I have no idea how this language works.

Thanks for your assistance

C:\Users\Steve>java -jar "P:_Shared files exc DSP\PROJECTS\Lobbying\Data\reconcile-csv-0.1.2.jar" "P:_Shared files exc DSP\PROJECTS\Lobbying\Data\HouseOfLords-Donations-Reconciliation.csv" Name PersonID
Exception in thread "main" java.io.FileNotFoundException: P:_Shared files exc DSP\PROJECTS\Lobbying\Data\HouseOfLords-Donations-Reconciliation.csv (The system cannot find the file specified) at java.io.FileInputStream.open0(Native Method) at java.io.FileInputStream.open(Unknown Source) at java.io.FileInputStream.<init>(Unknown Source) at clojure.java.io$fn8638.invoke(io.clj:233) at clojure.java.io$fn8577$G85428584.invoke(io.clj:73) at clojure.java.io$fn8650.invoke(io.clj:262) at clojure.java.io$fn8577$G85428584.invoke(io.clj:73) at clojure.java.io$fn8612.invoke(io.clj:169) at clojure.java.io$fn8551$G85468558.invoke(io.clj:73) at clojure.java.io$reader.doInvoke(io.clj:106) at clojure.lang.RestFn.invoke(RestFn.java:410) at clojure.lang.AFn.applyToHelper(AFn.java:161) at clojure.lang.RestFn.applyTo(RestFn.java:132) at clojure.core$apply.invoke(core.clj:619) at clojure.core$slurp.doInvoke(core.clj:6278) at clojure.lang.RestFn.invoke(RestFn.java:410) at reconcile_csv.core$_main$fn__2676.invoke(core.clj:238) at clojure.lang.Atom.swap(Atom.java:51) at clojure.core$swapBANG.invoke(core.clj:2161) at reconcile_csv.core$_main.invoke(core.clj:238) at clojure.lang.AFn.applyToHelper(AFn.java:167) at clojure.lang.AFn.applyTo(AFn.java:151) at reconcile_csv.core.main(Unknown Source)

mihi-tr commented 9 years ago

It could be your folder names, which have a ton of spaces in them; I'm not sure it handles those well. Could you move your files into the same folder? Or type cd "THEFOLDERWITHYOURFILE" and just run: java -jar "P:_Shared files exc DSP\PROJECTS\Lobbying\Data\reconcile-csv-0.1.2.jar" thefile.csv name personID
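The second suggestion, running from the CSV's own folder, can also be scripted. In this version the long path is used only as the working directory, and reconcile-csv sees just the bare file name. This is a sketch assuming `java` is on the PATH. `plan_run` and `run_from_csv_folder` are hypothetical helpers. Setting `cwd` plays the role of the `cd` step, and passing an argument list avoids any shell quoting of the spaces.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def plan_run(jar: str, csv_path: str, search_column: str,
             id_column: str) -> Tuple[Path, List[str]]:
    """Return (working folder, argv) so the CSV is referred to by name only."""
    csv_file = Path(csv_path).resolve()
    jar_file = Path(jar).resolve()  # absolute, so it still works from another folder
    argv = ["java", "-jar", str(jar_file), csv_file.name, search_column, id_column]
    return csv_file.parent, argv


def run_from_csv_folder(jar: str, csv_path: str, search_column: str,
                        id_column: str) -> subprocess.Popen:
    folder, argv = plan_run(jar, csv_path, search_column, id_column)
    # cwd is the scripted equivalent of cd "THEFOLDERWITHYOURFILE"
    return subprocess.Popen(argv, cwd=folder)
```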

sjg-transparency commented 8 years ago

Morning Michael,

Sorry for the very late reply. In the end I had to do this the long hard way, but I’ll try your solution the next time I need to reconcile some data.

Thanks for your time Steve


tfmorris commented 8 years ago

@sjg-transparency https://discuss.okfn.org/t/remembering-michael-bauer/1166

abubelinha commented 6 years ago

I had similar issues, and in the end the problem was a malformed CSV. Solution: I loaded my text file in OpenRefine and used "Export as CSV", as described here: https://groups.google.com/forum/#!msg/openrefine/jgzicV-Bj9g/HhHu37x1AwAJ

After that, I could use it as an OpenRefine reconciliation service with reconcile-csv like this:

C:\currentpath> java -Xmx1g -jar C:\path-to-executable\reconcile-csv-0.1.2.jar C:\path-to-csv\myfile.csv "name" "id"
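A rough Python companion to that round trip. The thread doesn't say exactly what was malformed, so this assumes one common kind of problem: rows whose field count differs from the header (blank lines included). `ragged_rows` reports those rows. `rewrite_csv` copies the file through Python's `csv` module, which normalizes quoting and line endings and drops blank rows, but it does not repair ragged rows; those need fixing by hand. Both helpers are hypothetical, and re-exporting from OpenRefine, as described above, remains the approach reported to work.

```python
import csv
from typing import Iterable, List, TextIO, Tuple


def ragged_rows(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (source line number, field count) for rows whose width differs from the header's."""
    reader = csv.reader(lines)
    header = next(reader, [])
    # reader.line_num is the number of source lines read so far
    return [(reader.line_num, len(row)) for row in reader if len(row) != len(header)]


def rewrite_csv(src: Iterable[str], dst: TextIO) -> int:
    """Copy rows through the csv module (standard quoting), skipping blank rows.

    Returns the number of rows written.
    """
    writer = csv.writer(dst)
    count = 0
    for row in csv.reader(src):
        if row:
            writer.writerow(row)
            count += 1
    return count
```

When working with real files, open both with `newline=""` (as the `csv` module documentation recommends) and an explicit encoding such as `encoding="utf-8"`.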