rufuspollock-okfn / reconcile-csv

A simple OpenRefine reconciliation service that runs on top of a CSV file
BSD 2-Clause "Simplified" License

NullPointerException #28

Closed Yzy-Joey closed 7 years ago

Yzy-Joey commented 7 years ago

Hi,

I have zero background in CS and am having a lot of trouble running this program.

To test it, I created two CSV files (values separated by commas).

Sample_1.csv with two columns: (A1-A5: Name A B C D) (B1-B5: ID 1 2 3 4)

Sample_2.csv with two columns: (A1-A5: Nname A B C D) (B1-B5: Number 11 22 33 44)

I downloaded reconcile-csv, opened Windows PowerShell, and typed in the command: sample_2

Then I uploaded Sample_1.csv into OpenRefine and, based on the column Name, started reconciling with localhost:8000/reconcile.

This is the output I received. The program just stops there (still running, but not doing anything; the OpenRefine window shows it is "Working"). Does anyone know what the NullPointerException points to and how I can make this work? code_1 code_2

Much Thanks!

Joey

Chaoticum commented 7 years ago

Hi Joey,

there are a few mistakes in the command.

1) It is better to use an absolute path for the reconcile-csv jar file.
2) Don't use folders with spaces in the folder name. It's much better to rename the folders, for example to Research-with-Lincoln and Open-Refine-Project.
3) You are trying to use an xlsx file (Sample_1.xlsx), but this service cannot process xlsx. You must save the file as CSV (Sample_1.csv), with values separated by commas.
4) Again, it is better to use an absolute path for the CSV file. The second column must be named "ID".
5) Just to be sure, I delimit each argument of the command with double quotes.

I use the standard Windows command line, and my command is shown below. It works the same way in PowerShell.

java -Xmx2g -jar "C:\Users\Chaoticum\Downloads\reconcile-csv-0.1.2.jar" "C:\Users\Chaoticum\Downloads\origX.csv" "Address" "ID"
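For anyone who would rather launch the service from a script, here is a minimal Python sketch of the same invocation. The function name `build_reconcile_command` and its parameters are made up for illustration and are not part of reconcile-csv. Passing the arguments as a list means spaces in paths need no manual quoting.

```python
import os


def build_reconcile_command(jar_path, csv_path, search_column, id_column, max_heap="2g"):
    """Return the argument list for the java command shown above.

    Hypothetical helper: it only assembles arguments and rejects
    non-CSV input; it does not start anything itself.
    """
    extension = os.path.splitext(csv_path)[1].lower()
    if extension != ".csv":
        raise ValueError(f"expected a .csv file, got extension {extension!r}")
    return [
        "java",
        f"-Xmx{max_heap}",
        "-jar",
        jar_path,
        csv_path,
        search_column,
        id_column,
    ]


# To actually start the service (not run here):
#   import subprocess
#   subprocess.run(build_reconcile_command(jar, data, "Address", "ID"))
```

Because each argument is a separate list element, a path with spaces is passed through intact, although renaming the folders as suggested above is still the simpler route.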

This is how it looks when you successfully start the service:

succesful-ps-reconcil-server-start

I hope this will help you. :)

Zdenek

Yzy-Joey commented 7 years ago

Hi Zdenek,

Thank you for your response!

However, I tried your suggestions and the program still throws the NullPointerException. Please let me know what I did wrong.

It was the same file. I renamed the folders so they have no spaces and delimited the arguments with quotes. Here is what I have. start the code

However, after running the reconcile function, here are the errors I receive. error_1 error_2 error_3 error_4

Thank you very much for helping me out!

Best, Joey

Chaoticum commented 7 years ago

Hi Joey,

OK, we've eliminated most of the basic possible problem sources.

How about your CSV file? Are you sure it's delimited by commas? This service cannot work with files delimited by semicolons or other custom delimiters, only commas. If you are generating your CSV with Excel, that might be the problem, since Excel may use semicolons as the delimiter, depending on your regional settings. Text strings should also be delimited by quotes (this is only necessary for text strings).

I'm using Libre Office for saving my CSV files since it's quite a pain to set it up properly in Excel.
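If the file has already been exported with semicolons, a small script can rewrite it. This is a minimal sketch using Python's standard `csv` module; the function name `to_comma_csv` is made up for illustration, and it assumes the source file really is semicolon-delimited.

```python
import csv
import io


def to_comma_csv(text, source_delimiter=";"):
    """Rewrite delimited text as comma-separated CSV with quoted text strings.

    Plain integers are left unquoted; everything else (including values
    with leading zeros such as "007") is written as a quoted string.
    """
    reader = csv.reader(io.StringIO(text), delimiter=source_delimiter)
    out = io.StringIO()
    writer = csv.writer(
        out, delimiter=",", quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n"
    )
    for row in reader:
        converted = []
        for cell in row:
            # Convert only when the round trip is lossless, so IDs are not altered.
            if cell.isascii() and cell.isdigit() and str(int(cell)) == cell:
                converted.append(int(cell))
            else:
                converted.append(cell)
        writer.writerow(converted)
    return out.getvalue()


print(to_comma_csv("Name;ID\ntext string;1\nanother text string;2\n"))
```

To convert a real file, read it with `open(path, newline="", encoding="utf-8")` and write the returned text to a new `.csv` file; the encoding is an assumption and may need to match what Excel produced.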

This is how your file should look when you open it in Notepad, with text strings delimited by " and values delimited by ,:

"Name","ID"
"text string",1
"another text string",2

Another trigger for this long error message is using the wrong column names in the command. If the column name in the CSV is "Name" but you type, for example, "Nom" as the third parameter in the command, you'll get that annoying error. The column names in the CSV and in the command must be the same. The last parameter must always be called ID, both in the CSV and in the command.
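Both checks (comma delimiter, matching column names) can be done before starting the service. Here is a minimal Python sketch; `check_csv_header` is a made-up helper, not part of reconcile-csv, and the "ID" default simply follows the advice in this thread.

```python
import csv
import io


def check_csv_header(text, search_column, id_column="ID"):
    """Return a list of problems found in the header line of CSV text."""
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]
    # Parse only the first line, treating the comma as the delimiter.
    header = next(csv.reader(io.StringIO(lines[0]), delimiter=","))
    problems = []
    if len(header) == 1 and ";" in header[0]:
        problems.append("header looks semicolon-delimited, not comma-delimited")
    for column in (search_column, id_column):
        if column not in header:
            problems.append(f"column {column!r} not found in header {header}")
    return problems


good = '"Name","ID"\n"text string",1\n'
print(check_csv_header(good, "Name"))  # no problems expected
print(check_csv_header(good, "Nom"))   # reports the missing "Nom" column
```

An empty list means the header matches the names you plan to pass on the command line. It does not prove the rest of the file is well formed.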

Hopefully this will help you. If not, try to upload your CSV and I'll check it out. ;)

Zdenek

P.S.: I know it's quite a troublesome process. I spent quite a few hours figuring out how it works. But it's worth it. This tool saved me a lot of time on boring manual work in Excel.

Yzy-Joey commented 7 years ago

Hi Zdenek,

Thank you so much for your help! I guess the reason I was having the error was my CSV format. I created a new sample dataset based on your description and it worked!

Now, maybe one last question: once I finish reconciling, is there any way for me to keep the matching score in a new column? Suppose a cell has the name "South California". It matches a cell in the other file named "SoCal" with a matching score of 65%. Can I generate a new column next to South California and keep the score of 65%?

If not, is it possible for me to pick values based on their matching scores? For instance, only keep values with matching scores above 95%?
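To make the question concrete, here is a rough Python sketch of the kind of filtering I mean. It assumes the service's answer has already been parsed into a dictionary shaped like a standard reconciliation response (`{"q0": {"result": [{"name": ..., "score": ...}]}}`). The function name `best_matches`, the sample data, and the 0 to 1 score scale are all my assumptions, not something confirmed for reconcile-csv.

```python
def best_matches(response, min_score):
    """Keep the top-scoring candidate of each query if it reaches min_score.

    `response` is assumed to map query keys to {"result": [candidates]},
    where each candidate has at least "name" and "score".
    """
    kept = {}
    for key, payload in response.items():
        candidates = payload.get("result", [])
        if not candidates:
            continue
        best = max(candidates, key=lambda candidate: candidate["score"])
        if best["score"] >= min_score:
            kept[key] = (best["name"], best["score"])
    return kept


# Hypothetical parsed response for two queries (invented data).
sample = {
    "q0": {"result": [{"id": "1", "name": "SoCal", "score": 0.65}]},
    "q1": {"result": [{"id": "2", "name": "Texas", "score": 0.97},
                      {"id": "3", "name": "Texarkana", "score": 0.40}]},
}
print(best_matches(sample, 0.95))
```

With the sample above, only the second query survives a 0.95 threshold, and the score is kept next to the matched name.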

Again, thank you very much for helping me out through the whole process. Really helpful!

Best, Joey

Yzy-Joey commented 7 years ago

Problem Solved! Ignore my previous post. Thank you very much Zdenek!

Chaoticum commented 7 years ago

You're welcome, Joey. I'm glad I could help. :)