interedition / collatex

CollateX – Software for Collating Textual Sources
http://collatex.net/
GNU General Public License v3.0

Reading txt in the command line interface #35

Open silviaegt opened 8 years ago

silviaegt commented 8 years ago

Dear CollateX creators, thank you so much for making your tool available! It sounds super useful. Sadly, I haven't been able to run it. It would be lovely if you could show me an example of how to read a .txt file using the command line interface. Say I have output-adobe.txt, output-tesseract.txt, and original.txt and want to compare them. I open CollateX like this:

C:\Users\xxxx\Desktop> java -jar collatex-tools-1.7.1.jar

And then?

rhdekker commented 8 years ago

Dear Silvia,

Typing:

C:\Users\xxxxx\Desktop> java -jar collatex-tools-1.7.1.jar output-adobe.txt output-tesseract.txt original.txt

should produce an alignment table in JSON format.

If you don't like the JSON format, there are other formats, for example comma-separated values (CSV):

C:\Users\xxxxx\Desktop> java -jar collatex-tools-1.7.1.jar output-adobe.txt output-tesseract.txt original.txt -f csv

Hope this helps.

Best, Ronald

silviaegt commented 8 years ago

Dear Ronald, it worked perfectly, thank you!

However, I was not able to get the encoding right :( In the documentation I found this:

plain text version can also be provided in other encodings supported by the Java Platform and will be converted to Unicode before comparison. The command line interface is one such interface which supports character set conversions

I tried doing this:

C:\Users\xxxxx\Desktop> java -jar collatex-tools-1.7.1.jar output_abby.txt output_tesseract.txt output_clean.txt -f csv -ie utf-8 -oe utf-8 >> output.csv

But it didn't work.

Thank you in advance for any help you can provide!

Cheers,

S.
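
The thread does not say what went wrong with the encoding, so the following is only a diagnostic sketch, not a confirmed fix. One possible cause (an assumption, not something stated above) is that the input files are not actually stored as UTF-8, in which case passing `-ie utf-8` would make them decode incorrectly. The Python helper below checks whether a file's bytes are valid UTF-8 and, if needed, rewrites a file as UTF-8 from a source encoding you supply. The function names and the `cp1252` encoding in the usage note are hypothetical and chosen for illustration only.

```python
from pathlib import Path
import sys


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def transcode_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Decode src with source_encoding and write the same text to dst as UTF-8.

    source_encoding is a guess supplied by the caller; a wrong guess will
    either raise UnicodeDecodeError or silently produce wrong characters.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Report, for each file named on the command line, whether it is valid UTF-8.
    for name in sys.argv[1:]:
        status = "valid UTF-8" if is_valid_utf8(Path(name).read_bytes()) else "NOT valid UTF-8"
        print(f"{name}: {status}")
```

If a file is reported as not valid UTF-8, either pass its real encoding to `-ie`, or convert it first, for example `transcode_to_utf8(Path("output_abby.txt"), Path("output_abby_utf8.txt"), "cp1252")`, where `cp1252` is only a guess at a typical Windows encoding. A file that passes the check can still be displayed wrongly by the program used to open `output.csv`, so this narrows the problem down rather than settling it.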