opencog / relex

English Dependency Relationship Extractor
http://wiki.opencog.org/w/RelEx
Apache License 2.0

Move test fixtures into TSV plain text files #103

Open ceefour opened 10 years ago

ceefour commented 10 years ago

TestRelEx contains sentences and expected results inside the Java tests, which are then iterated over.

It'd be more convenient to put these fixtures, à la FitNesse, into a spreadsheet with 3 sheets (Comparatives, Extraposition, Conjunction). It can be edited very conveniently in LibreOffice Calc, which makes editing hundreds of tests not too painful. :)

The fixtures are then loaded into JUnit tests using odftoolkit.

Depends on #98.

If accepted you can assign to me.

linas commented 10 years ago

If you wish to create patches that do this, that's OK. I don't think they should be used for unit testing, for multiple reasons:

1) People already find relex too difficult to configure and install. Adding yet another dependency would make that aspect worse.

2) Spreadsheets and large, complex systems like LibreOffice are .. I dunno .. hard to use, hard to understand, ... I'm not sure what they do. They have something to do with business intelligence ... large corporations use them. Relex doesn't have any business people using it, and I don't see what the point of integration with business systems would be. Will business people start using relex? Why?

ceefour commented 10 years ago

  1. I've configured, installed from source, and tested RelEx. I know why it's difficult to build: many of the dependencies are managed manually. I've done the work to Mavenize RelEx. It's sooo much easier to build now.
  2. No LibreOffice is needed to use, build, or test RelEx. It's only used for editing the fixtures. I'll send you a spreadsheet or screenshot so you can see what it looks like.

linas commented 10 years ago

If you wish to provide patches, that's OK; it's unlikely I would turn them down, as long as they don't create additional dependencies for the user.

Myself, I have no plans to install or learn how to use LibreOffice or to try to remember how to use spreadsheets .. again, I think spreadsheets are way beyond the level of complexity that most people would know how to use, and I just don't think business people are going to be flocking to Relex because of them. These are very different worlds.

ceefour commented 10 years ago

This is what it would look like (it's from another project of mine):

[image: yago-rules ods - libreoffice calc_397]

Basically it's just a table of text, no fancy "spreadsheet features" if that's your concern.

It's easier and faster to change than editing test cases in code (i.e. it separates the test code from the test data):

[image: selection_398]

Note that when test data is put inside Java code, it takes up screen space, and to edit it one needs to escape "\n" and use string operators.
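To make the escaping point concrete, here is a small sketch contrasting expected output written as a Java literal with the same output rebuilt from the cells of one tab-separated row. The class name, the row layout (sentence first, then one relation per cell), and the relation strings are all assumptions made up for illustration; they are not taken from TestRelEx or from real RelEx output.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: the relation strings below are made-up placeholders,
// not actual RelEx output, and this class is not part of RelEx.
final class EscapingContrast {

    /** Expected output as a Java literal: needs quotes, "+", \n and \". */
    static String asJavaLiteral() {
        return "_subj(say, she)\n" +
               "_obj(say, \"hi\")\n";
    }

    /** The same expected output rebuilt from the cells of one TSV row. */
    static String fromTsvRow(String row) {
        String[] cells = row.split("\t");
        // Assumed layout: cells[0] is the sentence, the remaining cells
        // are the expected relations, one per cell.
        List<String> relations = Arrays.asList(cells).subList(1, cells.length);
        return String.join("\n", relations) + "\n";
    }
}
```

In a data file, the row itself would be typed with no escaping at all: the sentence `She said "hi".`, a tab, `_subj(say, she)`, a tab, and `_obj(say, "hi")`.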

linas commented 10 years ago

I can't even begin to imagine how it can possibly be easier to edit a spreadsheet than to edit the code. That's like saying juggling three balls while standing on your head is "easier" than walking. That's just .. crazy.

Look, I'm spending a lot of time talking to you but you keep offering these wild ideas and I just don't see how they are useful in any way. Spreadsheets are just big complicated tables and are pretty much useless. There's no value-add here.

For software to be useful, it has to actually do something. If it doesn't actually do anything, then it's a game, and I'm just not interested in games; I'm interested in learning and language processing, not business software.

ceefour commented 10 years ago

It's easier to edit data in a spreadsheet the same way it's easier to edit code in an IDE. A spreadsheet can be small or big depending on the data, just like a .java or .sch file can have 100 bytes or 100 KiB. The Java test has 876 lines and the actual Java code is probably about 100 lines; the remaining 700-something lines are data wrapped in Java code. If these are put in a table, it's easier and more compact.

I'll prove it to you, here's a very short video of me typing RelEx test data in a sheet: http://youtu.be/Xg1hXZdT6MU

It's very easy. I just type or edit or correct, Tab, and Enter. I don't need to escape \n or worry about string delimiters. I don't have to type \" if a " comes along.

I can switch between data sets easily just by changing tabs; in the Java editor I have to scroll or use the Outline to find the method which holds the data.

While typing this I have the editor side-by-side. The editor shows me 4 test cases in Java code. The spreadsheet displays probably 30 test cases, using the same visual space. If I maximize the window then it can probably display 60 test rows at once. RelEx has 80 tests now. One can very quickly see what the tests are and append more tests.

I can make another video if you want, showing me typing the same test data inside a Java editor. It's definitely longer than this video. :)

linas commented 10 years ago

It's not easier to edit anything in a spreadsheet. I don't have access to any spreadsheets or spreadsheet editors. The last time I used one was 20 years ago. I don't have any use for business software and I don't have time to watch a youtube video. This entire conversation is crazy and pointless and a waste of time. I'm done.

ceefour commented 10 years ago

I'm sorry to take your time. It's really not my intention. My intention is to show you that there is a better, faster, easier way to do some things, and I tried to explain it in writing and also to make a video demonstrating it. I'm aware that you don't like spreadsheet applications, but it doesn't mean they're not a good fit for this purpose.

Anyway, I have an alternative which I hope you might like better. Would you mind moving the test data into TSV (tab-separated) format? It's purely text so I hope this is acceptable to you. This is what it looks like in a plain text editor, and I believe this is also easy for you and everyone to edit.

[image: -home-ceefour-tmp-relex-test2 tsv kate_403]
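As a rough idea of how little code is needed to read such a file, here is a minimal sketch of a TSV fixture parser. It assumes a layout that is not confirmed anywhere in this thread: one test per line, the sentence in the first column, one expected relation per remaining column, and `#` for comment lines. The names `TsvFixtures`, `Fixture`, and `parse` are hypothetical, and only the Java standard library is used, so no new dependency is implied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: class names and the column layout are assumptions
// for illustration, not part of RelEx.
final class TsvFixtures {

    /** One test case: a sentence plus the relations expected for it. */
    static final class Fixture {
        final String sentence;
        final List<String> expectedRelations;

        Fixture(String sentence, List<String> expectedRelations) {
            this.sentence = sentence;
            this.expectedRelations = expectedRelations;
        }
    }

    /**
     * Parses TSV text in which the first column is the sentence and each
     * remaining column is one expected relation. Blank lines and lines
     * starting with '#' are skipped.
     */
    static List<Fixture> parse(String tsv) {
        List<Fixture> fixtures = new ArrayList<>();
        for (String line : tsv.split("\\R")) {   // \R matches any line break
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] cols = line.split("\t");
            List<String> expected = new ArrayList<>();
            for (int i = 1; i < cols.length; i++) {
                if (!cols[i].isEmpty()) {
                    expected.add(cols[i]);
                }
            }
            fixtures.add(new Fixture(cols[0], expected));
        }
        return fixtures;
    }
}
```

A JUnit test could then loop over `TsvFixtures.parse(...)` and run each sentence through the parser under test, in the same way the existing tests iterate over their hard-coded data.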

If you accept you can assign to me.

githart commented 10 years ago

LibreOffice Sheets with odftoolkit may be the better and more convenient option; the TSV in this case is ugly and cumbersome. Although there may be a case for keeping a diff-friendly format, I do not know what this case may be.

And really it's just a way to edit test data, not a build dependency. I think the issue should be reopened for discussion.

ceefour commented 10 years ago

While I personally prefer editing with LibreOffice (it's preinstalled in most distros including Ubuntu, Linux Mint, and Fedora, and is a straight download for Windows/OSX), I can understand Dr. Vepstas' objections and to me, editing tests in TSV is still so much better than editing them in Java code:

[image: -home-ceefour-tmp-relex-test2 tsv kate_403]

The above contains exactly the same data as below. The Java code also uses indentation, so it's just like TSV, with the addition that we need boilerplate code, string quotes, + and \n and ); etc., none of which is necessary in the TSV.

[image: selection_407]

If spreadsheet usage is approved, I'd probably use it this way:

[image: relex2-spreadsheet ods - libreoffice calc_409]

I'd color code green as "passing", red as "failed", yellow as "some subtests fail", and orange as "although this test passes, actually the logic is hardcoded and hacky, so please revisit it". (The coloring is up for debate, feel free to use colors easier on your eyes. If you don't like the colors then okay, no need to use them.)

Personally I feel it reduces cognitive overload since my brain doesn't need to process extraneous boilerplate stuff like rc &= test_sentence (, while at the same time providing a visual cue ("oh this one is broken, that one passes") and spatial information (ok, I've got ~50% coverage here, not because it says 50% but just by seeing that half of the screen is colored green).

Again it's up to you since this is your project; all I'm doing is suggesting improvements and explaining their benefits, while I also acknowledge your concerns are valid. I'm already more than happy if TSV is accepted.

BTW if you're concerned about diff, LibreOffice can save in FlatXML format if so desired, basically an uncompressed ODS file (since LibreOffice documents are technically zipped XML). I don't see why anyone would want to diff it though (it's test data, not code) so I'd suggest using the regular ODS format.

bgoertzel commented 10 years ago

Hmmm...

Whether use of a spreadsheet makes sense here or not is a matter of taste, I guess.... Of course, LibreOffice is free software and very easy to use. But I can understand not wanting to use additional software to view the test cases.

I do see some sense in putting the test sentences in text files of some sort, rather than in the code, though. Wrapping data in Java code does seem a bit cumbersome IMO.

There are other things in OpenCog in more need of attention than these test cases. But I can appreciate that Hendy, as a new contributor, is looking at low-hanging fruit...

-- Ben

On Thu, Jul 10, 2014 at 12:32 PM, Hendy Irawan notifications@github.com wrote:

While I personally prefer editing with LibreOffice, I can understand Dr. Vepstas' objections and to me, editing tests in TSV is still so much better than editing them in Java code:

[image: -home-ceefour-tmp-relex-test2 tsv kate_403] https://cloud.githubusercontent.com/assets/24123/3530189/1bfe0614-07a2-11e4-8abf-502cf3fd00f6.png

The above contains exactly the same data as below. The Java code also uses indentation, so it's just like TSV, with the addition that we need boilerplate code, string quotes, + and \n and ); etc., none of which is necessary in the TSV.

[image: selection_407] https://cloud.githubusercontent.com/assets/24123/3534436/1daee8f6-07e9-11e4-9b1a-ae55d1fa3dd6.png

If spreadsheet usage is approved, I'd probably use it this way:

[image: relex2-spreadsheet ods - libreoffice calc_408] https://cloud.githubusercontent.com/assets/24123/3534454/b9d39d6c-07e9-11e4-83aa-1e80e0579373.png

I'd color code green as "passing", red as "failed", yellow as "some subtests fail", and orange as "although this test passes, actually the logic is hardcoded and hacky, so please revisit it". (The coloring is up for debate, feel free to use colors easier on your eyes. If you don't like the colors then okay, no need to use them.)

Personally I feel it reduces cognitive overload since my brain doesn't need to process extraneous boilerplate stuff like rc &= test_sentence (, while at the same time providing a visual cue ("oh this one is broken, that one passes") and spatial information (ok, I've got ~50% coverage here, not because it says 50% but just by seeing that half of the screen is colored green).

Again it's up to you since this is your project; all I'm doing is suggesting improvements and explaining their benefits, while I also acknowledge your concerns are valid. I'm already more than happy if TSV is accepted.

BTW if you're concerned about diff, LibreOffice can save in FlatXML format if so desired, basically an uncompressed ODS file (since LibreOffice documents are technically zipped XML). I don't see why anyone would want to diff it though (it's test data, not code) so I'd suggest using the regular ODS format.

Ben Goertzel, PhD http://goertzel.org

"In an insane world, the sane man must appear to be insane". -- Capt. James T. Kirk

"Emancipate yourself from mental slavery / None but ourselves can free our minds" -- Robert Nesta Marley

amebel commented 10 years ago

@ceefour Separating the code from the test data set has been discussed before here. It would be great if you could follow up on that.

With regard to the .ods file, I think it is better if the data is inside a txt file (you could name it *.test for clarity); then anyone using an editor like vim can easily work with it.

Thanks :-)

ceefour commented 10 years ago

@AmeBel sure, I'd implement it like I suggested in https://github.com/opencog/relex/issues/103#issuecomment-48525228. Thanks :)

Another benefit of separating test cases is that if one day RelEx is ported to another architecture, these test cases can be reused as-is. Or used concurrently by both project variants.