nkrusch commented 2 years ago

Please review this changeset and run it on examples to evaluate its impact.

The evaluation procedure is explained with examples in choice.py so I will not repeat it here, but we can discuss it in more.

This approach also changes the representation of choices. Before:

[2,2,2,2,0,0,2], [2,2,2,2,0,1,2], [2,2,2,2,0,2,2], [2,2,2,2,1,0,2], [2,2,2,2,1,1,2], 
[2,2,2,2,1,2,2], [2,2,2,2,2,0,2], [2,2,2,2,2,1,2], [2,2,2,2,2,2,2]

Equivalent, but now more compact:

[[[2], [2], [2], [2], [0, 1, 2], [0, 1, 2], [2]]]

Which is read as: choose a vector (there could be multiple, it is a list of lists), then choose one of the remaining choices at each index; this will give a valid derivation.

This change has some practical implications:

faster eval+compact representation of choices enables analyzing programs with larger index (more assignments) and also reasonably displaying or writing those results to file
it enables evaluating programs like long.c, whose index is 13, which is also introduced in this changeset
it is now possible to evaluate the original dense.c (in tests/test_examples/complex_dense.c) -- it still takes ~1 min to analyze, but the evaluation step is instantaneous
the analysis phase is now more significant regarding performance
removes progress bar external dependency

codecov-commenter commented 2 years ago

Codecov Report

Merging #78 (d87c1f9) into main (6b3efe1) will increase coverage by 0.01%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #78      +/-   ##
==========================================
+ Coverage   88.52%   88.53%   +0.01%     
==========================================
  Files          13       14       +1     
  Lines         915      977      +62     
==========================================
+ Hits          810      865      +55     
- Misses        105      112       +7

Impacted Files	Coverage Δ
pymwp/__init__.py	`100.00% <100.00%> (ø)`
pymwp/analysis.py	`86.03% <100.00%> (ø)`
pymwp/choice.py	`100.00% <100.00%> (ø)`
pymwp/file_io.py	`84.00% <100.00%> (+0.32%)`	:arrow_up:
pymwp/monomial.py	`98.41% <100.00%> (-0.12%)`	:arrow_down:
pymwp/polynomial.py	`95.85% <100.00%> (+0.48%)`	:arrow_up:
pymwp/relation.py	`98.00% <100.00%> (+0.60%)`	:arrow_up:
pymwp/delta_graphs.py	`88.34% <0.00%> (-8.74%)`	:arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6b3efe1...d87c1f9. Read the comment docs.

nkrusch commented 2 years ago

I linked issue #52 here because if this PR is merged, the issue is no longer relevant.

aubertc commented 2 years ago

I linked issue #52 here because if this PR is merged, the issue is no longer relevant.

Absolutely.

aubertc commented 2 years ago

Not sure where is the best place to store this example we started working on, so I'm just posting it here, @nkrusch , for the record. Implementing something of that sort or using it to explain the algorithm could be useful.

example_eval

nkrusch commented 2 years ago

Thanks for this picture. I was "just" going to go over it quickly and document it but then I realized I had crossed out the third row, so it should give 4 results (not an issue for correctness, just overly verbose) :

[[2],[0],[0,1,2]],
[[2],[0,1],[0,2]],
[[2],[0,2],[0,2]],
[[2],[0,1,2],[0,2]]

And what I actually want is:

[[2],[0],[0,1,2]],
[[2],[0,1,2],[0,2]]

So, clearly need to improve the simplification steps. It has to do with the duplicated delta (1,2) in the two longer sequences, but I am having a bit of hard time trying to figure out how to handle it in general case.

Working in reverse[^note]:

(0,0)   (0,1)   ((2,1) (1,2))   ((2,0)(1,1)(1,2))  ----(?)---->    (0,0)   (0,1)   (???)   (???)

Then combining element-wise in all possible ways, it should have only the following two remaining, since this is minimal form:

(0,0) (1,0) (2,1) (1,1) => [[2],[0],[0,1,2]] 
(0,0) (1,0) (1,2) (1,2) => [[2],[0,1,2],[0,2]]   # same as writing (0,0) (1,0) (1,2)

How can I determine these two are redundant:

(0,0) (1,0) (2,1) (1,2) => [[2],[0,1],[0,2]] 
(0,0) (1,0) (1,2) (1,1) => [[2],[0,2],[0,2]]

In both cases I am pairing a delta from one sequence with the duplicated (1,2) pair. So, since (0,0) (1,0) (1,2) by itself is bad, we do not need (0,0) (1,0) (1,2) (1,1) and (0,0) (1,0) (2,1) (1,2).

It may also be the case that perhaps the simplification should be done after generating the element-wise combinations; not at the point in time above where I wrote (?) earlier. When combining with some duplicated delta, it will generate n-1 distinct deltas, then any other combination that contains the n-1 sequence is automatically bad. I can also see this could apply in a case where some delta is duplicated across more than 2 sequences. Perhaps I should make all combinations and remove the redundant options at that point?

[^note]: Also: anything with (2,0) is automatically bad, since we also have (0,0) and (1,0) and could ignored before generating all combinations, so the deltas can be simplified to: (0,0) (0,1) ((2,1)(1,2)) ((1,1)(1,2)); another improvement. I check for it now, but only after generating combinations, which means ~some small number of unnecessary iterations.

statycc / pymwp

changes to eval and choice representation #78

Codecov Report