martin-fleck / cra-ttc2016

The Class Responsibility Assignment Case for the Transformation Tool Contest 2016

Runtime of SDMLib solution #14

Open georghinkel opened 8 years ago

georghinkel commented 8 years ago

I am a bit confused by the runtime figures for the SDMLib solution. The times given in the paper differ completely both from the SHARE demo (which makes some sense because of the smaller VMs) and from the Log.xlsx that is contained in the repo and seems to contain the bare measurements. However, it is entirely unclear to me how the times in the paper were derived from this checked-in log file, because the logged times are much worse than those in the paper, with the exception of input model D.

I analyzed the contents of the Log.xlsx in the repo and found the attached results: ResultsSDMLib.xlsx

However, these results do not match the data presented in the paper, and neither do the "best" values, which I assume should be the CRA indices.

digitalhoax commented 8 years ago

Thanks for taking such a detailed look. The machine on which we ran the tests for the figures in the paper doesn't have repo access, so those results aren't checked in, for obvious reasons. Results on the SHARE image are worse because we use 14 GB of memory on our test machine for our calculations. SHARE runs out of memory quickly and thus yields worse results, since the search depth is smaller. Results also vary depending on the input parameters.

The results presented in the paper are the best we could achieve with our approach, i.e. on our own hardware, with a search depth between 500 and 2000, and in almost all cases with our DEPTH algorithm.

In case you are interested, I will add the raw data from the test machine to the repo; just let me know.

georghinkel commented 8 years ago

My main problem was that I did not fully understand what the numbers in the paper actually include. So, as I understand it now, these numbers correspond to the run that produced the best result, for a particular search depth and algorithm?

The problem here is that, as a user, I would not know which parameters to use, which algorithm would be best, or what search depth is appropriate. Therefore, I am not sure whether it is appropriate to just pick the best run, or whether the time spent determining these solution-specific parameters (for example by executing the solution multiple times) should be taken into account.
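To make the fairness concern concrete, here is a minimal sketch of a benchmark harness that charges the cost of configuration tuning to the total runtime. Everything here is hypothetical: `run_solver`, its timings, and the CRA scores are placeholders standing in for actual tool invocations, not SDMLib's interface.

```python
# Hedged sketch: if picking the best (algorithm, depth) pair requires trial
# runs, their time is part of the cost of reaching the reported result.
# `run_solver` is a hypothetical stand-in for invoking the transformation.

def run_solver(model, algorithm, depth):
    # Placeholder behavior: pretend deeper searches take longer and score better.
    return {"cra": depth / 100, "seconds": depth / 1000}

def best_with_tuning(model, algorithms, depths):
    """Pick the best configuration, charging the time of *all* trial runs."""
    total_seconds, best = 0.0, None
    for algo in algorithms:
        for depth in depths:
            result = run_solver(model, algo, depth)
            total_seconds += result["seconds"]  # tuning time is not free
            if best is None or result["cra"] > best["cra"]:
                best = {**result, "algorithm": algo, "depth": depth}
    return best, total_seconds

best, cost = best_with_tuning("A", ["DEPTH", "BFS"], [100, 500, 2000])
print(best["depth"], round(cost, 2))
```

Under this accounting, the sub-second time of the single winning run is only a small fraction of the total cost of obtaining it.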

Furthermore, I am completely puzzled as to why input model D has such a bad runtime compared to the other input models. Is it because you keep iterating as long as the CRA index is greater than 0? I mean, several solutions, including mine, overlooked or ignored the fact that there is always a solution with a CRA index of at least 0, namely the one where all features are contained in a single class.
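The claim about the trivial solution can be illustrated with a small sketch. This uses a *simplified* CRA-style index (cohesion minus coupling over dependency edges), not the normalized MAI/MMI ratios of the actual TTC 2016 case, but the argument carries over: coupling terms only arise between different classes, so the single-class assignment can never score below 0.

```python
# Hedged sketch: simplified CRA-style index to show why putting all
# features in one class yields a non-negative score (no coupling edges).

def cra_index(classes, dependencies):
    """classes: list of sets of features; dependencies: set of (f, g) pairs."""
    def class_of(feature):
        return next(i for i, c in enumerate(classes) if feature in c)

    cohesion = sum(1 for f, g in dependencies if class_of(f) == class_of(g))
    coupling = sum(1 for f, g in dependencies if class_of(f) != class_of(g))
    return cohesion - coupling

deps = {("m1", "a1"), ("m2", "a2"), ("m1", "a2")}
split = [{"m1", "a1"}, {"m2", "a2"}]    # cross-class edge m1->a2 counts as coupling
single = [{"m1", "a1", "m2", "a2"}]     # all features in one class: coupling is 0

assert cra_index(single, deps) >= 0     # always holds for the single-class solution
print(cra_index(split, deps), cra_index(single, deps))
```

So a loop that terminates only once the index exceeds 0 can at least be seeded with this trivial lower bound.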

digitalhoax commented 8 years ago

On pp. 9f. you can see results for varying algorithms and search depths. If no limit is specified, the tool simply runs to unbounded depth with its default algorithm, thus computing the complete reachability graph; in our opinion, this is what a user wants and expects in most cases.

For this specific problem the rules are very generic and the reachability graph is huge, so we cannot generate a complete one. We therefore ran our tests with search depths of 100, 200, 500, 1000 and 2000. A depth of 500 gave results similar to 2000 in many cases, as stated in the paper.
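The bounded exploration described here can be sketched as a generic depth-limited breadth-first search over a state space. This is an illustrative reconstruction, not SDMLib's actual API: `successors` stands in for rule application, and states beyond the depth cap are simply never expanded, which keeps the partial reachability graph tractable at the cost of possibly missing better solutions.

```python
from collections import deque

# Hedged sketch of depth-limited state-space exploration, in the spirit of
# the bounded reachability-graph computation discussed above.

def explore(start, successors, max_depth):
    """Return all states reachable from `start` within `max_depth` rule applications."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        depth = seen[state]
        if depth == max_depth:      # depth cap: do not expand further
            continue
        for nxt in successors(state):
            if nxt not in seen:     # each state is expanded at most once
                seen[nxt] = depth + 1
                queue.append(nxt)
    return seen

# Toy state space: states are integers, one "rule" adds 1, another doubles.
reachable = explore(1, lambda s: [s + 1, s * 2], max_depth=3)
print(sorted(reachable))
```

With very generic rules the branching factor is high, which is exactly why the full graph becomes infeasible and a cap of a few hundred to a few thousand steps is used instead.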

georghinkel commented 8 years ago

Yes, but this raises the question of how meaningful it is that input model A completes in less than a second with the optimal configuration of search mode and depth, when I would need hours to find that configuration. Maybe it would be better to pick one configuration (based on experiments, for example) and settle on it, i.e. execute all test cases with that configuration. Since you say the chosen configurations ranged over depths of 100-2000, mostly with the DEPTH algorithm, I assume this has not been done. Then the question is how meaningful these numbers are.

Was the configuration used for model D significantly different? For me, that would explain the runtime a bit.

digitalhoax commented 8 years ago

I think you are misreading the graphs presented in the paper. They include data for search depths of different sizes (100-2000) for ALL exploration algorithms. Case A takes less than a second to process with all algorithms, so no, it does not take hours to investigate. We could have settled on a search depth of either 500 or 2000 and the DEPTH exploration algorithm, but we wanted to include the complete exploration of model A, as this is our classic approach to rule application and our use case for reachability graphs. The algorithms that narrow the search were included for the TTC; typically, a complete reachability graph can easily be generated for every use case we have come up with so far in our examples. We thought it would be a nice addition to SDMLib for when one finally emerges, however.

As you seem to have checked out the complete repository already, you can have a look at https://github.com/fujaba/TTC2016SDMLibSolution/blob/master/paper/Diagramme.xlsx to see the raw data. "macmini" is our test machine, and you can see results for all search depths and all algorithms there. And you are right, having chosen one search depth and one algorithm for presenting all results might have made things a lot clearer.