runtimeverification / rv-predict

Code for improved rv-predict and installer
BSD 3-Clause "New" or "Revised" License
2 stars 3 forks source link

detected race number doesn't match the paper #162

Closed yilongli closed 9 years ago

yilongli commented 10 years ago

According to the PLDI paper, rv-predict detects 5 races in the account example. I can only get 4 with --maxlen 10000. For the mergesort example, the paper claims 9 races, I can only get 5 with --maxlen 100000.

yilongli@ubuntu:~/Eclipse-Workspaces/rv-workspace/rv-predict/examples$ rv-predict --maxlen 100000 -cp basic.jar mergesort.MergeSort

----------------Instrumented execution to record the trace-----------------
Log directory: /tmp/rv-predict6892339673465929167

-------------------------Logging phase completed.--------------------------
Race on field mergesort.MSort.m_iCurrentThreadsAlive between two instances of:
    mergesort.MSort.DecreaseThreadCounter(MSort.java:41)

Race on field mergesort.MSort.m_iCurrentThreadsAlive between:
    mergesort.MSort.DecreaseThreadCounter(MSort.java:41)
    mergesort.MSort.IncreaseThreadCounter(MSort.java:32)

Race on field mergesort.MSort.m_iCurrentThreadsAlive between:
    mergesort.MSort.AvailableThreadsState(MSort.java:48)
    mergesort.MSort.DecreaseThreadCounter(MSort.java:41)

Race on field mergesort.MSort.m_iCurrentThreadsAlive between:
    mergesort.MSort.AvailableThreadsState(MSort.java:48)
    mergesort.MSort.IncreaseThreadCounter(MSort.java:32)

Race on field mergesort.MSort.m_iCurrentThreadsAlive between two instances of:
    mergesort.MSort.IncreaseThreadCounter(MSort.java:32)

I haven't tried the other examples, but I would expect the same issue more or less. How could this happen?

traiansf commented 10 years ago

I guess this could be due to the fact we don't use the branch model by default, while the PLDI paper was all about improvements by taking into account branching events.

@jeffhuanguiuc can you confirm?

grosu commented 10 years ago

As soon as Yilong refactors a bit RV-PRedict and improves its performance, I think we should start looking seriously into how to efficiently achieve the maximal model for which we are getting a US patent, which is maximla model with branches. It should be as easy to enable it as having an option --branches or -b. I mean, to maximize our impact, we should: 1) make it clear to our users that we are better than the best prediction tools even with the default settings 2) that -b comes at a performance price, but that it gives them the ABSOLUTE maximum prediction, that no other tool can ever exceed. Can we make this selling pitch real? :)

Grigore


From: Traian Florin Șerbănuță [notifications@github.com] Sent: Monday, November 03, 2014 11:56 PM To: runtimeverification/rv-predict Subject: Re: [rv-predict] detected race number doesn't match the paper (#162)

I guess this could be due to the fact we don't use the branch model by default, while the PLDI paper was all about improvements by taking into account branching events.

@jeffhuanguiuchttps://github.com/jeffhuanguiuc can you confirm?

— Reply to this email directly or view it on GitHubhttps://github.com/runtimeverification/rv-predict/issues/162#issuecomment-61596938.

parasol-aser commented 10 years ago

Another reason might be we've fixed some bugs or introduced some new bugs.. any non-determinism in the trace might also be a cause.

yilongli commented 10 years ago

Tonight, I tried the airlinetickets example and rv-predict could only find 6 races, which is even less than CP or HP approach according to the paper. And I couldn't make branch model work.

Honestly, I am very frustrated when working on rv-predict recently. How could we never have a proper regression test suite and mechanism to make sure we do not lose detected races over time? Now I am working on a simple solution to fix that.

@jeffhuanguiuc Could you send me the original data of detected races used for the PLDI paper? I would like to recover all of them and add them to the regression test to make sure rv-predict catches all of them in future development.

yilongli commented 10 years ago

@grosu Unfortunately, we can not "make it clear to our users that we are better than the best prediction tools even with the default settings" at the moment. Because our default setting detects less races than CP/HP based approach in several of our small examples comparing with the number reported in the paper.

The airlinetickets example is just one of them.

grosu commented 10 years ago

From what I remember, the PLDI paper was penalized by the artifact evaluation committee for similar reasons; the reviewers were not able to reproduce our numbers claimed in the paper. I am sure they were correct, probably what happened was that Jeff and Traian tried to improve things along a different dimension (speed?) at the expense of another (coverage?). But yes, I fully agree: tests, tests, tests, please!

Grigore


From: Yilong Li [notifications@github.com] Sent: Wednesday, November 05, 2014 7:48 PM To: runtimeverification/rv-predict Cc: Rosu, Grigore Subject: Re: [rv-predict] detected race number doesn't match the paper (#162)

Tonight, I tried the airlinetickets example and rv-predict could only find 6 races, which is even less than CP or HP approach according to the paper. And I couldn't make branch model work.

Honestly, I am very frustrated when working on rv-predict recently. How could we never have a proper regression test suite and mechanism to make sure we do not lost detected races over time? Now I am working on a simple solution to fix that.

@jeffhuanguiuchttps://github.com/jeffhuanguiuc Could you send me the original data of detected races used for the PLDI paper? I would like to recover all of them and add them to the regression test to make sure rv-predict catches all of them in future development.

— Reply to this email directly or view it on GitHubhttps://github.com/runtimeverification/rv-predict/issues/162#issuecomment-61915230.

yilongli commented 9 years ago

The numbers reported in the paper do not matter anymore because the new implementation fixed so many bugs that could lead to missing bugs and/or false alarms.