pull-stream / pull-stream

minimal streams
https://pull-stream.github.io/
MIT License

pull.take aborts multiple times upstream when aborted before a value is returned #104

Open elavoie opened 6 years ago

elavoie commented 6 years ago

Setup: SRC -> TI (take) TO -> SINK, where TI and TO are take's input-side and output-side ports. Take arguments:

test === 1
opts === undefined

SINK <-> TO:

{ port: 'SINK', type: 'request', request: 'ask', i: 1, cb: true }
{ port: 'TO', type: 'answer', answer: 'value', i: 1, v: 1 }
{ port: 'SINK', type: 'request', request: 'ask', i: 2, cb: true }
{ port: 'SINK', type: 'request', request: 'abort', i: 3, cb: true }
{ port: 'TO', type: 'answer', answer: 'done', i: 2 }
{ port: 'TO', type: 'answer', answer: 'done', i: 3 }

TI <-> SRC:

{ port: 'TI', type: 'request', request: 'ask', i: 1, cb: true }
{ port: 'SRC', type: 'answer', answer: 'value', i: 1, v: 1 }
{ port: 'TI', type: 'request', request: 'abort', i: 2, cb: true }
{ port: 'TI', type: 'request', request: 'abort', i: 3, cb: true }
{ port: 'SRC', type: 'answer', answer: 'done', i: 2 }
{ port: 'SRC', type: 'answer', answer: 'done', i: 3 }

Invariant violated:

Invariant 5 violated: request made after the stream was aborted

Basically, I think the second abort from TI should not happen and the expected sequence of events should be:

{ port: 'TI', type: 'request', request: 'ask', i: 1, cb: true }
{ port: 'SRC', type: 'answer', answer: 'value', i: 1, v: 1 }
{ port: 'TI', type: 'request', request: 'abort', i: 2, cb: true }
{ port: 'SRC', type: 'answer', answer: 'done', i: 2 }
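The double abort can also be observed without any framework. The following is a hypothetical sketch (countingSource is invented for illustration and is not part of pull-stream, nor the actual test case): a hand-rolled pull-stream-style source that counts abort requests, which makes the "at most one abort" expectation directly checkable.

```javascript
// Hypothetical sketch: a pull-stream-style source (read(abort, cb))
// that counts how many times it receives an abort request. A compliant
// downstream (take included) should hit the abort branch at most once.
function countingSource (values) {
  var i = 0
  var aborts = 0
  function read (abort, cb) {
    if (abort) {
      aborts++
      return cb(true) // 'done' answer to the abort request
    }
    if (i >= values.length) return cb(true) // 'done': no more values
    cb(null, values[i++]) // 'value' answer
  }
  read.abortCount = function () { return aborts }
  return read
}

// Drive it by hand: ask once, then abort, mirroring an early abort.
var src = countingSource([1, 2, 3])
src(null, function (end, v) {
  console.log('value:', v) // value: 1
})
src(true, function (end) {
  console.log('aborts seen by source:', src.abortCount())
})
```

Interposing take between such a source and an early-aborting sink is how the extra abort shows up in the traces above.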
dominictarr commented 6 years ago

I tried to reproduce this but couldn't get it to fail; do you have some code we can make into a test case?

my attempt: https://github.com/pull-stream/pull-stream/commit/6a5e7d53c6855086173446249abdc5b7d152c9ce

elavoie commented 6 years ago

Added test in 'take-abort' branch with commit: 0bc49cadda91eb224093b4f2ba74d42b66aba009.

The thing is: in general, can a module abort multiple times? If so, is there an upper bound on the number of aborts? And must sources support multiple aborts?

dominictarr commented 6 years ago

No, there should be only 0 or 1 aborts per stream, and there should not be an abort after an end message. After starting to read your paper I realized that the output you posted here is from your event language. I didn't realize that at first; you didn't explain the notation you are using in this issue.

elavoie commented 6 years ago

Right, I should have provided explanations about the event sequences and notation. I have been working on this for 3 months now; I had forgotten that it might not be obvious to anyone besides me...

In the case above, though, the two aborts come before the 'end' message of the first one. The Abort section of the spec document (https://github.com/pull-stream/pull-stream/blob/take-abort/docs/spec.md#abort) does not say whether a module may abort more than once, regardless of when it receives the answers. The constraint mentioned in the End section only specifies that "The read method must not be called after it has terminated."

dominictarr commented 6 years ago

good point - but yeah, only 1 abort allowed.

dominictarr commented 6 years ago

I guess it's implicit: what would be the purpose of doing two aborts? You can't really make it abort faster, or abort the abort.

elavoie commented 6 years ago

From the protocol's point of view it is redundant. However, maybe implementations would need to be a little more complicated to avoid multiple aborts? We will see whether that is the case with the improved take implementation.

But that opens a whole can of worms: if there may always be another request in the future, you never know when something is really terminated and will stop generating events.

dominictarr commented 6 years ago

Oh, now that I think about it: quite often I used a pattern where the read function checks whether a variable ended has been set, and just calls cb(ended) if it has. If the reader stream is implemented correctly this will only happen once, but on the chance that it reads again, it will just continue to send an end signal. You could interpret this as: "once an end signal has been sent, only the exact same end signal is sent on subsequent reads".
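A minimal sketch of that pattern (onceEndedSource is a hypothetical name for illustration, not taken from pull-box-stream or the other modules mentioned):

```javascript
// Sketch of the if (ended) cb(ended) pattern: once an end signal has
// been sent, every subsequent read just repeats the same signal.
function onceEndedSource (values) {
  var i = 0
  var ended = null
  return function read (abort, cb) {
    if (ended) return cb(ended)           // repeat the same end signal
    if (abort) return cb(ended = abort)   // remember why we ended
    if (i >= values.length) return cb(ended = true)
    cb(null, values[i++]) // 'value' answer
  }
}

var read = onceEndedSource([1])
read(null, function (end, v) { console.log(end, v) }) // null 1
read(null, function (end, v) { console.log(end, v) }) // true undefined
read(null, function (end, v) { console.log(end, v) }) // true undefined
```

The extra reads at the end would be protocol violations by the reader, but the source tolerates them by just repeating its end signal.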

There are other streams, like the simplest map:

function Map (fn) {
  return function (read) {         // a through: wraps the upstream read
    return function (abort, cb) {  // and returns a new read function
      read(abort, function (err, data) {
        // pass end/error through untouched; apply fn only to values
        cb(err, err ? null : fn(data))
      })
    }
  }
}

That doesn't enforce any behavior at all: if you abort it multiple times, it will just pass those aborts through. You could add run-time checks for whether the stream has already been aborted, etc., but in the case where everything is correctly implemented that would be pure overhead.
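Such a run-time check could look roughly like this (a hypothetical guard through, written only to illustrate the overhead argument; pull-spec is the real version of this idea):

```javascript
// Hypothetical sketch of a run-time check: a through that throws on a
// second abort, or on any new request after the stream has ended.
function guard (read) {
  var aborted = false
  var ended = false
  return function (abort, cb) {
    if (ended) throw new Error('request made after the stream ended')
    if (abort) {
      if (aborted) throw new Error('second abort on the same stream')
      aborted = true
    }
    read(abort, function (end, data) {
      if (end) ended = true // remember that a done answer was received
      cb(end, data)
    })
  }
}

// Usage sketch: wrap a source that answers every request with 'done'.
var g = guard(function (abort, cb) { cb(true) })
g(true, function () {}) // first abort: allowed
```

Every correct stream pays for these checks on every read, which is the "pure overhead" point above.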

I did implement a library that can check whether your pull-stream behaved correctly: https://github.com/dominictarr/pull-spec

I'd say I wrote this when I needed to debug something one time... but I haven't used it much since.

dominictarr commented 6 years ago

oh yeah, an example of using the if(ended) cb(ended) pattern in pull-box-stream

and pull-group and pull-stringify and I'm sure others too.

elavoie commented 6 years ago

I don't think each individual module should defensively manage non-compliant behaviour, so Map's behaviour is fine. We should check for compliance with exhaustive testing rather than defensively abort on incorrect behaviour. That way we only need to make sure individual modules follow the protocol when interacting with correct modules, we do not pay for checks during module execution, and the implementation of modules can stay as simple as possible.

Moreover, it is ok to send multiple 'ended' answers (what I call 'done' answers). At least two are necessary when aborting early, before receiving an answer: one for the ask request (read(false, x1)) and one for the abort request (read(true, x2)). So in general, having multiple done answers is allowed. It is just that no new request should be made after a done answer has been received.
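The early-abort case can be sketched with a source that defers its answers (deferringSource is a hypothetical name for illustration): the abort request arrives while the ask is still outstanding, so each request gets its own done answer.

```javascript
// Sketch: a source that queues callbacks and only answers once it is
// aborted, so an ask and an abort can be outstanding at the same time.
function deferringSource () {
  var pending = []
  return function read (abort, cb) {
    pending.push(cb)
    if (abort) {
      // answer every outstanding request with 'done'
      var cbs = pending
      pending = []
      cbs.forEach(function (c) { c(true) })
    }
  }
}

var read = deferringSource()
var answers = []
read(null, function (end) { answers.push(end) }) // ask: read(false, x1)
read(true, function (end) { answers.push(end) }) // abort: read(true, x2)
console.log(answers) // [ true, true ]: one 'done' per outstanding request
```

Two done answers, but still only one abort request, which is what the take trace above violates.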

But the take example above is different. The sink aborts only once, early, and correctly follows the protocol, but the take module generates two aborts. That is inconsistent with the protocol definition we have agreed on.

I did see your library; I reimplemented it to make sure I fully understood it and to keep my implementation consistent with the module specifications I put in the paper.

elavoie commented 6 years ago

However, having only one abort is important because we need to know the point after which the sink will stop generating events forever. That way, when the abort request has received its done answer, we know for sure that no other event will happen.

In previous versions of my specification I otherwise had to provision for the case of an unbounded number of abort requests, and that made everything unnecessarily complicated to reason about and explain.

elavoie commented 6 years ago

Let me know if I successfully captured the protocol and explained it clearly, precisely, and simply in the paper: https://www.overleaf.com/read/mmqffvhmwrpd

I think we need a clear terminology to talk about pull-streams. I found last year that I could intuitively implement modules (mostly) correctly just after reading pull-stream-by-example and some informal reasoning, but I was at a loss trying to explain what pull-lend-stream was doing in terms of concrete behaviour and could not definitively convince myself it was correct in all possible cases. I was left only with the option of referring people to the source code and hoping I did not miss a case. The paper is my attempt at doing better than that.

I put another version on arXiv for citation and distribution purposes today, so that even if the paper is rejected from ECOOP18, people can still cite and refer to it. This will also allow us to improve the paper if we find anything wrong or incomplete in the future. I should have a URL tomorrow.

More generally, we should decide on a place in the existing SSB documentation to link to papers. These can give academic credibility to the project, could facilitate the work of other academics, and may help in getting future European grants to further its development.

dominictarr commented 6 years ago

When do you find out whether you are accepted into ECOOP18? (I am very excited to share the fact that there is a real paper written about pull-streams, but want to wait till we know it's accepted)

elavoie commented 6 years ago

@dominictarr Just got the arXiv permanent URL! You can cite this one instead: https://arxiv.org/abs/1801.06144. Although it is not peer-reviewed, and therefore no committee has vouched for its quality, you do need minimum research credentials to publish on arXiv. We can actually review it ourselves; no need for a conference committee. If it helps us think more clearly and get the implementations right faster and more easily, then we will know it is a good paper. Otherwise, I still have work to do on it!

You will have to wait until March for a preliminary response and April for the definitive decision (https://2018.ecoop.org/). Don't get your hopes too high, though: my other paper on Pando, which uses pull-streams, has been rejected twice in the last year, from OOPSLA and ASPLOS. It might take some more submissions until both get in somewhere, and I will have to stupidly copy-edit them to arbitrary standards and page limits for every resubmission and hope a different review committee thinks differently than the previous one...

rant/ Frankly, I will be much more satisfied if the pull-stream paper is actually used in the community by multiple projects than if it gets into a "prestigious" conference. The way the system works, prestigious means a low acceptance rate (15-20%), which by definition also means a high rejection rate (80-85%). I think that is a really stupid metric for assessing the quality of papers. I personally think that we should instead have researchers "vouch" for papers they have reviewed, rather than comparing each paper to the batch it was part of at the time of submission to a particular conference. Having recognized researchers, and/or a high number of researchers, vouch for your paper would mean both that it was interesting to them and that it at least met their personal standards. There would be no artificial scarcity introduced by the small number of high-profile conferences with low acceptance rates.

Moreover, someone could say: "it is really interesting and is almost there, just fix these and do additional experiments on that to confirm that thing and I would totally vouch for it!". Then you fix it, come back, get vouched and move on, rather than spamming conferences until it gets somewhere... /rant

dominictarr commented 6 years ago

Oh yeah, I agree! Vouching sounds a lot better to me (and a lot more secure-scuttlebutt!). The prestigious exclusive conference, I guess, is the way you could implement this back when the main tool was the printing press, though. I guess I want to show off to software engineers who maybe don't know the whole academic process but will be impressed by it. With a formal specification, pull-streams are now real computer science, so we are moving into the territory where you don't get fired for using pull-streams.

elavoie commented 6 years ago

@dominictarr I don't think it has anything to do with the printing press. The current focus on conference and journal publications is maybe 20-30 years old. Previously, the main means of publication for a PhD was the dissertation, organized around a thesis. Among other characteristics, it:

  1. has no arbitrary page limit;
  2. has a review committee that is already somewhat interested in the work;
  3. does not establish prestige by rejecting 85% of submissions but rather tries to establish whether the research meets the standards of its research community;
  4. is paid for by the general public through their taxes, which fund the reviewers as professors;
  5. is open access forever with no extra fees;
  6. does not change between improvements of the work;
  7. can dedicate more than 1 hour to understanding it...

Nowadays, to defend your PhD thesis, you are expected to get 3 publications in conferences as prestigious as possible before writing your thesis! You may even write your thesis by putting your 3 articles more or less verbatim in the same document and adding a little bit of text to thread them into a coherent story. But getting these three publications is not completely in your control: it is highly competitive, and the review committees are anonymous and constantly changing if you get rejected. Practically speaking, your school's department has lost control over who is recognized as a good researcher, and the power has gone to conferences and journals.

I believe the actual reason for this arrangement is a little more sinister than convenience: it is an attempt at putting academics in stronger competition with one another, both for publication and for funding, and at orienting the direction of research of an entire community by orienting the direction of a few key conferences. You don't need to tell researchers what they should do; you just need to set the agenda and research topics for a few prestigious conferences, and academics will follow suit, because their promotions, tenure, and funding depend on the prestige of the conferences at which they publish. See http://disciplinedminds.tripod.com/ by Jeff Schmidt for how that plays out in Physics. In Computer Science, you can see how Peer-to-Peer mostly died as an academic subject 15 years ago, around the time Kademlia was proposed, and got replaced by Cloud Computing (privately owned data centers) and more recently Edge Computing (centrally-managed applications with as much of the cost of operations as possible pushed to the clients). The P2P researchers of the 90s and early 2000s now work for Google and Microsoft Research and have stopped doing anything meaningful. It is the open source developer communities (especially the JavaScript one) that are keeping P2P alive, completely outside of academia. That is why I am trying to contribute here!

I have a different vision of academia: let academics have a social life embedded in the society around them, trust them to use their scholarly training and knowledge to make the world better for the people around them, and have like-minded people from an international community review each other's work on the topics that interest them. That would be real academic freedom, instead of the pseudo-freedom of choosing projects that align with major conferences' topics. But that is also an anarchist vision: you don't control academics in such a setting! They might criticize the powers in place and propose radical alternatives, so it is a lot less comfortable!

elavoie commented 6 years ago

@dominictarr Who got fired for using pull-streams?

dominictarr commented 6 years ago

haha, no one that I know of, but it's a reference to: https://www.quora.com/What-does-the-phrase-Nobody-ever-got-fired-for-choosing-IBM-mean

wow, okay, interesting stuff!

I nearly went into academia - I did honors, masters would have been next... but I ended up with Mad Science, and here I am.

elavoie commented 6 years ago

@dominictarr You made the right choice! The stuff I am reading about pull-streams and SSB is way more interesting than anything I saw presented at conferences in the last 8 years, and more importantly, it is directly useful to communities of people right now. You have shown me that some inventions are easier to do outside academia than inside. Keep going like that; I've got your back if we ever need official academic validation.

av8ta commented 6 years ago

You have shown me that some inventions are easier to do outside academia than inside.

I'd say most inventions, apart from those requiring large amounts of capital. However, future crowdfunding methods may make that problem tractable too.

Good luck with your conferences, and I'm shocked the dissertation process has been dropped - I had no idea that happened and agree it seems a very retrograde step for academia.

elavoie commented 6 years ago

@av8ta The dissertation process, which culminates in a PhD defense, still exists as a rite of passage (I am writing a dissertation right now!), but in my view it is now mostly an internal validation of work that has been published externally to the department. You are no longer required to build a 200-page coherent argument; you can thread different publications into a more-or-less cogent story and that is sufficient.

My advisor puts a lot of pressure on getting prestigious conference publications out of my work, even though objectively the same information would be available in a dissertation. It has nothing to do with dissemination of information; it is about prestige as a proxy for a judgement of quality, which influences promotions and grants.