jmoenig / Snap

a visual programming language inspired by Scratch
http://snap.berkeley.edu
GNU Affero General Public License v3.0
1.5k stars 745 forks source link

Certain Browsers Can't Handle Big Lists #196

Closed cycomachead closed 10 years ago

cycomachead commented 11 years ago

Firefox 23 on Ubuntu (12.04 or 13.04, I believe) fails to work on lists which are big. Sometimes it throws a recursion error, and others no message, just a red ring.

This isn't simply a minor browser snafu. Berkeley now has ~100-130 lab machines where this is the default configuration. :/

brianharvey commented 10 years ago

Could you be more specific about precisely which list operation fails?

cycomachead commented 10 years ago

Could you be more specific about precisely which list operation fails?

It's the recursively implemented list operations, at least at the time of this issue. If the browser / OS combo is limiting the stack size, things just break... This is mainly an issue with HOFs which are implemented recursively.

Iterative versions doesn't usually run into issues. Which sucks. Because they just aren't as awesome.

brianharvey commented 10 years ago

OK. I am going to finish the hybrid tools project. Then we can have iterative real versions which, when edited, will load the beautiful recursive ones. :-)

bb010g commented 10 years ago

It's the recursively implemented list operations, at least at the time of this issue. If the browser / OS combo is limiting the stack size, things just break...

I don't know how feasible this is in JS, but wouldn't implementing tail-call optimization fix that?

cycomachead commented 10 years ago

I don't know how feasible this is in JS, but wouldn't implementing tail-call optimization fix that?

I think it could, I looked into briefly a while ago and I didn't (still don't really :P) know enough about how snap works internally to know if it would be possible, but my guess it that it should make a difference. It might depend on the browser / OS / JS engine of course -- since everything always seems to. ;)

brianharvey commented 10 years ago

No, the straightforward implementation of MAP and KEEP are not tail recursive. COMBINE, maybe.

nathan commented 10 years ago

Snap!'s interpreter doesn't use the JS stack, so it's not limited by the JS interpreter's stack size. List operations could only fail from a stack overflow if the primitive list operations recursed too much.

brianharvey commented 10 years ago

Right, but the Snap! stack could cause a JS heap overflow, right?

cycomachead commented 10 years ago

--- this was supposed to be posted 2 days ago, not sure what happened --- Yeah, I forget the specific error (~9 months ago!) but it seemed like a stack overflow, but IIRC it did have the words "recursion" and "exceeded" in the message.

(BH: These were 2nd floor labs, so if I get back up to soda sometime we could actually probably repeat the test. List in question was the 2k / 5K words list for HW2 or 3).

jmoenig commented 10 years ago

Is this reproducible?

cycomachead commented 10 years ago

It was very reproducible at the time, at least. I could probably try to make it happen again, especially with a longer words list.

cycomachead commented 10 years ago

Boom. Got it. 34k items in Chrome Canary and OS X 10.10 causes the error: screen shot 2014-07-15 at 11 16 41 pm

I'm fairly certain I can get the error on a smaller list, I just need a good way to shrink it....

jmoenig commented 10 years ago

can you share those lists with me?

cycomachead commented 10 years ago

Sure, here's the project: http://snap.berkeley.edu/snapsource/snap.html#present:Username=cycomachead&ProjectName=List%20Error%20Tests

The one that failed before was: http://inst.eecs.berkeley.edu/~cs10/fa13/hw/supp/words-2000.txt

jmoenig commented 10 years ago

Interesting, thanks @cycomachead!

So, it's not Snap's call stack that's causing the error, but some list method. Since list.js is basically implemented in the same recursive style that we're trying to encourage in Snap, this is biting us here. I haven't yet found out which method in list it is exactly that's throwing the error. One solution (which I think would totally take care of many problems) would be to get rid of the internal "hybrid" structure of lists altogether. Yeah, yeah, I know about quadratic time, but I don't believe it would matter one bit. Anyway, that's the problem we need to solve, not tail call optimization in Snap itself

jmoenig commented 10 years ago

Okay, it's all but first of that's throwing the error. I bet this is because we're converting an arrayed list to a linked list internally. I knew it!

jmoenig commented 10 years ago

... and I hate it. See, this comes out of applying ideology to programming instead of pragmatics :-)

jmoenig commented 10 years ago

Oh no! It is because we're no longer converting the array to a linked list, but instead try very cleverly to preserve whichever data structure might be present at each link. @brianharvey to the rescue! Your (aptly named) helper()function in Line 134 of list.js seems to be the culprit.

How about we don't recurse the "helper" function? Instead we only check the receiver list for being linked and leave everything as is (and if it isn't we don't create a new linked list on the fly but instead return an array - which would be faster anyway)?

brianharvey commented 10 years ago

I'll look into this later today, but from your description I think we can do the right thing iteratively by starting at the end of the array.

I have nothing against iterative methods inside lists.js, but the result of ALL BUT FIRST OF has to be a linked list to avoid quadratic behavior. Which does matter, Jens; I don't see how you can not get that!

jmoenig commented 10 years ago

Many things matter. Among them are quadratic time, but also the ability to work with non-trivially sized lists at all :-)

It would be interesting to build a block which takes the arrayed list and explicitly converts it into a linked one. Then I'm sure the HOF blocks would totally work and we wouldn't even have to change a thing. @cycomachead did you try that?

jmoenig commented 10 years ago

Sure enough: When I manually (using a custom block) turn the arrayed list into a linked one I can work with way bigger lists. In your example, @cycomachead I can work with slices up to about 17.5 k items this way:

http://snap.berkeley.edu/snapsource/snap.html#present:Username=jens&ProjectName=List%20Error%20Tests

But then I still get it to exceed the maximum stack size of JS, which indicates that some other list methods (in the JS source of Snap) should probably also be implemted iteratively instead of recursively.

Interesting...

jmoenig commented 10 years ago

... and, BTW, this is exactly why people are sooo wrong when they claim that JavaScript is "Scheme in the browser"

nathan commented 10 years ago

some other list methods

brianharvey commented 10 years ago

@nathan Thanks for the research. I'll clean this all up this afternoon.

@jmoenig It's true that JS isn't Scheme, but the odds are that the recursive calls in lists.js are embedded recursion, not tail recursion, so you can't really blame JS. I take all the blame!

cycomachead commented 10 years ago

Then I'm sure the HOF blocks would totally work and we wouldn't even have to change a thing. @cycomachead did you try that?

Your updated project does work, and I can even get it to work on the 34k items, but IMO this isn't really a great solution to problem...It's still pretty slow, and if we did this we'd still need to update all the libraries, as explaining arrays vs linked lists isn't something students should need to work about in their second or third week of classes. (The other issue is the size of the JS call stack varies wildly by OS + browser + version combo...)

brianharvey commented 10 years ago

Fear not, I'm going through lists.js right now and it's straightforward to make everything iterative. I haven't even had to reverse a list yet, knock on wood. This is keeping me busy while I sit in the car dealership waiting for them to replace my rear view camera.

brianharvey commented 10 years ago

Okay, done, I think. I emailed it to Jens.

cycomachead commented 10 years ago

Okay, done, I think. I emailed it to Jens.

GitHub. ;-)

bb010g commented 10 years ago

This article and this other one may be worth giving a look.

jmoenig commented 10 years ago

@bb010g thanks for those links, they're very interesting and clever hacks! They also reaffirm me in deciding to just switch to iterative low-level ops :-)

jmoenig commented 10 years ago

this has been fixed in making internal list ops iterative instead of recursive, so I'm going to close this now.

BUT I want to remind you all, that in fixing this we broke Snap in extremely bad and exposed ways (in front of all the CSTA teachers, for Jeff's teachers in Alabama etc.). Now, let's get Snap back to be stable again, and then let's all step back and introduce some serious thresholds in our minds about what really needs "fixing" and what doesn't.

bb010g commented 10 years ago

@jmoenig Not sure how Snap was broke by fixing this, but I still think having TCO implemented at some point is important because Snap is meant to be a transparent learning tool. Having one "magical" version of a block that looks exactly the same as yours but doesn't work isn't good for learning. Snap shouldn't depend on "magic".

cycomachead commented 10 years ago

then let's all step back and introduce some serious thresholds in our minds about what really needs "fixing" and what doesn't.

I would count this as a serious issue. It is something we ran into very early on in BJC curriculum.

nathan commented 10 years ago

BUT I want to remind you all, that in fixing this we broke Snap in extremely bad and exposed ways (in front of all the CSTA teachers, for Jeff's teachers in Alabama etc.). Now, let's get Snap back to be stable again, and then let's all step back and introduce some serious thresholds in our minds about what really needs "fixing" and what doesn't.

That's a reason to test more, not to fix fewer bugs. This was obviously an actual issue, and you can't act like it wasn't.

jmoenig commented 10 years ago

@bb010g Snap has TCO! You can run a recursively implemented loop block forever and the stack doesn't grow. The problem here was that JS doesn't have TCO, and we were using deeply recursive JS functions.

@cycomachead @nathan the way this "bug" (I don't count it as bug, we'll talk about the curriculum some other time, I actually know something about "big data") was handled - rewriting large portions of the list module "while I was sitting in the car dealer's parlor" and submitting it untested on real data (actual big projects) was a disgrace. It can't have been that important. I don't have all these actual projects (I have exactly 2 such projects from two years ago, which I keep testing on, maybe you guys want to make some more available). But just thinking that you're so smart that you don't have to even try your code just to get some fix for a marginal issue in and causing bugs and embarassments for many, many people is not what I want. I can do things, but I do them slowly. What has been going on here on Github during the last two weeks was anything but.

I repeat: Let's all step back and get Snap stable again, and not chase marginal issues, especially not at the sake of compromising a large community.

Guys, It's too easy to tell me to test everything. You're not even sharing test data, goddammit! Most of the times I'm looking at somebody's code I find out they didn't even try it. This is what happened here.

nathan commented 10 years ago

Guys, It's too easy to tell me to test everything. You're not even sharing test data, goddammit! Most of the times I'm looking at somebody's code I find out they didn't even try it. This is what happened here.

Don't trust other people's code. Don't trust your own code. As the maintainer of this repository, it is your responsibility to test everything before you merge it. Yes, some contributors could make it easier. But in the end, it's up to you.

Or just stop accepting pull requests.

jmoenig commented 10 years ago

Thanks for the tip. I'll disable issues right now.

bb010g commented 10 years ago

Or you could require each pull request to be set up with a testing harness...

Disabling issues seems a bit harsh.

cycomachead commented 10 years ago

@cycomachead @nathan the way this "bug" (I don't count it as bug, we'll talk about the curriculum some other time, I actually know something about "big data") was handled - rewriting large portions of the list module "while I was sitting in the car dealer's parlor" and submitting it untested on real data (actual big projects) was a disgrace. It can't have been that important. I don't have all these actual projects (I have exactly 2 such projects from two years ago, which I keep testing on, maybe you guys want to make some more available). But just thinking that you're so smart that you don't have to even try your code just to get some fix for a marginal issue in and causing bugs and embarassments for many, many people is not what I want. I can do things, but I do them slowly. What has been going on here on Github during the last two weeks was anything but.

I'm sorry, I didn't mean to imply that this was that urgent, more that in terms of issues students encounter with Snap! that this is one they encounter fairly early on.

Guys, It's too easy to tell me to test everything. You're not even sharing test data, goddammit! Most of the times I'm looking at somebody's code I find out they didn't even try it. This is what happened here.

I've been trying to explain what I test in my PRs but I agree we can do better here. I've also (recently, especially) been trying to write longer commit messages which describe what + why the changes were made.

However, more to the broader point --- I was very interested in looking into this a while ago and was discouraged from doing so. I'd be happy to take a look again, sometime. (Though, I should probably stop volunteering myself for these things!)

brianharvey commented 10 years ago

Gang, This problem was entirely my fault. Jens's complaint above is for me. And he's right; I arrogantly thought that the "simple" fix would just work right away.

@nathan Jens is feeling very upset and unhappy and angry about the whole enterprise of volunteers adding stuff to Snap!. It doesn't matter right now whether he should be upset or who's at fault. Giving him lectures about general principles of software management is not going to help. Please don't show off.

I feel extra bad about this bug because, you know, I'm not a student, I'm a grownup and Jens's longtime collaborator and he trusted me.

I am hereby asking everyone to refrain from meta-discussions about who should do what and what's more important than what and who has what duties.