mendicant-original / puzzlenode

Quiz application inspired by Project Euler and the Internet Problem Solving Contest (IPSC)
puzzlenode.com

Provide more helpful feedback for an incorrect submission #59

Closed: charlietanksley closed this issue 11 years ago

charlietanksley commented 12 years ago

I'm working on puzzle 14. My submission fails. So my code must go wrong somewhere. But I have no clue how to tell where. My code handles the simple case perfectly. And the output for the complex case doesn't look obviously wrong. Of course, it is certainly wrong; that isn't the point. If I don't know the right answers I can't figure out where my code goes wrong.

Something like one of the following would be really nice:

I really like trying to solve these puzzles--I think they are great! But it is frustrating to simply hit a wall and have zero clue why my solutions are wrong.

tehviking commented 12 years ago

This is sort of tricky, as your answer isn't "evaluated" so much as your answer file is hashed and that hash is checked against the hash of the original solution.
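
For illustration, a minimal sketch of what hash-based checking looks like in principle (the digest choice and method names are assumptions, not PuzzleNode's actual code):

```ruby
require "digest"

# The uploaded answer is never inspected; only its digest is compared against
# the stored digest of the official solution, so no partial feedback is possible.
def correct_submission?(uploaded_path, solution_digest)
  Digest::SHA1.file(uploaded_path).hexdigest == solution_digest
end
```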

Also, since the most recent 3 puzzles are used to judge entrance into Mendicant University, we can't release solutions until admissions are closed. For prior puzzles though, that's a pretty good idea, and one I hope Greg & Jordan will greenlight.

Possibly the most irritating thing for puzzle-solvers is that trailing whitespace in the answer file is the most common cause of a rejected submission. It may be the culprit in your case as well, and I think PuzzleNode should allow for some flexibility there (i.e. keep hashes of multiple correct answers, with and without trailing whitespace, and mark a submission correct if it matches any of them).
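
A minimal sketch of that idea, assuming the server keeps digests of the solution both with and without a trailing newline:

```ruby
require "digest"

# Hypothetical whitespace-tolerant check: store digests of several normalized
# forms of the official answer and accept a submission that matches any of them.
def acceptable_digests(solution_text)
  [solution_text, solution_text.chomp, solution_text.chomp + "\n"]
    .map { |variant| Digest::SHA1.hexdigest(variant) }
    .uniq
end

def correct_submission?(uploaded_text, solution_text)
  acceptable_digests(solution_text).include?(Digest::SHA1.hexdigest(uploaded_text))
end
```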

As far as more representative cases go, corner-case issues often don't emerge until we get feedback like this. We'd be glad to have your input on improving the sample input once you're comfortable with your puzzle solution. In this case, I'd suggest you hit us up at puzzlenode@gmail.com to work through your issue and see how we can improve the sample input/output to cover the case you're hitting.

Sorry you're having issues. This is great feedback to have out in the open, and the best way to get your specific issue squared away is to contact us at puzzlenode@gmail.com.

semmons99 commented 12 years ago

Here are two quick ways to get help if you're stumbling on a solution. First, pop over to #rmu on Freenode and ask there; you'll usually get a response in a fairly reasonable amount of time. If that doesn't work, try contacting the Puzzle Master, Brandon, at puzzlenode@gmail.com. Quite often the problem ends up being a missing/extra LF at the end of the file, and either of these avenues can help you track that down quickly. We have discussed what to do if someone continues to submit failing answers, but we haven't yet come up with a satisfactory system that couldn't be gamed.

charlietanksley commented 12 years ago

Thanks for the replies.

@tehviking, I took a look at the code before submitting the issue and saw that you were not just diffing the two for comparison. That certainly makes my first suggestion harder! As to the Mendicant exams, it doesn't seem to me like this should be such a big deal. You aren't getting in on the basis of getting the right answer, but on the basis of writing good Ruby that produces the right answer. Perhaps letting people know the answers would increase the number of noisy submissions (people who submit even though their code doesn't produce the right answer). If so, that would be bad, since you all are doing such amazing work with that program! But I (perhaps naively) think that wouldn't happen.

I'll keep playing with my solution to see if I can figure out the problem. If not, I'll have to take @semmons99's suggestions and ask for more help!

Thanks again for your replies and for this site. Trying to figure out how to solve these problems is a lot of fun and I seem to learn a lot every time I try (even if I can't get the answers right for some reason!).

jordanbyron commented 12 years ago

@charlietanksley I've been thinking about this since you posted it and I still don't have a great solution for this problem. The diff suggestion could work, but then our leaderboards would be totally pointless. Anyone could just upload a blank file, get the solution (via the diff), and then climb up the rankings. The same problem applies to offering up the solution after X failed attempts.

Even making the simple cases "more representative" of the complex solution is hard. It's really, really hard to create these problems, and even harder to make sure that all edge cases are covered in both solutions. We try really hard to make sure that is the case, but even after all that work it's still possible to write up a solution that works with the simple case and fails on the complex one.

So where does that leave us? I think we need to make "asking for help" even easier than it is now. Maybe even automate the process a bit, allowing users to submit a "help request" which other members of PuzzleNode / Mendicant University can respond to. Having people email our support address and jump into #mendicant works, but it's really organic and doesn't leverage the talented people who use and love PuzzleNode.

I'll talk to @sandal about this next week and we'll try to come up with a solution that works. Thanks for using the site, and keep plugging away at the puzzles!

gjp commented 12 years ago

re: diff tool - I had a half-baked idea about indicating which lines are incorrect without explaining the correction. But the more I think about this in the context of the existing puzzles, the less I think it would work. A simple error could make every line slightly wrong; some solutions are best understood as pictures rather than as a sequence of output lines (in the case of puzzle 6, a literal bitmap); and so on.

So +1 on the idea of a help request system.

JEG2 commented 12 years ago

I like the idea of saying which result is wrong, but not giving the correct result. The point is to struggle a bit and learn something, in my opinion, so there should be no free answers.

But I don't really see why saying which example fails would be a problem. In the chess problem, for example, you would walk through each LEGAL/ILLEGAL result, and the first time you find a non-matching one you could just say: you got line 5 wrong. That's plenty. It gets me to, "Hmm, I screwed up the bishop-moving code. I need to look into that."

Most of the problems are like this, but some are trickier. The answer to Hitting Rock Bottom, for example, is all on one line. For that one you would need to check number by number, and it would be harder to show which one failed. This tells me it's probably worth favoring the one-line-per-question output style.

An approach like this could also easily eliminate the trailing newline issue, which I would want to do. I would rather the focus be on solving the programming problem than on getting the whitespace wrong (our issue) or on specifying the problems so precisely that we know where every chunk of whitespace belongs (your issue).
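
A minimal sketch of the line-by-line check described above, ignoring trailing whitespace and trailing blank lines (the names are illustrative only):

```ruby
# Return the 1-based number of the first line that differs, or nil if the
# submission matches. Trailing whitespace and trailing blank lines are ignored.
def first_wrong_line(submission_text, solution_text)
  submitted = submission_text.lines.map(&:rstrip)
  expected  = solution_text.lines.map(&:rstrip)
  submitted.pop while submitted.last == ""
  expected.pop  while expected.last == ""

  [submitted.length, expected.length].max.times do |i|
    return i + 1 unless submitted[i] == expected[i]
  end
  nil
end

# first_wrong_line(upload, solution) #=> e.g. 5, reported as "you got line 5 wrong"
```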

gjp commented 12 years ago

@JEG2 good points. Ok, let me assume for the moment that we want to diff. :)

Perhaps some combination of the following would help smooth the submission process. I'm just tossing all of these out here for further discussion:

practicingruby commented 12 years ago

@gjp / @JEG2: It might be reasonable to support line-based checking, but we'd need a facility for marking a puzzle as having a binary solution. Some of our puzzles ask for encoded images, for example.

But if we just had the ability to toggle between binary and line-based matching, defaulting to binary, we'd be able to gracefully support all old puzzles until they were recast into different input/output formats, and we could begin using line-based checking for new puzzles except where binary is required.

Responding with something like "Line 5 was wrong" seems reasonable, and I think we should stop at the first wrong line, both so we don't reveal too much and so we don't complicate returning feedback for largish output files.

With this in mind, it sounds like something we can implement. Anyone want to take a crack at it?
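
A minimal sketch of how the toggle might fit together, reusing the first_wrong_line helper sketched above (the check_mode, solution_text, and solution_digest attributes are assumptions):

```ruby
require "digest"

# Hypothetical dispatcher: default to the existing whole-file hash check and
# opt in to line-based matching per puzzle via an assumed `check_mode` flag.
def check_submission(puzzle, submission_text)
  if puzzle.check_mode == "lines"
    line = first_wrong_line(submission_text, puzzle.solution_text)
    line ? { :correct => false, :hint => "Line #{line} was wrong" } : { :correct => true }
  else
    { :correct => Digest::SHA1.hexdigest(submission_text) == puzzle.solution_digest }
  end
end
```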

gjp commented 12 years ago

I'm game. I'll have to give some thought to testing. Would you be willing to send me the puzzle data and solutions from the live site? I could rig up something using my own solutions, but I think the live data would be more thorough.

practicingruby commented 12 years ago

@gjp: Sure, you have access now https://github.com/mendicant-university/puzzlenode-problems

gjp commented 12 years ago

@jordanbyron: Would you prefer if I save pull requests for PuzzleNode week?

jordanbyron commented 12 years ago

@gjp you can send them now, but they probably won't get reviewed / merged until next week.

gjp commented 12 years ago

I'll see if I can nail this one during the next couple of days. I still feel that finding a set of test cases which is representative of actual wrong answers on a range of puzzles is the key here.

gjp commented 12 years ago

@jordanbyron Wouldn't we need to store the actual solution files somewhere on the server if they're to be compared with submission files? Do you have suggestions on how to do this securely? Alternatively, I suppose it would be possible to hash on a line-by-line basis, but I haven't thought through whether that would work.
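
For what it's worth, line-by-line hashing could look roughly like this (a sketch only, assuming digests of each solution line are stored in place of the plaintext):

```ruby
require "digest"

# Hypothetical per-line digests: the plaintext solution never has to live on
# the server, yet the first mismatched line can still be located.
def line_digests(text)
  text.lines.map { |line| Digest::SHA1.hexdigest(line.rstrip) }
end

def first_mismatched_line(submission_text, stored_digests)
  submitted = line_digests(submission_text)
  [submitted.length, stored_digests.length].max.times do |i|
    return i + 1 unless submitted[i] == stored_digests[i]
  end
  nil
end
```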

My intent is to display one (or more, for context) numbered lines of the user's own submission back to them in case of an error, e.g. "You got it wrong here ->". I'd like to do this rather than returning only a line number, partly to make the error clear and partly because line numbers may change if we decide to fiddle with whitespace. Doing this will require additional state. I figured I'd add a text field to Submissions. How does this sound?

Each user already has a submission history, but it's just a series of booleans; depending on the structure of our solution files and how much state we capture per submission, we have the option here of showing users their progress across multiple attempts. We might also consider keeping enough information to help an admin see where a user is stuck.

practicingruby commented 12 years ago

@gjp: We could probably store our own solution files permanently on the server and do the comparison against the user's uploaded file without storing that file permanently (as we currently do for hashing). Look into carrierwave for file uploads, unless @jordanbyron has a better suggestion.

gjp commented 12 years ago

@sandal There's already a mechanism in place for making tempfiles permanent, used by Attachments. We don't want the solution files copied to /public like attachments, because they'd be world-readable by default. So the question is how to control access to the uploaded files without pushing a configuration dependency out to the web server. Storing solutions in the database is one way, but I'm hoping someone will point out if I'm missing anything obvious.

practicingruby commented 12 years ago

@gjp: We should probably move to carrierwave for both the puzzle file attachments and for storing the solutions. I think we can do the access control at the application level; none of this needs to be in the public folder. But please let @jordanbyron advise on the details, I haven't looked at PuzzleNode's source in basically forever :-/

jordanbyron commented 12 years ago

Carrierwave sounds like the right fit for this problem. I also agree with @sandal that we should rework our existing Attachment model to use carrierwave under the hood. Saving somewhere other than the public folder is the right call. These files are super tiny, and I believe we are already using send_file, so that should be a very simple change.
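
For illustration, application-level access control around send_file might look roughly like this (the controller, the require_login filter, and the file_path attribute are all assumptions):

```ruby
# Files live outside public/, so they are only reachable through this action,
# which can enforce whatever authorization rules we like.
class AttachmentsController < ApplicationController
  before_filter :require_login  # assumed authentication helper

  def show
    attachment = Attachment.find(params[:id])
    send_file attachment.file_path, :disposition => "attachment"
  end
end
```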

As for the folder structure / name, I'm thinking an uploads directory in the root of the project, then creating a folder for each puzzle's id and uploading solution files and attachments to that directory. Of course any suggestions are always welcome :smile:
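
A minimal sketch of an uploader following that layout (the class and attribute names are assumptions, and CarrierWave.root would need to point at the project root rather than public/):

```ruby
# app/uploaders/solution_uploader.rb (requires the carrierwave gem)
class SolutionUploader < CarrierWave::Uploader::Base
  storage :file

  # One folder per puzzle id under uploads/, kept outside of public/.
  def store_dir
    File.join("uploads", model.id.to_s)
  end
end

# Mounted on the Puzzle model (attribute name is an assumption):
#   mount_uploader :solution_file, SolutionUploader
```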

> Each user already has a submission history, but it's just a series of booleans; depending on the structure of our solution files and how much state we capture per submission, we have the option here of showing users their progress across multiple attempts. We might also consider keeping enough information to help an admin see where a user is stuck.

I think keeping the boolean correct column on Submission makes sense, but we might want to add an additional column called hint into which we can throw "Line 5 was wrong" or whatever message we display to the user. How does that sound?
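
A minimal sketch of that migration, assuming a string column is wide enough for messages like "Line 5 was wrong":

```ruby
# Adds the proposed hint column to submissions.
class AddHintToSubmissions < ActiveRecord::Migration
  def change
    add_column :submissions, :hint, :string
  end
end
```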