Add procedures and tools to measure programmer productivity implications

I just stumbled upon the type theory podcast and am listening to ep. 4 about the Zombie language, and I am quite intrigued about the question wether the resaerchers in question could actually/ magically know if a programming language is easy to use or not, without asking any programmers in systematic way, so what methods are currently in place for evaluating the usefulness of the languages, other than anecdotal observation and individual experiments by the resaerchers themselves?

I am not sure, if this is a GitHub issue, I can remove this post if deemed inappropriate.

It would be nice if programming language design is also driven/guided/informed by rigoros emperical studies focussing on programmer productivity, easy of use, costs of maintenance. etc...

I think this should be a big concern in language design, without emperical evidence how could we be sure, we're actually developing anything of value, be it knowledge or better Software or happier programmers.

I think that those deep into the theory might not be the best judges of productivity or accessability, but let's not discuss if certain resaerchers are probably good at estimating this - it's plainly unscientific to begin with. I assume theres at least some feedback through the process of teaching, but this intransparent. The results are irreproducable.

Again sorry if this might come across as negative in any way, but as awesome sone if that stuff is I saw in this Repo, it would still be even better if the question on how to scientificly evaluate if the goal of creating a nice to use language is met, was a primary, day-1concern if this whole Endeavour.

"empirical proof" is kind of an oxymoron, isn't it? At least, based on my (admittedly rudimentary) understanding of philosophy of science, I'm not sure how to interpret that notion...

I can't speak for the motivations of the Trellys project, but as a purveyor of experimental/research programming languages and proof assistants, I can say without a doubt that it is important to develop languages whose goal is to advance the capabilities of a system (rather than advance the capabilities of a user of a system).

Both are important, but we are PL researchers and type theorists; there are entire fields devoted to the sort of things you are wondering about, including Human-Computer Interaction (HCI) and Software Engineering (SE)—these people take ergonomics of programming languages very seriously and scientifically. This is, however, not what Programming Languages is about as a field—but we really do value the work done by others in this area, and hope we can combine our advances before hitting the mainstream.

Ok, thanks for the response.

At least, based on my (admittedly rudimentary) understanding of philosophy of science, I'm not sure how > to interpret that notion...

Is that even remotely relevant? I have a strong feeling to express that down here on earth we have real work for the computers to do. The programs we (industry programmers) produce are hard to write, hard to read, hard to maintain and mostly wrong. I realize and appreciate, that you are also doing real work.

But you cannot ignore that fact, that we have a whole menu full of folklore from "prophets" that invent methodology based on anecdotes, heresay and some vagure statistics on defects etc... They evangelize methods like- extreme programming, pair programming, test driven development, clean code and what not.

The thing wrong with them, is that they make classical human errors, which the scientific method tries to avoid, e.g. confirmation bias, basing conclusions on anecdotes, etc.

It is important, not to fall for the exact same fallicies.

Both are important, but we are PL researchers and type theorists;

They are also strongly related. The "language" aspect in "programming language" design, as well as the "programming" aspect are both governed by human factors as much as they are by math and logic.

In language design there should be more awareness of the fact that the hypotheses produced are relevant for the purpose of (human) programmer support, and there is only one scientific way to gain knowledge of the truth of these kinds of hypotheses: ask many programmers in a systematic way, compare the results and tell everyone how to repeat the trail. This awareness was not displayed in the scientific publications I have perceived until now.

there are entire fields devoted to the sort of things you are wondering about, including Human-Computer > Interaction (HCI) and Software Engineering (SE)

If you push away ergonomics and HCI and SE from the field of programming language research, then what actually do you research, that is not just pure math and logic?

to develop languages whose goal is to advance the capabilities of a system (rather than advance the capabilities of a user of a system).

Okay that makes sense to see it that way given the divided cultures in research, but I think most advances in culture and science come from cross pollination of different fields. It would be a start if one would point that out more to the researchers, so they would hint at needed emperical psychological research opportunities rather than pulling propositions out of thin air.

Some examples, from actual papers . These might seem ok-ish at first but put yourself in the mindset of e.g. medical scientist researching new drugs, or a psychologist, or a physicist, and you will see many unproven, assumed premises, that would very well be testable, or are at least falsifiable:

This specification is itself useful in its own right.

In either case, programmers benefit from implicit unboxing

Static type systems strive to be richly expressive while still being simple enough for programmers to use.

simple enough to make sense to programmer

This form of kind indexing allows for code reuse

There are grey areas consider this:

The mobile rule lets the programmer write simpler types, because mobile types never need to be tagged with logical classifiers.

The obvious open question is, what does the programmer think? Regarding what you said about improving the systems not the users capabilities, how do you know what the system should optimize for?

I realize this is grey, there are many valid and important answers just to be found in improving the system. I really value that work, I am fed up with test driven develoment more than anyone else, I want more clever systems and tooling - but they must be optimised for the human brain, too. Also it is research reality that over and over language paradigms are implemented in different styles with different trade offs but they are tested ONLY by the professors or the students themselves, just because they judge that it is not "good enough to use".

It's just wrong on so many levels that a committee of CS researchs designs a programming language, like done with Haskell. Haskell 2010 + many extensions turns out to be my favorite language, it's far far away from the original design, so there must be some good there. The "good" in the process was probably massive programmer feedback, in an naive, ad-hoc manner, this is emperical evidence for (implied) propositions about ergonomics.

All I say - the only proposition I dare to make is this: make verification of user productivity/satisfaction not only an afterthought.

Image what it would be like, if next to performance benchmarks and build server results, results of online surveys of new language features were presented, or if the github issues contained issues of the tagged "needs human trail", or a proof assistant could not only support IDEs in the obious ways they already do, but in certain ways that really caters to how programmers work.

Care to respond more detailed than just thumbs down? I think I raise a very valid issue, maybe not in the nicest way, I do appreciate any feedback. I might have it all wrong and I'd really like to know.

This is, however, not what Programming Languages is about as a field—but we really do value the work done by others in this area, and hope we can combine our advances before hitting the mainstream.

I honestly think that there is hardly an example for that happening, and I really apprecieate that the community recognizes the need to do that.

Care to respond more detailed than just thumbs down?

Not right now; from my perspective right now (which may not be correct), it looks like you're trolling, or looking to pick a fight. (If you want to change how PL & type theory research works, join it! Just be prepared to learn a lot of difficult stuff.) The other reason I don't want to respond further is, I don't represent the Trellys team and I don't want to put words in their mouths.

BTW, I like your adjustments to the issue title; this seems much more constructive, thanks.

Thank you for replying, I really feel bad for the bad title, and stirring up a fight against wind mills, in probably the wrong place, and also probably based on my wrong perspecitve from being too little involved in research.

I admit being in some kind of mood when I listend to the mentioned show, which led me here, and I did want to vent - but I do not want to waste peoples time by provoking them.

I fled the usual industrial programming religions like clean-code and to find a place governed by rigorous thought and emperical science. Now certain statements are just red-flags for me

Still I might have overreacted - sorry.

@sheyll That's OK. I'll make a few last remarks if it helps:

To expand on my early comment about "empirical proof", I meant to say that proof deals with synthetic a priori judgment, whereas empirical knowledge is entirely a posteriori. We cannot prove empirically that a thing is useful—since any case in which it appears to be useful may just be a coincidence, and we can't rule that out. So, my thinking on this would be to say that we do not want to find "empirical proof that this is useful", so much as a body of evidence (based on practice & user testing) that increasingly suggests a thing is useful.

PL research is much closer to mathematics & philosophy than to modern science; the version of "science" that we are in the business of doing is not quite the same as what you learn in grade school as the "scientific method". We still do things like "make a hypothesis and test it out", but our activity is not fully empirical in the usual sense, and it is not based on falsifiable hypotheses—instead, it is based on verifiable hypotheses (which is a much stronger/harder condition).

Natural science, for instance, cannot be based on verifiable hypotheses because in general: all we can do is hope to find either evidence that supports our model (but does not "prove" it canonical), or evidence that contradicts/falsifies our model. On the contrary, in mathematics we can do both, and we are specifically in the business of hypothesizing some model for phenomenon, and then verifying this hypothesis. Because we are most interested in this side of things, there are whole swaths of hypotheses which are outside the reach of what we are doing; this includes questions about whether a tool is useful, etc. Among ourselves, we try it out and informally come to a personal opinion; but we expect that empirical scientists in other fields will test our work more rigorously according to falsifiable hypotheses, etc.

As a past member of the Trellys project I can say that what Jon is describing is how we went about it.

The projects aim was what happens when we combine a generally recursive dependently typed PL (programmatic side) with a terminating dependently typed PL (the logical side) in the same language, in such a way, that we could verify the correctness of the programs written in either side.

We thought it would be useful, but we didn't do any type of user studies to decide that. That being said, the languages we came up with (which there are more than one) were very much core languages, and so the usability was not that great. So such a study would really need to be with respect to a language with a surface language that was built for programming, and not elaborating to.

Anyway, I am not so sure that the issue tracker is the proper forum for this conversation. Best.

Hi Sven,

Thanks for bringing up important points: I agree that languages are only one component of a suite of tools that developers use (such as IDEs, version control, static analysis tools, testing frameworks, etc), and that language designers should not limit themselves to thinking about programs as straight-line text.

I also agree that language usability research is important; but it’s not the kind of work that I do. Others have more to say on that topic (See for example, http://2015.splashcon.org/track/plateau2015 http://2015.splashcon.org/track/plateau2015). But, I’m not confident that I have the skills that it would take to get valuable information on the languages that I work with; nor do I have a collaborator with those skills. Sorry for sounding defensive, but your email reads as “why do you not have a completely different research program?” There are many problems to solve and many ways to create knowledge.

Like you, I am irritated by unsubstantiated assertions made in papers about programmer usability, and I apologize if some of my text in the paper or claims in the podcast came across that way. Generally, I meant such comparisons in fairly narrow sense---as comparisons between different design options in Zombie, for example, instead of between Zombie and other programming languages. However, I do believe that languages with simpler formal semantics (but are otherwise equivalent) are more usable, despite the lack of quantitative study.

As you say, this issue tracker is not the appropriate place for this discussion. The Zombie/Trellys project has ended; and the doctoral students who did the majority of the work have graduated and moved on.

On Aug 26, 2016, at 3:08 PM, Harley D. Eades III notifications@github.com wrote:

As a past member of the Trellys project I can say that what Jon is describing is how we went about it.

The projects aim was what happens when we combine a generally recursive dependently typed PL (programmatic side) with a terminating dependently typed PL (the logical side) in the same language, in such a way, that we could verify the correctness of the programs written in either side.

We thought it would be useful, but we didn't do any type of user studies to decide that. That being said, the languages we came up with (which there are more than one) were very much core languages, and so the usability was not that great. So such a study would really need to be with respect to a language with a surface language that was built for programming, and not elaborating to.

Anyway, I am not so sure that the issue tracker is the proper forum for this conversation. Best.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/sweirich/trellys/issues/26#issuecomment-242824701, or mute the thread https://github.com/notifications/unsubscribe-auth/ACAZduiK3ziTy36LIcLGkwNLfHPa5O1Rks5qjzmcgaJpZM4JtsaQ.

I very much want to point out, that now I realize that my issue has little to do in with the trellys project. I want to state that I am neither referring nor aware of the history of the project.

Hi Stephanie

I really don't know if I was actually supposed to answer or not, I will promise this is my last post in this thread, if you not explicitly ask me to respond.

(disclaimer) I thought you were really great in that podcast episode, I really like the way you present the topic and the clarity of your statements

Also, I really enjoy the topic of dependently typed programming as a Hobby and reading a nice paper every now and then in that little spare time I have.

I am excited about that journey and what great languages come out of it.

Also I have the greatest respect for your work, I really admire what you and Richard did to GHC, thank you so much for that. (end of disclaimer)

Like you, I am irritated by unsubstantiated assertions made in papers about programmer usability,

That was my initial motivation for this issue. Thanks for bringing that to out so clear.

Also, thanks for the links to the splash conference, I was totally unaware of it.

So I will respond to that and then leave it at that:

Sorry for sounding defensive, but your email reads as “why do you not have a completely different ? research program?”

I did not intend to offend, all that happend was that I was really surprised, that and why "programming language"-research seems to position itself so very distinct from the emperical part of language design, e.g. ergonomics and HCI (almost being offended when questioned about it ;) )

The splashConf 2015 page has only one talk about functional programming.
On the list of accepted papers for ICFP 2016 I saw none about empirical usability research.
You yourself wrote:

But, I’m not confident that I have the skills that it would take to get valuable information on the languages that I work with; nor do I have a collaborator with those skills.

This indicates that these fields are very divided and there seems to be no conscious path from research languages to mainstream languages.

But also, I don't see that in (m)any other languages either. I am actually not aware of any programming language development, explicitly designed to incorporate empirical usability studies.

If anything at all, this could be an opportunity for the functional programming community to shine, if that is something to strive for.

Please everyone take no offense, I might just be ignorant of the scientific process in PL research.

With best regards, Sven

sweirich / trellys

Add procedures and tools to measure programmer productivity implications #26