src-d / awesome-machine-learning-on-source-code

Cool links & research papers related to Machine Learning applied to source code (MLonCode)
Creative Commons Attribution Share Alike 4.0 International
6.18k stars 842 forks

Add more "introduction" labels #55

Open vmarkovtsev opened 6 years ago

warenlg commented 6 years ago

Furthermore, if I may, I'd like to add the conferences the papers were submitted to. I like this info because it always gives me quick insight into a paper's quality and style.

vmarkovtsev commented 6 years ago

Good idea!

marnovo commented 6 years ago

What are the criteria for labeling something beginner?

osanwe commented 6 years ago

Maybe papers that require only basic knowledge of ML math and a medium or lower understanding of ML techniques?

vmarkovtsev commented 6 years ago

@marnovo @osanwe It should be friendly to people who have just started exploring MLonCode or do not want to spend much time in order to understand the paper.

That is hard to formalize; whenever someone recommends that we add or remove this label, we consider it carefully.

bdqnghi commented 6 years ago

I really like the idea of adding the conference where a paper was published. Not all of the papers are of high quality, and papers published at top-tier conferences are likely to be better than those published at lower-tier conferences.

vmarkovtsev commented 6 years ago

There is another proposal then: remove the papers which are considered not awesome enough since this list is "awesome". There is no goal to catch them all.

marnovo commented 6 years ago

> @marnovo @osanwe It should be friendly to people who have just started exploring MLonCode or do not want to spend much time in order to understand the paper.
>
> That is hard to formalize; whenever someone recommends that we add or remove this label, we consider it carefully.

@vmarkovtsev agreed it's hard to formalize, but it would be helpful to have some yardstick heuristics to standardize the process (and the outcome).

Another idea that may be easier to implement and sounds less judgemental: just as GitHub introduced the "good first issue" label to help beginners find their way when contributing to a project, we could mark papers as "good first read" or "good intro paper" instead of "beginner".

> I really like the idea of adding the conference where a paper was published. Not all of the papers are of high quality, and papers published at top-tier conferences are likely to be better than those published at lower-tier conferences.

@bdqnghi agreed with Vadim. This sounds like a different proposal; I'd invite you to open a separate issue so we can discuss it there. Thanks!

vmarkovtsev commented 6 years ago

@marnovo "beginner" is exactly a shorter "good first read". The latter is too long and takes up too much space, so @campoy and I decided to name it "beginner".

osanwe commented 6 years ago

@vmarkovtsev, I think, for example, that the paper "A Survey of Machine Learning for Big Code and Naturalness" by Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton is a very nice introduction to MLonCode, but it is a bad idea to mark this paper with the beginner label.

vmarkovtsev commented 6 years ago

Agreed

campoy commented 6 years ago

Why is beginner a bad label for the paper? Beginner doesn't imply it's a bad paper, but that it's a great place for you to begin reading on the topic.

marnovo commented 6 years ago

@campoy I believe the point is that the paper is far from "easy", even though it might be of very good quality or a good introduction to the topic.

To give a bit more color on what I mentioned previously: as much as I don't think beginner is terrible, it seems way more judgemental than good-first or good-intro, for instance.

E.g.: What does it mean, really? Does beginner mean you're a beginner in ML, in source code analysis, or in MLonCode… or all of them, any of them? If one reads a paper marked for "beginners" and is barely able to understand it (as seems to be the case with the aforementioned paper for most), how should they feel? Bad?

In the end you have different dimensions to judge a paper on here. E.g.:

All this considering the myriad profiles of people who come to the repo… so a good intro paper to ML on Code isn't necessarily easy, and vice versa. This is why I'd rather have the concept better scoped and defined, so that it is more consistent and readers know what to expect; maybe even have more than one label if we eventually need it.

vmarkovtsev commented 6 years ago

We should rather add a second label, "intro", which is easy to assign, and document what "beginner" means, because even our PM thinks that it tries to judge while it absolutely does not :)

"beginner" does not take into account the quality (bad-quality papers are not part of an awesome list), topic, or suitability (all the papers must be suitable for MLonCode, otherwise we need to delete them). Only the last point holds. And it is by definition very subjective, so until we've got active voting users we will continue assigning "beginner" based on our complex internal feelings and emotional biases.

campoy commented 6 years ago

My complex internal feelings and emotional biases don't care that much about what label we use, tbh. Beginner or intro work, I will not push one way or the other.

marnovo commented 6 years ago

Deal.

eiso commented 6 years ago

Decision made: change the "beginner" label to "introduction"