seg / 2016-ml-contest

Machine learning contest - October 2016 TLE
Apache License 2.0

Publication vs. Contest #11

Closed LukasMosser closed 7 years ago

LukasMosser commented 7 years ago

If something is submitted to the contest, can it still be published in an article in The Leading Edge?

This may be a bit too meta but:

Wouldn't making submissions public on GitHub make it possible for anyone to take your approach, tweak it slightly, and then publish it as their own?

How do other contests such as Kaggle handle this?

kwinkunks commented 7 years ago

I'm not sure I totally follow the first and second parts of the question.

Your code is your copyright, you can (should) put a notice in there. But you can (should) explicitly license it permissively too. Otherwise it's your copyright (automatically) but with no license, which is ambiguous.

As for others using your code... yes, that is the idea. I guess ultimately, the solution is — in a real sense — "everybody's". We will try to make this clear.

Kaggle is not perhaps the best analogy, because of this aspect. It's more like the MATLAB programming contest in that respect.

You can of course publish what you do in TLE later — we can be as explicit as we need to be about all this.

I think. Does that roughly align with what you were asking?

lperozzi commented 7 years ago

I again agree with @LukasMosser. The risk here is that everybody waits until 31 January to publish their script. It would be a good idea to add a notice that protects the code from being published by others.

It is an open question among academics too. People tend to be afraid that 'if I store my PhD or whatever code on GitHub, someone could steal it and publish something with it'. Actually, this is the most common reply I get when I ask someone in our research group to start using GitHub for sharing and keeping track of their code.

LukasMosser commented 7 years ago

They can always make their code private. GitHub has unlimited private repositories now. @lperozzi

kwinkunks commented 7 years ago

I think you (@LukasMosser) are addressing @lperozzi's final point about PhD students, but just to be clear about this contest: private code won't be evaluated.

If people want to wait until 31 January, that's their choice (although I see there is an entry today). I guess it's a toss-up between releasing early and possibly having to improve, or releasing late and probably being beaten (assuming there are more than 2 such entries).

The open vs private academic code is a debate that will go on, I guess. Obviously I think you stand to gain more with an open mindset, but others feel differently. Nitpick: it's impossible to 'steal' open code, in two senses: (1) if it's open, you can re-use it, that's the point (faster alone, further together), and (2) you can't really claim something that's demonstrably published by someone else as your own (prior art).

kwinkunks commented 7 years ago

I'm closing this now as, although it's a really interesting subject, and although I hope we can say something really interesting and intelligent about it when we close the contest, I don't think there's an issue here for now.

Please let me know if you disagree, happy to discuss more.

thanish commented 7 years ago

I totally agree with @LukasMosser, and I disagree with @kwinkunks in one scenario and agree in another.

Let me say when I agree with @kwinkunks: when this is something like a research project by a group of members who want to share their ideas and code, collaborate, and build on each other's ideas. That is definitely going to be a win-win situation for everyone who works on it.

Now, when do I disagree with @kwinkunks? This is a machine learning contest in which people are competing to reach the top, and each person or team will have a unique approach. It has also been stated that "There will be a goody bag of completely awesome and highly desirable prizes for whoever is at the top of the leaderboard". So anyone who wants to finish at the top would not want to share their code before the end of the contest, for the very reason that a competitor can tweak and improve it.

Consider a World Cup final. No team would want to reveal its strategy before it has had the chance to use it. If a team discusses its strategy with its opponents, the opponents will learn it, work it into their training, and add their own flavour, which might give them the edge, and it is worse still if they then win. The loser is the one who revealed the technique and did not win the match. Maybe the opponents won because of the approach they learnt before the match, or maybe not; that is debatable.

In this contest, either way, you (the organisers) will get the code for all the submissions and a diverse set of approaches, which is a win for the organisers. But it is clearly a loss for anyone who revealed their code before the end of the contest if another competitor used and tweaked it.

Recently I worked on an ML competition where there was not even a common discussion portal for competitors. When I emailed the admin to ask why, he told me that this was the idea of the contest: they did not want contestants to know what approach the others had taken, get carried away by it, and think only along those lines. He went on to say that they wanted diverse approaches so that they could choose the best, or even combine them all to get better accuracy, which sounded convincing to me.

As you have said "We've never done anything like this before, so there's a good chance these rules will become clearer as we go". I think code privacy until the end of the contest is something that can be seriously reconsidered.

kwinkunks commented 7 years ago

@thanish I am fully aware of different approaches. This is just one way of doing it, as described in this paper. I'm aware of the pros and cons of the two approaches. Maybe in the future we'll try private code.

As for this contest, it's totally up to you if you want to keep your code private until the end. But I'm not changing the rule.

mycarta commented 7 years ago

This is closed now, but I thought I'd chip in, as it is still an interesting and relevant discussion.

I actually see a competition like this as a place for teams to both:

I do not have any problem at all with someone who's got a good idea and wants to wait until later, or even until the last minute of the contest, to make a submission; it would be their prerogative, and fair game.

But on the other hand, for my part, I would rather try something, make it as good as I can, then submit it, and move on to a different idea if I can. And I would fully expect that if someone can better my initial approach, they would, and submit it again, improved. I'd benefit from it, I am sure (that, to me, is not the same as copying someone's PhD, and publishing it somewhere else).

And I would do the same with theirs, if I saw a way to improve their code in a good way. To me, that is fair also, so I disagree that it is 'risky'. I think it does happen in other competitions.

A more obvious abuse, on the other hand, would also be easy to handle as a community. If somebody did submit someone else's code with just a slight tweak, and no substantial change, it would be obvious, and we could handle it openly, as a collective. No? Besides, I don't see those kinds of 'snakes' around here, and I don't expect it will happen.

brendonhall commented 7 years ago

@mycarta, I believe your comment captures the intended spirit of this competition. We saw it as a way to kickstart (heh) the nascent geoscience machine learning community, and establish something that would last beyond the length of the contest. From my perspective at least, this primary goal has been a resounding success.

As a community-building exercise, the spirit and open nature of this 'competition' is helping to set the tone of our communication and the way we discuss ideas. @kwinkunks has done a great deal of work to advance open source science and to create environments (like the hackathons) where we improve by learning from each other. This contest is no exception.

Those who want to hold their submissions until the end so as to avoid diffusion of their intellectual property will be more than rewarded for their originality by the truly staggering value of the prize for the winner. I'd suggest they might find the competitions on Kaggle more fulfilling. Anyone wishing to submit minor modifications of somebody else's work would also be advised that other competitions might provide more anonymity than ours. We're a small (but growing!) sub-group of a small sub-group of academics & professionals.

It's an exciting time to be a geoscientist, and I appreciate the work @kwinkunks has done to promote this contest and the discussions it has started. Every question, comment, blog post & article written about this is a win for us all.

mycarta commented 7 years ago

@brendonhall nicely written. However, I do have a feeling that all teams and participants in the ML contest as it is now share this view. I'd actually be surprised if someone new came in and tried something so much at odds with the spirit of this particular contest as reusing code from an existing submission with just minor tweaks. Perhaps the initial concern expressed by @lmoss, about somebody coming in and using a submission's code for their own publication in The Leading Edge, is better founded, but I think the editors of the journal, or at least of the ML special Essay, are most likely following the contest, and the attempt would be caught. And even if they didn't catch it, I believe we would as a community, and we'd be able to put in our vote to 'ostracize' the 'culprit'.

brendon commented 7 years ago

I think you mean @brendonhall :) Had me confused there for a bit!