faokryn opened this issue 8 years ago
So when you use software in research, are you looking for projects that are active, so you can work with the development team if you need an issue solved or a feature added?
IMHO there are two kinds of software in the research community. One category would be the "enablers": big libraries, or repositories of methods, that enable users to do their particular analysis using well-established methods. For example, image analysis libraries that you apply to your particular images, genomic analysis for your particular experiment, numerical analysis, etc. There are usually several implementations, strong competition, and good code. The developers profit not only from getting cited; they can also get grants for maintaining the code, offer services around it, etc.
The other category would be the "explorers": software commonly developed to test a new algorithm, or a new approach to an existing problem, by solo developers or small teams. The vast majority of software papers are in this category. How reproducible it is depends on the developer (sometimes the code is not released, but they might send it by email if you ask nicely, or if you are a reviewer), sometimes the documentation is scarce, etc. One good effort to solve the reproducibility problem was proposed by the Insight Journal, where source code, tests, and an "implementation paper" discussing implementation details are required. I have read papers that look amazing and have a good number of citations, then tested the code (sent by email) and realized it wasn't great. Reviewers usually look at the figures, not the code. On the other hand, there is a lot of good code out there that is not reaching all of its possible users.
A lot of the software in this latter category is implemented in scripting languages: one shot, one paper, next topic. If you don't release the code in a competitive environment, that can give you the advantage of publishing small modifications of the same idea for a long time, profiting all the way with new papers. On the other hand, closed source usually has less impact because of its reduced reach.
If the algorithm is a game changer, somebody will implement it in one of those big libraries (I sometimes feel this is what I do), where review, reproducibility, documentation, and performance are under scrutiny.
In the current system, developers usually wait until they have a paper before releasing the code, so that future users of the code will cite that paper, because that is the only way to profit in the research currency (a.k.a. impact, h-index).
How was the software made available to you, i.e. how was it hosted? Personal webpages, downloading binaries and/or source code. A lot of SourceForge as well, and, if it is open, GitHub of course.

How, if at all, did you cite the software? As I said, 99% of the time I reach the software because I first read a paper about it. So if I use the software, I cite that paper.

Did you experience any obstacles that made citing the software difficult? If it has a paper associated with it, no. If it doesn't, there is no way I can cite it. But usually, if it doesn't have a paper, the developer is not really interested in the research currency, so it is not a big problem. The tricky point is: how many people have code written but not released in the open because they don't have a paper associated with it? It would be great to have a tool to cite work in progress, and that could be a niche for this initiative to exploit.
I personally take the approach of implementing things in modules, releasing all the reusable algorithms to different open source libraries (the first category). The con is that I get zero reward (in terms of academic currency) for this, and it slows down my publishing performance. Then I try to profit by releasing an application, basically a GUI applying those algorithms, and the paper associated with it (work in progress). The really good thing about this approach is that I get really good peer review from the open source library, and the code can be reused, maintained, and improved by others.
Sorry for the long, unrevised post. I'll shut up now; I hope my perspective helps!!
In R, you get information on how to cite the package you are using when you call citation(), e.g. citation(package = "partykit"). So I can always cite the packages I use in the way the authors want me to. This is really useful and simple.
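For illustration, this is roughly how that looks in an R session (assuming the partykit package mentioned above is installed; toBibtex() is just one convenient way to turn the result into a BibTeX entry):

```r
# Print the citation the package authors ask for (title, authors, year, etc.)
citation(package = "partykit")

# Convert the same citation object into a BibTeX entry that can be pasted
# into a reference manager or a .bib file
toBibtex(citation(package = "partykit"))
```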
Software in R is usually hosted on CRAN or Bioconductor, and packages change often. That is why people usually add the package version when they cite R packages. Old package versions remain available on CRAN/Bioconductor, which is useful for reproducibility.
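A quick sketch of how one might look up those versions (again using partykit only as an example package name):

```r
# Version of a single installed package, suitable for including in a citation
packageVersion("partykit")

# R version plus the versions of all attached packages, often reported
# alongside analyses for reproducibility
sessionInfo()
```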
@phcerdan Thanks for your insight on the scientific software community and how they use software. I like the idea of the Insight Journal; however, I get the feeling that having people cite your implementation paper there doesn't provide the same amount of research currency as releasing a regular paper would. I'm surprised that there is no reward for writing and open-sourcing code that benefits the scientific community. It seems like something that would be encouraged, since software is becoming a go-to way of modeling many different things.
@HeidiSeibold That's an interesting point. We're trying to focus on GitHub projects at the moment, but maybe we can learn from how R does it and add the ability to scan for a citation file that tells software citation tools how to prepare the citation.
@Ourobor It is an easy trick, really: there is a dedicated CITATION file in each R package. You can get more info on this here: https://cran.r-project.org/doc/manuals/R-exts.html#CITATION-files
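As a rough sketch, a CITATION file is just a small piece of R code that builds a bibliography entry; the package name, authors, and year below are made up purely for illustration:

```r
# inst/CITATION of a hypothetical package called "examplepkg"
bibentry(
  bibtype = "Manual",
  title   = "examplepkg: Tools for an Example Analysis",
  author  = c(person("Jane", "Doe"), person("John", "Smith")),
  year    = "2016",
  note    = "R package version 1.0.0",
  url     = "https://CRAN.R-project.org/package=examplepkg"
)
```

Calling citation(package = "examplepkg") would then print exactly the entry the authors specified here.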
Please describe how software created by others and used in your research is found and cited. How was the software made available to you, i.e. how was it hosted? How, if at all, did you cite the software? Did you experience any obstacles that made citing the software difficult?