@zachary-foster and @adamhsparks
I suggest that we focus only on a selected set of journals (our expert judgment or asking others to review would suffice), which are the primary choice for most plant pathologists - recall that we will be submitting this potential manuscript to the leading plant pathology journal. I made a quick list below based on Adam's previous selection.
Given that most are applied, I suggest a different categorization as well. If you agree, let's check if these categories are correct and each of us could pick a set of around seven journals to scrutinize.
Based on my experience, focusing on raw data accessibility and computational methods, most articles will fall into a "not reproducible" category. If that is true, the work will be quite quick, so we could increase the number of articles per journal to, say, 20, hence 400 articles! Let's randomly select 100 articles per journal (Adam's code) and then decide later where to stop recording, in order to be consistent.
Journal name | Scope | Research aspect |
---|---|---|
Australasian Plant Pathology | Broad | Applied |
Canadian Journal of Plant Pathology | Broad | Applied |
Crop Protection | Broad | Applied |
European Journal of Plant Pathology | Broad | Applied |
Forest Pathology | Specialized | Applied |
Journal of General Plant Pathology | Broad | Applied |
Journal of Phytopathology | Broad | Applied |
Journal of Plant Pathology | Broad | Applied |
Journal of Plant Virology | Specialized | Applied |
Molecular Plant Pathology | Broad | Fundamental |
Nematology | Specialized | Fundamental/Applied |
Physiological and Molecular Plant Pathology | Broad | Molecular |
Phytoparasitica | Broad | Applied |
Phytopathologia Mediterranea | Broad | Applied |
Phytopathology | Broad | Fundamental/Applied |
Plant Disease | Broad | Applied |
Plant Health Progress | Broad | Applied |
Plant Pathology | Broad | Fundamental/Applied |
PLOS ONE | Broad | Fundamental/Applied |
Revista Mexicana de Fitopatología | Broad | Applied |
Tropical Plant Pathology | Broad | Applied |
@emdelponte
I suggest that we focus only on a selected set of journals
I'm fine with that, as long as the articles are selected randomly.
Given that most are applied, I suggest a different categorization as well.
Yea. Perhaps we should be categorizing individual articles as e.g. "applied" vs "molecular", instead of the journal. Many journals would accept both types of articles (e.g. PlosONE).
each of us could pick a set of around seven journals to scrutinize
Do you mean scrutinize the journals in order to determine journal attributes or that we each get seven journals to read papers from? If the former, I agree. If the latter, I think we should randomly pick who reads what article independent of the journal so that the reader is not a confounding factor with journal.
Let's randomly select 100 articles per journal (Adam's code) and then decide later where to stop recording, in order to be consistent.
I like that plan. We should make "goals" that we all have to meet before anyone reads more papers. That way no one does more work than needed. For example, we can start with 20 articles each and once everyone has read 20 articles we increase the goal to 30 and so on until we get tired of it. An issue for each goal would work well to keep track of progress.
@zachary-foster @adamhsparks
Yea. Perhaps we should be categorizing individual articles as e.g. "applied" vs "molecular", instead of the journal. Many journals would accept both types of articles (e.g. PlosONE).
I like this simplification and categorization at the article rather than the journal level - both types will definitely be found in the same journal. Both levels can be used. We will get a better sense of the article-level categories as the work progresses. These could include pathogen description, population biology, epidemiology, management, etc., so that we could identify which kinds of studies authors are more prone to make reproducible. Anyway, the simpler the better, but let's see how it goes!
Do you mean scrutinize the journals in order to determine journal attributes or that we each get seven journals to read papers from? If the former, I agree. If the latter, I think we should randomly pick who reads what article independent of the journal so that the reader is not a confounding factor with journal.
Yes, I meant scrutinizing the journals only after randomly selecting them. For journals like PLoS and Crop Protection we also need to decide whether we will skip articles that are not plant-pathology related, which will be the most common case. The same goes for specialized journals such as Nematology, etc., when no plant pathogen/disease is involved.
Yea. Perhaps we should be categorizing individual articles as e.g. "applied" vs "molecular", instead of the journal. Many journals would accept both types of articles (e.g. PlosONE).
This was my original intent.
@zachary-foster @emdelponte @grunwald I'm finally getting back around to this and will dedicate some time this week to this work. From what I read here, you guys have captured my original ideas and clarified them much better than I had managed to.
My take on this is that we need to decide on the journals we're sampling from; the list @emdelponte gave above is a good start. If we drop one from the list, we have twenty, which leaves each of us with five journals to select articles at random from (four replicates of five journals each, if you will).
Then each of us can use my code to randomly select the articles from our respective set of five journals.
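For illustration only, here is a minimal sketch of what that per-journal random draw could look like; this is not the actual selection code, and the `candidates` tibble, its column names, and the draw of 20 per journal are placeholder assumptions.

```r
# Sketch only: randomly draw up to 20 articles from each journal.
# `candidates` is a placeholder tibble with one row per candidate article.
library(dplyr)

candidates <- tibble::tibble(
  Journal = rep(c("Phytopathology", "Plant Disease", "Plant Pathology"), each = 50),
  Article = paste("Candidate article", 1:150)
)

set.seed(2017) # fix the seed so the draw itself is reproducible

selected <- candidates %>%
  group_by(Journal) %>%
  slice_sample(n = 20) %>% # silently returns all rows if a journal has fewer than 20
  ungroup()
```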
I'm happy to leave it to each of our discretion how to categorise the articles. I suspect we'll have some that fall into two categories or maybe even three.
@zachary-foster's paper attributes look good to me. I do agree with @emdelponte that if we define reproducible as having raw data and computer code available, etc., almost everything will fail. But we can make that the gold standard and see if we can find anything that reaches that level. This means we should have some basic categories for reproducibility as well.
Gold
Silver
Bronze
Not passing
@zachary-foster @adamhsparks @grunwald
I like these categories. How do you envision the data frame? One article per row, but what about the columns and the values to assign? Binary or ordered scores (e.g. for computational methods: 0 = no script; 1 = login needed; 2 = publicly accessible)? Could the final category be decided based on the median score?
I vote for removing PLoS from the list - the only one not explicitly related to plant pathology.
For Crop Protection, one should skip articles that do not deal with plant disease.
@emdelponte, I think that's a reasonable idea; we can assign a value to the categories as you've suggested. I've updated my previous comment with scores on the 0-3 scale you proposed. A gold score would be 6 in this case and silver would be 4, but there might be 5s depending on data vs. computational methods, etc.
I second removing PLoS from the list and agree with skipping non-plant pathology articles in Crop Protection.
Here's how I'd envision the data frame structure.
```r
reproducibility <- tibble::tibble(
  Article = "The Area Under the Disease Progress Stairs: Calculation, Advantage, and Application",
  DOI = "PHYTO-07-11-0216",
  Journal = "Phytopathology",
  Authors = "Ivan Simko and Hans-Peter Piepho",
  Year = 2012,
  Vol = 102,
  Iss = 4,
  pp = "381-389",
  IF = 3.011,
  Journal_class = "Fundamental",
  Page_charges = 130,
  Country = "USA",
  Open_or_Restricted = "Optional",
  Reproducibility_instructions = FALSE,
  Iss_per_Year = 12,
  Supl_mats = TRUE,
  # 0-3 scores for the reproducibility criteria
  Comp_methods_availability = 2,
  Software_availability = 1,
  Software_citation = 3,
  Analysis_automation = 0,
  Data_availability = 0,
  Data_annotation = 0,
  Data_tidiness = 0
)

# Sum the criterion scores with `+` rather than sum() so the total stays
# per-article once more rows are added to the tibble
reproducibility <- dplyr::mutate(
  reproducibility,
  Reproducibility_score = Comp_methods_availability +
    Software_availability +
    Software_citation +
    Analysis_automation +
    Data_availability +
    Data_annotation +
    Data_tidiness
)
```
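To sketch one way the final category could be derived from the scores (answering @emdelponte's question above), here is a hedged example mapping the summed score onto the Gold/Silver/Bronze/Not passing labels; the break points are purely illustrative placeholders, not thresholds we have agreed on.

```r
# Illustrative only: the cut points are placeholders until we agree on thresholds.
reproducibility <- dplyr::mutate(
  reproducibility,
  Reproducibility_class = cut(
    Reproducibility_score,
    breaks = c(-Inf, 5, 10, 15, Inf),
    labels = c("Not passing", "Bronze", "Silver", "Gold")
  )
)
```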
I have adapted the paper attributes to the four-level scoring system and put them in an Rmd so we can refine them further. Check it out here:
@adamhsparks
If we drop one from the list, we have twenty, which leaves each of us with five journals to select articles at random from (four replicates of five journals each, if you will).
I think we should select the articles from all the journals first and then split them up randomly, independent of journal, otherwise the person reading will be a confounding factor with journal.
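Something like the following could do that pooling-then-splitting, assuming a pooled `selected` tibble like the one sketched above; the round-robin assignment is just an illustration, not settled code.

```r
# Sketch: shuffle the pooled articles, then deal them out to the four of us
# in round-robin order so that reader is not confounded with journal.
readers <- c("adamhsparks", "emdelponte", "grunwald", "zachary-foster")

set.seed(2017)
selected <- selected[sample(nrow(selected)), ]      # shuffle the rows
selected$Reader <- rep_len(readers, nrow(selected)) # round-robin assignment
```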
if we define reproducible as having raw data and computer code available, etc., almost everything will fail. But we can make that the gold standard and see if we can find anything that reaches that level.
Yes, I would expect very few projects to be entirely reproducible. However, with the 0-3 scoring system, 2 (silver) would still be relatively good.
@emdelponte
Binary or ordered scores (e.g. for computational methods: 0 = no script; 1 = login needed; 2 = publicly accessible)?
I like the idea of a 0-3 scoring with 3 being exceptional and 1-2 being typical.
I vote for removing PLoS from the list - the only one not explicitly related to plant pathology. For Crop Protection, one should skip articles that do not deal with plant disease.
Agreed.
I think we should select the articles from all the journals first and then split them up randomly, independent of journal, otherwise the person reading will be a confounding factor with journal.
Agreed
I've edited the Rmd file to include the example tibble with @zachary-foster's updated suggestions for reproducibility categories.
Where do we now categorise SAS? There is a free University edition for download or use with AWS Cloud. Having looked at it, I think it might now fall into a 2 rating. You have to sign up, log in, etc., so it's free but it's still proprietary.
While SAS is finally free, one cannot reproduce publication-ready graphs in SAS, so SAS should get a lower score. Also, the code is not open source and cannot be improved by the SAS user community.
I vote for removing PLoS from the list - the only one not explicitly related to plant pathology. For Crop Protection, one should skip articles that do not deal with plant disease.
I would include PLOS indirectly by randomly selecting 'plant pathology' articles?
Binary or ordered scores (e.g. for computational methods: 0 = no script; 1 = login needed; 2 = publicly accessible)? I like the idea of a 0-3 scoring with 3 being exceptional and 1-2 being typical.
I like the 0 (no code) to 3 (fully open-source code) scale.
Where do we now categorise SAS? There is a free University edition for download or use with AWS Cloud. Having looked at it, I think it might now fall into a 2 rating. You have to sign up, log in, etc., so it's free but it's still proprietary.
I think a score of 2 sounds about right. It's easily available, but proprietary. Not being open source makes it less reproducible even if it is free (e.g. you don’t know how a change in version would affect results or the details of how algorithms are implemented).
Where do we now categorise SAS? There is a free University edition for download or use with AWS Cloud. Having looked at it, I think it might now fall into a 2 rating. You have to sign up, log in, etc., so it's free but it's still proprietary.
Also keep in mind that SAS is not open-source code. Also, it cannot produce graphs for publication. Free does not equal open.
Yes, free != open, that's why I raised the issue.
@grunwald is right, I'd forgotten that you can't make graphs for publication. That is part of being reproducible. Maybe that does warrant a lower ranking? I don't want to be seen as saying "SAS is bad to use", but...
It seems that we are happy with this scale, so let's move forward using it and see how we go. I don't have any other journal attribute suggestions to make either, so we can record those as well.
As I'm going through this list that @emdelponte proposed, I see that MPMI isn't listed here.
I think I need to rerun our list and add MPMI; that's a pretty big journal to omit.
@emdelponte, I am not finding a "Journal of Plant Virology" as you have suggested.
I've elected to go with http://www.virologyj.com/sections/plant for these articles.
@adamhsparks Oops! It seems you found the right name! There is also Archives of Virology, but we should be OK with one representative of the field.
I agree with including MPMI.
I'm almost done with the list. I'll finish up this evening and make a commit with our assigned articles.
Closing this to clean up issues.
See: https://github.com/phytopathology/Reproducible.Plant.Pathology/blob/master/vignettes/reproducibility_criteria.Rmd for reproducibility criteria
See: https://github.com/phytopathology/Reproducible.Plant.Pathology/blob/master/vignettes/Assigning_Articles.Rmd for article assignments for each of us
Here is a start based on our previous discussions, but we should put this in an Rmd.
Paper attributes
Journal attributes