openplantpathology / Reproducibility_in_Plant_Pathology

A systematic, quantitative review of articles that provides a basis for identifying what has been done so far on reproducibility in plant pathology research, along with suggestions for ways to improve it.
https://openplantpathology.github.io/Reproducibility_in_Plant_Pathology

Decide on attributes of papers to record #5

Closed zachary-foster closed 7 years ago

zachary-foster commented 7 years ago

Here is a start based on our previous discussions, but we should put this in an Rmd.

Paper attributes

Journal attributes

emdelponte commented 7 years ago

@zachary-foster and @adamhsparks

I suggest that we focus only on a selected set of journals (our expert judgment or asking others to review would suffice), which are the primary choice for most plant pathologists - recall that we will be submitting this potential manuscript to the leading plant pathology journal. I made a quick list below based on Adam's previous selection.

Given that most are applied, I suggest a different categorization as well. If you agree, let's check if these categories are correct and each of us could pick a set of around seven journals to scrutinize.

Based on my experience, focusing on raw data accessibility and computational methods, most articles will fall into a "not reproducible" category, and if that is true the work will be quite quick, so we could increase the number of articles per journal to, say, 20, hence 400 articles! Let's randomly select 100 articles per journal (Adam's code) and then decide later where to stop recording, in order to be consistent.

| Journal name | Scope | Research aspect |
| --- | --- | --- |
| Australasian Plant Pathology | Broad | Applied |
| Canadian Journal of Plant Pathology | Broad | Applied |
| Crop Protection | Broad | Applied |
| European Journal of Plant Pathology | Broad | Applied |
| Forest Pathology | Specialized | Applied |
| Journal of General Plant Pathology | Broad | Applied |
| Journal of Phytopathology | Broad | Applied |
| Journal of Plant Pathology | Broad | Applied |
| Journal of Plant Virology | Specialized | Applied |
| Molecular Plant Pathology | Broad | Fundamental |
| Nematology | Specialized | Fundamental/Applied |
| Physiological and Molecular Plant Pathology | Broad | Molecular |
| Phytoparasitica | Broad | Applied |
| Phytopathologia Mediterranea | Broad | Applied |
| Phytopathology | Broad | Fundamental/Applied |
| Plant Disease | Broad | Applied |
| Plant Health Progress | Broad | Applied |
| Plant Pathology | Broad | Fundamental/Applied |
| PLoSONE | Broad | Fundamental/Applied |
| Revista Mexicana de Fitopatología | Broad | Applied |
| Tropical Plant Pathology | Broad | Applied |
zachary-foster commented 7 years ago

@emdelponte

I suggest that we focus only on a selected set of journals

I'm fine with that, as long as the articles are selected randomly.

Given that most are applied, I suggest a different categorization as well.

Yea. Perhaps we should be categorizing individual articles as e.g. "applied" vs "molecular", instead of the journal. Many journals would accept both types of articles (e.g. PlosONE).

each of us could pick a set of around seven journals to scrutinize

Do you mean scrutinize the journals in order to determine journal attributes or that we each get seven journals to read papers from? If the former, I agree. If the latter, I think we should randomly pick who reads what article independent of the journal so that the reader is not a confounding factor with journal.

Let's randomly select 100 articles per journal (Adam's code) and then decide later where to stop recording, in order to be consistent.

I like that plan. We should make "goals" that we all have to meet before anyone reads more papers. That way no one does more work than needed. For example, we can start with 20 articles each and once everyone has read 20 articles we increase the goal to 30 and so on until we get tired of it. An issue for each goal would work well to keep track of progress.

emdelponte commented 7 years ago

@zachary-foster @adamhsparks

Yea. Perhaps we should be categorizing individual articles as e.g. "applied" vs "molecular", instead of the journal. Many journals would accept both types of articles (e.g. PlosONE).

I like this simplification and categorization at the article rather than the journal level - both types will definitely be found in the same journal, and both levels can be used. We will get a better sense of the article-level categories as the work progresses; these could include pathogen description, population biology, epidemiology, management, etc., so that we could identify which kinds of studies authors are more likely to make reproducible. Anyway, the simpler the better, but let's see how it goes!

Do you mean scrutinize the journals in order to determine journal attributes or that we each get seven journals to read papers from? If the former, I agree. If the latter, I think we should randomly pick who reads what article independent of the journal so that the reader is not a confounding factor with journal.

Yes, I meant the scrutiny only after randomly selecting them. For journals like PLoS and Crop Protection we also need to decide whether we will skip articles that are not plant-pathology related, which will be the most common case. The same goes for specialized journals such as Nematology when no plant pathogen/disease is involved.

adamhsparks commented 7 years ago

Yea. Perhaps we should be categorizing individual articles as e.g. "applied" vs "molecular", instead of the journal. Many journals would accept both types of articles (e.g. PlosONE).

This was my original intent.

adamhsparks commented 7 years ago

@zachary-foster @emdelponte @grunwald I'm finally getting back around to this and will dedicate some time this week to this work. From what I read here, you guys have captured my original ideas and clarified them much better than I had managed to.

My take on this is that we need to decide on the journals we're sampling from; the list @emdelponte gave above is a good start. If we drop one from the list, we have twenty, which leaves each of us with five journals that we can select our articles at random from (four replicates of five journals each, if you will).

Then each of us can use my code to randomly select the articles from our respective set of five journals.
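As a rough sketch of what that selection could look like (this is not the actual script; the `articles` data frame, with one row per candidate article and `journal` and `doi` columns, is a placeholder):

library(dplyr)

set.seed(2017)  # fix the seed so the article selection is itself reproducible

# placeholder input: one row per candidate article, with `journal` and `doi` columns
selected <- articles %>%
  group_by(journal) %>%
  sample_n(size = 100) %>%  # 100 candidate articles per journal, as discussed above
  ungroup()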

I'm happy to leave it to each of us to use our own discretion in how to categorise the articles. I suspect we'll have some that fall into two categories, or maybe even three.

@zachary-foster's paper attributes look good to me. I do agree with @emdelponte that if we define reproducible as having raw data and computer code available, etc., almost everything will fail. But we can make that the gold standard and see if we can find anything that reaches that level. This means that we should have some basic categories for reproducibility as well.

emdelponte commented 7 years ago

@zachary-foster @adamhsparks @grunwald

I like these categories. How do you envision the data frame? One article per row, but what about the columns and the values to assign? Binary or ordered scores (e.g. for computational methods: 0 - no script; 1 - login needed; 2 - publicly accessible)? Could the final category be decided based on the median score?
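To make that idea concrete, a quick sketch with made-up scores and placeholder cut-offs (the thresholds and labels are only illustrative):

scores <- c(comp_methods = 2, software = 1, data = 0)  # example ordinal scores for one article

# derive a category from the median of the per-criterion scores
category <- cut(median(scores),
                breaks = c(-Inf, 0.5, 1.5, 2.5, Inf),
                labels = c("not reproducible", "bronze", "silver", "gold"))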

I vote for removing PLoS from the list - it is the only one not explicitly related to plant pathology.

For Crop Protection, one should skip articles that do not deal with plant disease.

adamhsparks commented 7 years ago

@emdelponte, I think that's a reasonable idea; we can assign a value to the categories as you've suggested. I've updated my previous comment with scores on a 0-3 scale. A gold score would be 6 in this case and silver would be 4, but there might be 5s depending on data vs. computational methods, etc.

I second removing PLoS from the list and agree with skipping non-plant pathology articles in Crop Protection.

Here's how I'd envision the data frame structure.

reproducibility <- tibble::tibble(
  Article = "The Area Under the Disease Progress Stairs: Calculation, Advantage, and Application",
  DOI = "PHYTO-07-11-0216",
  Journal = "Phytopathology",
  Authors =  "Ivan Simko and Hans-Peter Piepho",
  Year = 2012,
  Vol = 102,
  Iss = 4,
  pp = "381-389",
  IF = 3.011,
  Journal_class = "Fundamental",
  Page_charges = 130,
  Country =  "USA",
  Open_or_Restricted = "Optional",
  Reproducibility_instructions = FALSE,
  Iss_per_Year = 12,
  Supl_mats = TRUE,
  Comp_methods_availability = 2,
  Software_availability = 1,
  Software_citation = 3,
  Analysis_automation = 0,
  Data_availability = 0,
  Data_annotation = 0,
  Data_tidiness = 0
)

reproducibility <- dplyr::mutate(reproducibility,
                                 Reproducibility_score = sum(Comp_methods_availability,
                                                             Software_availability,
                                                             Software_citation,
                                                             Analysis_automation,
                                                             Data_availability,
                                                             Data_annotation,
                                                             Data_tidiness))
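One caveat with this sketch, assuming the same column names: sum() inside mutate() collapses over all rows, so it only gives a per-article total while the tibble holds a single article. Once there is one row per article, a vectorised per-row sum would be needed, for example:

reproducibility <- dplyr::mutate(reproducibility,
                                 Reproducibility_score = Comp_methods_availability +
                                   Software_availability + Software_citation +
                                   Analysis_automation + Data_availability +
                                   Data_annotation + Data_tidiness)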
zachary-foster commented 7 years ago

I have adapted the paper attributes to the four-level (0-3) scoring system and put them in an Rmd so we can refine them further. Check it out here:

https://github.com/adamhsparks/Reproducible-Research-in-Plant-Pathology/blob/master/reproducibility_criteria.Rmd

@adamhsparks

If we drop one from the list, we have twenty, which leaves each of us with five journals that we can select our articles at random from (four replicates of five journals each, if you will).

I think we should select the articles from all the journals first and then split them up randomly, independent of journal; otherwise the person reading will be a confounding factor with journal.
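Something like this minimal sketch would do the split (the `selected` data frame and the reader handles are placeholders):

set.seed(2017)
readers <- c("adamhsparks", "emdelponte", "grunwald", "zachary-foster")

# deal the pooled articles out to readers at random, ignoring which journal they came from
selected$reader <- sample(rep(readers, length.out = nrow(selected)))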

if we define reproducible as having raw data and computer code available, etc., almost everything will fail. But we can make that the gold standard and see if we can find anything that reaches that level.

Yes, I would expect very few projects to be entirely reproducible. However, with the 0-3 scoring system, 2 (silver) would still be relatively good.

@emdelponte

binary or ordered scores (e.g. for computational methods: 0 - no script; 1 - login needed; 2 - publicly accessible).

I like the idea of a 0-3 scoring with 3 being exceptional and 1-2 being typical.

I vote for removing PLoS from the list - it is the only one not explicitly related to plant pathology. For Crop Protection, one should skip articles that do not deal with plant disease.

Agreed.

adamhsparks commented 7 years ago

I think we should select the articles from all the journals first and then split them up randomly, independent of journal; otherwise the person reading will be a confounding factor with journal.

Agreed

I've edited the Rmd file to include the example tibble with @zachary-foster's updated suggestions for reproducibility categories:

https://github.com/adamhsparks/Reproducible-Research-in-Plant-Pathology/blob/master/reproducibility_criteria.Rmd

Where do we now categorise SAS? There is a free University edition for download or use with AWS Cloud. Having looked at it, I think it might now fall into a 2 rating. You have to sign up, login, etc., so it's free but it's still proprietary.

grunwald commented 7 years ago

While SAS is finally free, one cannot reproduce publication-ready graphs in SAS, so SAS should get a lower score. Also, the code is not open source and cannot be improved by the SAS user community.

grunwald commented 7 years ago

I vote for removing PLoS from the list - it is the only one not explicitly related to plant pathology. For Crop Protection, one should skip articles that do not deal with plant disease.

I would include PLOS indirectly by randomly selecting only its 'plant pathology' articles.

grunwald commented 7 years ago

binary or ordered scores (e.g. for computational methods: 0 - no script; 1 - login needed; 2 - publicly accessible). I like the idea of a 0-3 scoring with 3 being exceptional and 1-2 being typical.

I like the 0 (no code) to 3 (fully open-source code) scale.

zachary-foster commented 7 years ago

Where do we now categorise SAS? There is a free University edition for download or use with AWS Cloud. Having looked at it, I think it might now fall into a 2 rating. You have to sign up, login, etc., so it's free but it's still proprietary.

I think a score of 2 sounds about right. It's easily available, but proprietary. Not being open source makes it less reproducible even if it is free (e.g. you don't know how a change in version would affect results, or the details of how algorithms are implemented).

grunwald commented 7 years ago

Where do we now categorise SAS? There is a free University edition for download or use with AWS Cloud. Having looked at it, I think it might now fall into a 2 rating. You have to sign up, login, etc., so it's free but it's still proprietary.

Also keep in mind that SAS code is not open, and it cannot produce graphs for publication. Free is not the same as open.

adamhsparks commented 7 years ago

Yes, free != open; that's why I raised the issue.

@grunwald is right; I'd forgotten that you can't make graphs for publication. That is part of being reproducible. Maybe that does warrant a lower ranking? I don't want to be seen as saying SAS is bad to use, but...

adamhsparks commented 7 years ago

It seems that we are happy with this scale, so let's move forward using it and see how we go. I don't have any other journal attribute suggestions to make either, so we can record those as well.

adamhsparks commented 7 years ago

As I'm going through the list that @emdelponte proposed, I see that MPMI isn't listed here.

I think I need to rerun our list and add MPMI; that's a pretty big journal to omit.

adamhsparks commented 7 years ago

@emdelponte, I am not finding a "Journal of Plant Virology" as you have suggested.

I've elected to go with http://www.virologyj.com/sections/plant for these articles.

emdelponte commented 7 years ago

@adamhsparks Oops! It seems you found the right name! There is also Archives of Virology, but we should be OK with one representative of the field.

I agree with including MPMI.

adamhsparks commented 7 years ago

I'm almost done with the list. I'll finish up this evening and make a commit with our assigned articles.

adamhsparks commented 7 years ago

Closing this to clean up issues.

See: https://github.com/phytopathology/Reproducible.Plant.Pathology/blob/master/vignettes/reproducibility_criteria.Rmd for reproducibility criteria

See: https://github.com/phytopathology/Reproducible.Plant.Pathology/blob/master/vignettes/Assigning_Articles.Rmd for article assignments for each of us