Bioinformatics revision - GitHub issues

ShixiangWang commented 3 years ago

Original content of the journal's response email:

Here are the Associate Editor's comments:

While the reviewers see utility in your tool, they raise a number of questions/concerns that need to be fully addressed before we can reach a decision. Reviewer 1 raises questions about the comparison to existing tools and about issues installing the tool, both of which are particularly important to address.

Here are the comments of the reviewers:

Reviewer: 1

Comments to the Author 1) General comments: In the paper titled “UCSCXenaShiny: An R/CRAN Package for Interactive Analysis of UCSC Xena Data”, Wang et al. present an R-package and an associated shiny application to interactively explore data from UCSC Xena – a large collection of publicly available cancer datasets. The authors discuss how their tools can help to perform analysis of different sub-cohorts of Xena and show that their tool returns publication-grade plots. The paper is in general concise and well-written. However, I see a few major points that need to be addressed before a potential publication.

2) Specific comments

a) Major points:

It is not quite clear to me, whether it is possible for users to also analyze their own dataset. This would make the tool even more useful, since many researchers want to relate their data and findings to published data. The authors could either mention that it is possible or discuss that this will be part of the next development stage of the tool.
The installation of the software package was quite challenging on my machine (conda on CentOS) with several error messages. For instance, “configure: error: GNU MP not found, or not 4.1.4 or up, see http://gmplib.org”. To solve the problem, I had to manually install GMP from here https://gmplib.org/ (I.e., using ./configure, make, make install). Similarly, I had to install “mpfr” using “conda install mpfr”. Still, the package ‘shinythemes’ was missing. After installing this using “conda install r-shinythemes”, the package was successfully loading, but app_run() returned the error message “'browser' must be a non-empty character string”, which I solved by setting ‘options(browser='firefox')’ in my R session. This was very tiresome and many users won’t use the software if the installation is so challenging. The authors should either provide extensive troubleshooting advice on GitHub or so or provide a docker or conda container or something similar.
Although the authors claim in the paper that the user interface is self-explanatory, I had a hard time to understand and get used to all the functionalities of the interface. The authors mention tutorial videos, but I could not find the respective link. This could make the interface much easier to use.
My feeling is that the cBioPortal available at https://www.cbioportal.org/ and published in two publications (https://pubmed.ncbi.nlm.nih.gov/22588877/, https://pubmed.ncbi.nlm.nih.gov/23550210/) provides a quite similar set of tools in comparison to UCSCXenaShiny. However, the authors do neither discuss nor cite this toolset. The authors should discuss the benefit of using their tool in comparison to cBioPortal, which also provides an API as well as associated R and python packages. I also found a similar, published tool CVCDAP (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7439093/). It is my feeling that the literature research is not yet sufficient to position the tool within all the tools available. To that end, it would be very useful to have a table (along the lines of Supp Table S2, but substantially extended) listing similar tools and showing the advantages of UCSCXenaShiny.
Conclusions: The authors claim that the package has been downloaded more than 10,000 times without a reference to a website where this information is available. I just checked the download stats for May, which shows 532 downloads. Thus, the package would need to have been released for around 20 months and the authors do not discuss when the initial release was (R-4.0.0 was released in Apr 2020, which is not yet 20 months ago). The authors should further underpin their claim by either providing a correct link to the download statistics or showing the code how they extracted the information.

b) Minor points:

Page 3, line 23: can be served -> can serve
Page 4, line 17: Where do the ten thousand tumor samples come from? It is just a statement that is not underpinned by a reference or respective numbers from the projects. This should be added.
Page 4, line 33: Despite UCSC Xena itself provides -> Despite the fact that UCSC Xena provides a
Page 4, lines 39-42: to locate …., to download or to analyze
Page 4: “Besides, the analysis features provided by UCSC Xena platform are mainly designed for general analysis purpose”: This sentence is not clear. What is “general analysis purpose”? Please be more specific.
Page 4 line 52: What does “popular” mean? Can you underpin this by download statistics or by citations of the respective publication? Otherwise, this is a strong claim.
Page 4, line 60: , thus is -> . Thus, it is
Page 5, lines 8-10: for quickly search, retrieve, … -> for quickly searching, retrieving, ….
Figure: Methylation should be replaced by DNA methylation, since there are also other forms of methylation (e.g., histone methylation, which you are probably not referring to). In general, the figure contains a lot of text and could also be replaced by a table. I would suggest to replace some of the text by small figure panels. E.g., the correlation could be visualized by a small example directly taken from the application. This would make the figure more visually appealing.
Page 5, lines 55-59: both the R package interface … and the Shiny application interface
Page 6, line 41: “Shiny interface is the major form of UCSCXenaShiny”: not clear what this sentence means.
Page 7, line 11 and 15: explore associations
Page 7, line 17: What do TMB and MSI stand for? The abbreviations have not been introduced.
Page 7, lines 3-35: I would think about the structure of this paragraph. It is quite hard to read in the current state. Could you change it into a listing and then re-formulate the phrases into proper sentences?
Page 7, Implementation paragraph: There are a lot of subjective adverbs (well organized, self-explanatory, properly, quickly). This is something that should be judged by users rather than by the developers, so I would delete all of them.
Page 7, line 60: The authors mention videos, but I don’t see a link in the GitHub. Perhaps it is hidden somewhere, but it was not easily findable, which should be improved.
Page 8, line 25: UCSCXenaShiny been -> UCSCXenaShiny has been
Page 8, line 31: promotes -> promote
Code: I had a brief look into the code on GitHub. I found that here (https://github.com/openbiox/UCSCXenaShiny/blob/master/R/analyze_gene_drug_response.R) the function pcor_test and related functions have been copied from another software repository (https://github.com/saezlab/Macau_project_1/blob/master/FUNCTIONS/partial_correlation_functions.R). The author repository is made available through GNU GPL-3 license. I am not quite sure whether this is allowed to further distribute it under MIT, but perhaps you can make sure with your legal department.

Reviewer: 2

Comments to the Author

In the manuscript entitled “UCSCXenaShiny: An R/CRAN Package for Interactive Analysis of UCSC Xena Data”, Wang et al., described a new R package which automatically retrieves data from Xena database and also provides tools for exploratory analysis on gene level as well on cohort level. The tool is very well designed and developed and I think it will be a useful tool for both computational biologists and clinical researchers for integrating public datasets to their studies. Following are my comments:

On the text

Although I am not a native English speaker, I still feel the text needs to be extensively improved. For example, in Abstract (also in paragraph 3), “for quick search, download, …” should be “ for quick searching, downloading, …”. Paragraph 2, “Enhanced functionalities for …”, here “Enhanced functionalities” is ambiguous. Authors need to make it clear and more specific. In paragraph3, authors mentioned “an open-source and popular R package”. “popular” is too subjective. How do you define “popular”?

Authors reviewed and compared current tools in the supplementary, why don’t they discuss it in the main text? I think this is the normal way to do it.

On the tool

In “Repository” tab, in the table which shows general information of the datasets, I feel some column names can be improved to make them easier to read, such as maybe “Data type” is more proper than “Subtype”? Also better names for “Label” and “Unit”? log2(copy-number/2) is not a “unit”, right?
In “Repository” tab, the left sidebar, the list of values for “Data Type” might be more proper to correspond to the “Subtype” column in the dataset table? Such as to use “Expression”/”methylation”, while not “Feature by sample matrix” which is not easy to understand, or consider to use a secondary list for these sub-categories.
When “Feature by sample matrix” is only selected, there is also “copy number” datasets. Shouldn’t they go to “Genomic segments” category?
In “General Analysis” tab, “Matrix-correlation” subtab, in the correlation heatmap, in the legend text below the heatmap, e.g. “p < 0.05” should be “adjusted p < 0.05”.
When I perform “General Analysis”, I repeatedly get the error message “Please make sure the two selected datasets are ‘genomicMatrix’ type”. This is hard to infer. Maybe a clearer message here?
In “Quick PanCan Analysis”, the left sidebar, I think it is more proper to put the “search” button which is now on the top right to the bottom of the sidebar, because normally users will select the parameters, then go to the end of the side bar, click the “button” and wait for the results. Under current design, at the bottom of the sidebar, it is a “Download” button while not a “go and analyze” button, and without clicking the “search” button which is far away on the top, only an error html page is downloaded. I spent some time until I figured out the right order to perform.

ShixiangWang commented 3 years ago

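We expect to return the revision in about half a month. I am still dealing with matters related to leaving the university. I will try to sort and summarize the issues today, then find a time for everyone to discuss them and divide up the work.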

ShixiangWang commented 3 years ago

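Summary of issues from the Bioinformatics submission

Grammar problems raised by the reviewers will be fixed directly and are not part of this discussion.

From the editor

Address the two key issues raised by Reviewer 1:

Comparison with existing tools
Problems installing the tool

From the reviewers

Can users analyze their own data?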

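We need to add support for this. My idea is to add a module page with upload widgets and accompanying instructions. It would mainly involve two files: a genomic data file (custom_feature_data) and a phenotype data file (optional, custom_phenotype). This matches the data format that Xena itself provides, so on the server side we only need to give the files separate names and find a way to plug them in. (Add a corresponding video.)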
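A minimal sketch of the kind of consistency check such an upload module could run before analysis. The helper name `check_custom_upload` and the layout assumptions (feature IDs in the first column of the feature table, sample IDs in the first column of the phenotype table) are illustrative only and are not taken from the package.

```r
# Hypothetical helper: check that an uploaded feature-by-sample table and an
# optional phenotype table are consistent with each other.
check_custom_upload <- function(feature_data, phenotype = NULL) {
  stopifnot(is.data.frame(feature_data), ncol(feature_data) >= 2)

  # Assumption: first column holds feature IDs, remaining columns are samples.
  samples <- colnames(feature_data)[-1]
  if (anyDuplicated(feature_data[[1]]) > 0) {
    stop("Feature identifiers in the first column must be unique.")
  }

  shared <- samples
  if (!is.null(phenotype)) {
    stopifnot(is.data.frame(phenotype), ncol(phenotype) >= 1)
    # Assumption: first column of the phenotype table holds sample IDs.
    shared <- intersect(samples, as.character(phenotype[[1]]))
    if (length(shared) == 0) {
      stop("No sample IDs are shared between the two uploaded files.")
    }
  }

  # Report how many features were read and which samples can be analyzed.
  list(n_features = nrow(feature_data), samples = shared)
}
```

In a Shiny module, the two data frames would typically come from reading the files selected in the upload widgets, and the returned sample list could drive the downstream selectors.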

Installation problems (on CentOS) were hard to resolve. The authors should provide troubleshooting advice, or a container such as docker/conda.

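Here we first explain that on Linux, missing system dependencies and difficult installation are common across programming and analysis environments. Our package is easy to install and use on the commonly used Windows/Mac systems, because the system dependencies are complete and R packages ship as dedicated binaries. In addition, some users did run into installation problems on Linux and reported troubleshooting tips, which we have documented separately in the README: https://github.com/openbiox/UCSCXenaShiny#hammer_and_wrench-troubleshooting. To better solve installation problems, we have deployed a docker image: https://hub.docker.com/r/shixiangwang/ucscxenashiny.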
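One of the errors in the reviewer's report ("'browser' must be a non-empty character string") can be guarded against before the app is launched. The sketch below is an assumption about how such a troubleshooting tip could look. `ensure_browser` is a hypothetical helper, and "firefox" is simply the value the reviewer reported using.

```r
# Hypothetical helper: make sure R knows which browser to open before the
# Shiny app is launched. Any installed browser command could be used as default.
ensure_browser <- function(default = "firefox") {
  current <- getOption("browser")
  # Only set the option when it is missing or empty, so existing user
  # settings are left untouched.
  if (is.null(current) || identical(current, "")) {
    options(browser = default)
  }
  invisible(getOption("browser"))
}

ensure_browser()
# UCSCXenaShiny::app_run()  # launch call as named in the reviewer's report
```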

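The reviewer did not find the interface self-explanatory and could not find the related videos.

We will explain the first point further, and we will list the videos separately in the README.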

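cBioPortal provides a set of tools similar to xenashiny (https://pubmed.ncbi.nlm.nih.gov/22588877/, https://pubmed.ncbi.nlm.nih.gov/23550210/), but the authors neither discuss nor cite it. The authors should discuss the benefit of using their tool in comparison to cBioPortal, which also provides an API and associated R and Python packages. The reviewer also found a similar published tool, CVCDAP (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7439093/). The reviewer feels that the literature research is not yet sufficient to position the tool among all the tools available. To that end, a table (along the lines of Supp Table S2, but substantially extended) listing similar tools and showing the advantages of UCSCXenaShiny would be very useful.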

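Here we should point out how xenashiny and cBioPortal relate to each other: they play analogous roles, but they do not sit at the same level and so are not directly comparable.

Literature data -> cBioPortal -> cBioPortal API tools

Public databases / literature data -> UCSC Xena -> XenaShiny/UCSCXenaTools -> UCSCXenaShiny

Comparing with cBioPortal is not the focus of our work. Our focus is on making efficient use of, and presenting, the data provided by UCSC Xena.

The relationships above can be illustrated with a diagram (as Supplementary Figure 1).

Source of the package download counts. We will draw a plot and state the data source: https://cranlogs.r-pkg.org/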
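A sketch of how the download figures could be extracted and summarized, in response to the reviewer's request to show the code. `summarise_downloads` is a hypothetical helper. It assumes a daily table with `date` and `count` columns, which is the shape returned by `cranlogs::cran_downloads()`. The date range in the commented call is a placeholder, not the package's actual release window.

```r
# Hypothetical helper: summarise a daily download table with columns
# `date` and `count` into an overall total and per-month totals.
summarise_downloads <- function(daily) {
  stopifnot(all(c("date", "count") %in% names(daily)))
  month <- format(as.Date(daily$date), "%Y-%m")
  monthly <- tapply(daily$count, month, sum)
  list(total = sum(daily$count), monthly = monthly)
}

# Requires the cranlogs package and network access; dates are placeholders.
# daily <- cranlogs::cran_downloads(
#   packages = "UCSCXenaShiny", from = "2019-01-01", to = "2021-06-30"
# )
# summarise_downloads(daily)$total
```

The monthly totals can be plotted directly, which makes it easy to state both the cumulative count and the period it covers.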
I had a brief look into the code on GitHub. I found that here (https://github.com/openbiox/UCSCXenaShiny/blob/master/R/analyze_gene_drug_response.R) the function pcor_test and related functions have been copied from another software repository (https://github.com/saezlab/Macau_project_1/blob/master/FUNCTIONS/partial_correlation_functions.R). The author repository is made available through GNU GPL-3 license. I am not quite sure whether this is allowed to further distribute it under MIT, but perhaps you can make sure with your legal department.

Add the GPL license at the top of that file, stating that the file follows the same license. The rest of the package stays unchanged.

ShixiangWang commented 3 years ago

Reviewer 2's detailed points have been moved to a separate issue so that they are easier to manage and resolve during development: https://github.com/openbiox/UCSCXenaShiny/issues/214

ShixiangWang commented 3 years ago

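The GPL has also developed many branches. GPL v3 is the most aggressive: basically, any code that touches the original code must also be GPL. In the most extreme case, if my code calls a GPL library, my code must be GPL as well. This basically means that a commercial software system has no right to use GPL v3 code. GPL v3 is continually promoted by Richard Stallman, the father of the GPL, and represents the forefront of the radical open-source camp. The other GPL versions are arguably somewhat milder. For example, GPL v2, used by the Linux project, was also written by Richard himself, but because it was released early there were many issues he had not considered, which left some room for commercial use. There is also the GNU Lesser General Public License, which, as the name suggests, is more lenient.

The open-source license needs to be changed consistently across the package.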

ShixiangWang commented 3 years ago

The package license has been corrected to GPLv3.

Byronxy commented 3 years ago

When installing the package from scratch, the shinythemes dependency is not installed automatically. This needs to be added.

ShixiangWang commented 3 years ago

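When installing the package from scratch, the shinythemes dependency is not installed automatically. This needs to be added.

This has already been fixed: https://github.com/openbiox/UCSCXenaShiny/blob/679babf980bf21fa9fa7cb609fe47ce1ee140625/inst/shinyapp/App.R#L81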
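For context, one common pattern for catching a missing runtime dependency such as shinythemes is to check for it when the app starts. The sketch below illustrates that pattern only. It is not claimed to be what the linked line does, and `missing_packages` is a hypothetical helper.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("shinythemes"))
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them on the fly
}
```

Declaring the package under Imports in DESCRIPTION is the more conventional fix, since the installer then resolves it automatically.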

ShixiangWang commented 3 years ago

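My PhD supervisor gave me an initial version a few days ago. Over the next 2-3 days I will refine the rebuttal based on the changes we have made so far, and after that we can revise and polish it together.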

openbiox / UCSCXenaShiny

Bioinformatics revision #210
