Closed ShixiangWang closed 3 years ago
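We expect to return the revision in about half a month. I am still dealing with matters related to leaving campus, so I will try to organize and summarize the issues today, then find a time for everyone to discuss them and divide up the work.
Summary of issues from the Bioinformatics submission
Grammar issues raised by the reviewers will be fixed directly and are not included in the discussion.
Address the key issues raised by reviewer 1: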
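We need to add support for this. My idea is to add a module page containing the data upload controls and accompanying instructions. It mainly involves 2 files: one for genomic data (custom_feature_data) and one for phenotype data (optional, custom_phenotype). This also matches the data format that Xena itself provides; on the server side we only need to give the files separate names and find a way to embed them. (Add a corresponding video.)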
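A minimal sketch of the consistency check such an upload page might run, written in Python purely for illustration (the app itself is R/Shiny). The function name `check_custom_upload`, the tab-separated layout (features in rows and samples in columns for custom_feature_data; one sample per row for custom_phenotype), and the report fields are all assumptions, not part of the package.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into a list of non-empty rows."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def check_custom_upload(feature_text, phenotype_text=None):
    """Validate a feature matrix and an optional phenotype table.

    Assumed layout (hypothetical):
      feature file:   header = id column + sample IDs, one feature per row
      phenotype file: header row, then one sample per row, sample ID first
    """
    feature_rows = read_tsv(feature_text)
    if len(feature_rows) < 2 or len(feature_rows[0]) < 2:
        raise ValueError("feature data needs a header and at least one feature row")

    samples = feature_rows[0][1:]
    if len(set(samples)) != len(samples):
        raise ValueError("duplicated sample IDs in feature data header")

    # Every feature row must have one value per sample.
    for row in feature_rows[1:]:
        if len(row) != len(samples) + 1:
            raise ValueError(f"row {row[0]!r} has {len(row) - 1} values, "
                             f"expected {len(samples)}")

    report = {
        "n_features": len(feature_rows) - 1,
        "n_samples": len(samples),
        "samples_without_phenotype": [],
    }

    # The phenotype file is optional; when present, report unmatched samples.
    if phenotype_text is not None:
        pheno_rows = read_tsv(phenotype_text)
        pheno_samples = {row[0] for row in pheno_rows[1:]}
        report["samples_without_phenotype"] = [
            s for s in samples if s not in pheno_samples
        ]
    return report
```

Reporting unmatched samples rather than rejecting the upload keeps the phenotype file truly optional; whether the real module should be stricter is a design choice left open here.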
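Here we first explain that on Linux it is common for programming and analysis environments to lack system dependencies, which makes installation difficult. Our package is fairly easy to install and use on the commonly used Windows/Mac systems, because the system dependencies are complete and R packages have dedicated binary builds. In addition, some users have indeed run into installation problems on Linux and sent us troubleshooting advice, which we document separately in the README: https://github.com/openbiox/UCSCXenaShiny#hammer_and_wrench-troubleshooting . To better address installation problems, we have deployed a Docker image: https://hub.docker.com/r/shixiangwang/ucscxenashiny .
We will explain the first part further, and list the videos separately in the README.
Here we should mention that xenashiny and cBioPortal play analogous roles, but they are not things to be compared at the same level.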
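Literature data -> cBioPortal -> cBioPortal API tools
Public databases/literature data -> UCSC Xena -> XenaShiny/UCSCXenaTools -> UCSCXenaShiny
Comparison with cBioPortal is not the focus of our work; our focus is on efficiently using and presenting the data provided by UCSC Xena.
The relationships above can be illustrated with a figure (as Supplementary Figure 1).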
Source of the package download count. We will draw a plot and state the data source: https://cranlogs.r-pkg.org/
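A minimal sketch of how the download total could be reproduced for the rebuttal, again in Python for illustration only. It assumes the cranlogs web API exposes a `downloads/total/<start>:<end>/<package>` endpoint returning a JSON list of records with a `downloads` field; the helper names are hypothetical, and the date range must be filled in with the package's actual release date, which is not stated in this thread.

```python
import json
from urllib.request import urlopen

# Assumed endpoint pattern of the cranlogs web API.
CRANLOGS_TOTAL = "https://cranlogs.r-pkg.org/downloads/total/{start}:{end}/{package}"


def build_url(package, start, end):
    """Build the query URL for a package and an ISO date range."""
    return CRANLOGS_TOTAL.format(start=start, end=end, package=package)


def total_downloads(payload):
    """Sum the 'downloads' field over the records of a parsed response."""
    return sum(int(record["downloads"]) for record in payload)


def fetch_total(package, start, end):
    """Query the API (needs network access) and return the total count."""
    with urlopen(build_url(package, start, end)) as response:
        return total_downloads(json.load(response))


def months_needed(target, per_month):
    """Months required to reach `target` downloads at a constant monthly rate."""
    return target / per_month
```

With the reviewer's own numbers, `months_needed(10000, 532)` is roughly 18.8, which is the basis of the "around 20 months" estimate; calling `fetch_total("UCSCXenaShiny", start, end)` over the real release window would show whether the 10,000 claim holds.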
I had a brief look into the code on GitHub. I found that here (https://github.com/openbiox/UCSCXenaShiny/blob/master/R/analyze_gene_drug_response.R) the function pcor_test and related functions have been copied from another software repository (https://github.com/saezlab/Macau_project_1/blob/master/FUNCTIONS/partial_correlation_functions.R). The author repository is made available through GNU GPL-3 license. I am not quite sure whether this is allowed to further distribute it under MIT, but perhaps you can make sure with your legal department.
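Add the GPL license to the header of that file, stating that the file follows the same license; the package as a whole stays unchanged.
Reviewer 2's detailed issues are tracked in a separate issue to make development easier to manage: https://github.com/openbiox/UCSCXenaShiny/issues/214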
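The GPL has also developed many variants, of which GPL v3 is the most aggressive: essentially any code that touches the original code must also be GPL. In the most extreme case, if my code calls a GPL library, then my code must be GPL as well. This basically means that a commercial software system has no right to use GPL v3 code. GPL v3 is continually promoted by Richard Stallman, the father of the GPL, and represents the leading edge of the radical open-source camp. The other GPL versions can be regarded as somewhat milder. For example, GPL v2, used by the Linux project, was also written by Richard himself, but because it was released early there were many issues he had not considered, which left some room for commercial use. There is also the GNU Lesser General Public License, which, as the name suggests, is more permissive.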
The open-source license needs to be made consistent across the package.
The package license has been corrected to GPLv3.
When installing the package from scratch, the shinythemes dependency is not installed automatically; it needs to be added.
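My PhD advisor gave me an initial version a few days ago. Over the next 2-3 days I will refine the rebuttal based on our current revisions, and afterwards we will revise and polish it together.
Original content of the journal's response email: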
Here are the Associate Editor's comments:
While the reviewers see utility in your tool, they raise a number of questions/concerns that need to be fully addressed before we can reach a decision. Reviewer 1 raises questions about the comparison to existing tools and about issues installing the tool, both of which are particularly important to address.
Here are the comments of the reviewers:
Reviewer: 1
Comments to the Author
1) General comments: In the paper titled “UCSCXenaShiny: An R/CRAN Package for Interactive Analysis of UCSC Xena Data”, Wang et al. present an R-package and an associated shiny application to interactively explore data from UCSC Xena – a large collection of publicly available cancer datasets. The authors discuss how their tools can help to perform analysis of different sub-cohorts of Xena and show that their tool returns publication-grade plots. The paper is in general concise and well-written. However, I see a few major points that need to be addressed before a potential publication.
2) Specific comments
a) Major points:
It is not quite clear to me, whether it is possible for users to also analyze their own dataset. This would make the tool even more useful, since many researchers want to relate their data and findings to published data. The authors could either mention that it is possible or discuss that this will be part of the next development stage of the tool.
The installation of the software package was quite challenging on my machine (conda on CentOS) with several error messages. For instance, “configure: error: GNU MP not found, or not 4.1.4 or up, see http://gmplib.org”. To solve the problem, I had to manually install GMP from here https://gmplib.org/ (I.e., using ./configure, make, make install). Similarly, I had to install “mpfr” using “conda install mpfr”. Still, the package ‘shinythemes’ was missing. After installing this using “conda install r-shinythemes”, the package was successfully loading, but app_run() returned the error message “'browser' must be a non-empty character string”, which I solved by setting ‘options(browser='firefox')’ in my R session. This was very tiresome and many users won’t use the software if the installation is so challenging. The authors should either provide extensive troubleshooting advice on GitHub or so or provide a docker or conda container or something similar.
Although the authors claim in the paper that the user interface is self-explanatory, I had a hard time to understand and get used to all the functionalities of the interface. The authors mention tutorial videos, but I could not find the respective link. This could make the interface much easier to use.
My feeling is that the cBioPortal available at https://www.cbioportal.org/ and published in two publications (https://pubmed.ncbi.nlm.nih.gov/22588877/, https://pubmed.ncbi.nlm.nih.gov/23550210/) provides a quite similar set of tools in comparison to UCSCXenaShiny. However, the authors do neither discuss nor cite this toolset. The authors should discuss the benefit of using their tool in comparison to cBioPortal, which also provides an API as well as associated R and python packages. I also found a similar, published tool CVCDAP (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7439093/). It is my feeling that the literature research is not yet sufficient to position the tool within all the tools available. To that end, it would be very useful to have a table (along the lines of Supp Table S2, but substantially extended) listing similar tools and showing the advantages of UCSCXenaShiny.
Conclusions: The authors claim that the package has been downloaded more than 10,000 times without a reference to a website where this information is available. I just checked the download stats for May, which shows 532 downloads. Thus, the package would need to have been released for around 20 months and the authors do not discuss when the initial release was (R-4.0.0 was released in Apr 2020, which is not yet 20 months ago). The authors should further underpin their claim by either providing a correct link to the download statistics or showing the code how they extracted the information.
b) Minor points:
Page 3, line 23: can be served -> can serve
Page 4, line 17: Where do the ten thousand tumor samples come from? It is just a statement that is not underpinned by a reference or respective numbers from the projects. This should be added.
Page 4, line 33: Despite UCSC Xena itself provides -> Despite the fact that UCSC Xena provides a
Page 4, lines 39-42: to locate …., to download or to analyze
Page 4: “Besides, the analysis features provided by UCSC Xena platform are mainly designed for general analysis purpose”: This sentence is not clear. What is “general analysis purpose”? Please be more specific.
Page 4 line 52: What does “popular” mean? Can you underpin this by download statistics or by citations of the respective publication? Otherwise, this is a strong claim.
Page 4, line 60: , thus is -> . Thus, it is
Page 5, lines 8-10: for quickly search, retrieve, … -> for quickly searching, retrieving, ….
Figure: Methylation should be replaced by DNA methylation, since there are also other forms of methylation (e.g., histone methylation, which you are probably not referring to). In general, the figure contains a lot of text and could also be replaced by a table. I would suggest to replace some of the text by small figure panels. E.g., the correlation could be visualized by a small example directly taken from the application. This would make the figure more visually appealing.
Page 5, lines 55-59: both the R package interface … and the Shiny application interface
Page 6, line 41: “Shiny interface is the major form of UCSCXenaShiny”: not clear what this sentence means.
Page 7, line 11 and 15: explore associations
Page 7, line 17: What do TMB and MSI stand for? The abbreviations have not been introduced.
Page 7, lines 3-35: I would think about the structure of this paragraph. It is quite hard to read in the current state. Could you change it into a listing and then re-formulate the phrases into proper sentences?
Page 7, Implementation paragraph: There are a lot of subjective adverbs (well organized, self-explanatory, properly, quickly). This is something that should be judged by users rather than by the developers, so I would delete all of them.
Page 7, line 60: The authors mention videos, but I don’t see a link in the GitHub. Perhaps it is hidden somewhere, but it was not easily findable, which should be improved.
Page 8, line 25: UCSCXenaShiny been -> UCSCXenaShiny has been
Page 8, line 31: promotes -> promote
Code: I had a brief look into the code on GitHub. I found that here (https://github.com/openbiox/UCSCXenaShiny/blob/master/R/analyze_gene_drug_response.R) the function pcor_test and related functions have been copied from another software repository (https://github.com/saezlab/Macau_project_1/blob/master/FUNCTIONS/partial_correlation_functions.R). The author repository is made available through GNU GPL-3 license. I am not quite sure whether this is allowed to further distribute it under MIT, but perhaps you can make sure with your legal department.
Reviewer: 2
Comments to the Author
In the manuscript entitled “UCSCXenaShiny: An R/CRAN Package for Interactive Analysis of UCSC Xena Data”, Wang et al., described a new R package which automatically retrieves data from Xena database and also provides tools for exploratory analysis on gene level as well on cohort level. The tool is very well designed and developed and I think it will be a useful tool for both computational biologists and clinical researchers for integrating public datasets to their studies. Following are my comments:
On the text
Although I am not a native English speaker, I still feel the text needs to be extensively improved. For example, in Abstract (also in paragraph 3), “for quick search, download, …” should be “ for quick searching, downloading, …”. Paragraph 2, “Enhanced functionalities for …”, here “Enhanced functionalities” is ambiguous. Authors need to make it clear and more specific. In paragraph3, authors mentioned “an open-source and popular R package”. “popular” is too subjective. How do you define “popular”?
Authors reviewed and compared current tools in the supplementary, why don’t they discuss it in the main text? I think this is the normal way to do it.
On the tool
In “Quick PanCan Analysis”, the left sidebar, I think it is more proper to put the “search” button which is now on the top right to the bottom of the sidebar, because normally users will select the parameters, then go to the end of the side bar, click the “button” and wait for the results. Under current design, at the bottom of the sidebar, it is a “Download” button while not a “go and analyze” button, and without clicking the “search” button which is far away on the top, only an error html page is downloaded. I spent some time until I figured out the right order to perform.