Closed gorgitko closed 2 years ago
btw. this PR is solving #132 and #188
Hi @gorgitko , thank you for putting this together. At the present I am going to pass on this PR. There appear to be numerous issues here. Perhaps the most major issue is that while I am aware of 'VEP' and 'SNPeff' I do not currently use them. When you make a PR you're not simply asking for your code to be incorporated into another project, you're asking the package maintainer to maintain this code. Because I do not use this data I really do not know anything about its format or contents. This makes me a poor candidate to maintain this.
You mentioned an issue with how white space is handled by you computer. This appears to be a substantial issue because the diff viewer flags these lines. This means that you PR appears to have many false positives in the diff viewer. This makes it very difficult for me to review this PR. You've also made your changes to the 'master' branch. This presents a challenge to reviewing this because I would consider it unadvisable to merge a PR from some stranger directly into the master/main branch. I have tried to clone your PR into a separate instance, but this has added to the work of trying to figure out which lines may actually be changes as well as me time in trying to figure out what the format of these 'annotations' may be. I feel I've made a good faith effort here and I don't see what you're trying to do, and every time I try I seem to find more issues. For example, why do we need 6 additional package dependencies just to make a table? I suspect there may be more here then I realize, but I'm feeling like every new door I open I discover three new doors, instead of the information I need.
Would it be more positive to take this conversation in a direction where your code could be used as the beginning of a new package? That way someone with a knowledge of the formats of these specifications can maintain it?
Hi @knausb, thanks for looking at this PR.
I understand your concerns about the future maintenance of my code. My motivation to introduce this feature was both my personal need to import such annotated VCFs and at the same time to help others as, regarding the two issues above, I am not alone. I thought it might be a nice addition to the current tidy data-related functions in vcfR
.
Just to clarify that unfortunate thing with changes in newlines, I am used to somehow standardize my code, and this is one of its manifestations (stripping of trailing horizontal whitespaces). I also like to use styleR
- "non-invasive pretty printing of R code". It makes collaboration much easier as everyone is using the same code style (which can be arbitrarily set by the package author/maintainer). Also, it can be also automatically run on new PRs.
Would it be more positive to take this conversation in a direction where your code could be used as the beginning of a new package? That way someone with a knowledge of the formats of these specifications can maintain it?
Yes, that would make sense. I would try to make a tiny package and let you know here when it's ready. Then I would be grateful if my package could be added to the vcfR
's README :slightly_smiling_face:
Hi @gorgitko , your perspectives on style here have prevented you from communicating the changes that you have attempted to propose. This is because you have defeated the purpose of the diff viewer, which is supposed to help reviewers find the changes you propose. From my perspective, I invested some time in trying to review your code. When I looked at the diff viewer I saw a message that told me that there were so many changes that they would not be rendered for some pages. I decided to take a deep breath and start from the top. The first change that I saw was one of these 'false positives' that appear to be created by your white space edits. I had to look at it for a bit because I did not see any changes (because it was white space changes). After looking at this a bit I realized the issue was that it was a white space change. This means no actionable item, so I moved on to the next, which was also white space edits. After looking at a few of these I realized that it was not worth my time to try and figure out how much of this PR was white space changes. After I invested some of my time trying to review this I still had no idea of what you were proposing. I never got to a single line of code that included a substantive change. I simply determined that I was not making any progress because there were so many white space edits that I could not find any real changes. I feel a goal when proposing changes to other projects is to communicate your changes as simply and clearly as possible. I never got to any real code changes because of all the white space edits and instead decided that it was not worth my time.
To my knowledge there is no minimal package size for CRAN, so you should not feel apologetic for this. If the package is too large it will throw a note during CRAN checks. vcfR currently throws this note, which is another reason I feel I need to be conservative about additions. Please let me know when you have a package and I will link it.
Good luck! Brian
Hi, Did the separate_ann_vep function make it to the vcfR package? How can we use it or which package did it end up in?
Hi @zahrahaider, you can install my fork using remotes::install_github("gorgitko/vcfR")
Hi,
this is an extension to the
vcfR2tidy()
function. It allows to separate annotation columns in tidy dataframes coming from VCFs annotated by snpEff or VEP tools (but it can generalize to any other annotation format). Annotations generated by these tools are following the Variant annotations in VCF format specification.Let's take a quick example of VEP-annotated VCF imported by
{vcfR}
(available as package dataset):Now in
vcf_tidy$meta
there is a INFO row with ID "CSQ" containing information about annotation fields:And the annotation data are stored in the
CSQ
column invcf_tidy$fix
:Now using the functions in the PR, we can separate these annotations:
vcf_tidy$meta
.I have included two new test datasets (
vcfR_test_snpEff
andvcfR_test_VEP
), written a proper (:crossed_fingers:) documentation of new functions, and included some basic unit tests. R CMD CHECK passed both locally and in the GHA workflow here.I hope this extension can serve well.
P.S. Sorry for some modifications in code formatting, but I have my RStudio configured to strip trailing horizontal whitespaces on save.