oneillkza / ContiBAIT

BSD 2-Clause "Simplified" License
0 stars 3 forks source link

Issues to address before resubmission to Bioconductor #24

Closed rareaquaticbadger closed 8 years ago

rareaquaticbadger commented 8 years ago

Thought it would be good to take stock of what needs to be done. Not sure if they should be separate issues or a list, but here goes:

  1. Inability to build vignette: possibly changes to R caused this (reviewer says: "This is due to a change in R related to the lines") STATUS- OPEN
  2. Import methods in the DESCRIPTION and import(methods) in the NAMESPACE STATUS- CLOSED
  3. Avoid false positive notes using aes_string etc. STATUS- CLOSED: _changed all ggplots functions to aesstring. Still notes related to GRanges (rname, & pos from strandSeqFreqTable), but unable to call these as strings so no alternative. linkage.groups2 is removed from contiBAIT.R so error should no longer occur.
  4. After biocLite("BiocCheck"), R CMD BiocCheck contiBAIT_0.99.0.tar.gz suggests import'ing grDevices::pdf, grDevices::dev.off, stats::runif. STATUS- OPEN
  5. Create 'constructors' rather than invoking new() directly. STATUS- OPEN
  6. he ChrTable class looks like a GRanges(); why not re-use it, perhaps as a simple extension setClass("ChrTable", contains="GRanges")? STATUS- MOSTLY CLOSED. Keiran, should I keep this as a GRanges, or make a new class that incorporates GRanges? Not sure what the best approach is...
  7. When using setGeneric() it's usually very desirable to limit dispatch to the arguments for which methods are defined with, e.g., signature="strandTable". This avoids a combinatorial proliferation of implied methods. STATUS- CLOSED
  8. some of the lines of code are very long and highly nested. STATUS- ONGOING. Changed the example he gave, but still working on simplifying other lines. Will finish by the end of the day.
  9. Use seq_len(N) or seq_along(x) instead of 1:N or 1:length(x), to avoid problems when N or length(x) == 0. STATUS- OPEN.m Hope to finish this by the end of the day
  10. There is not much value in nested functions like computeBarPlotMatrix and calcOneGroupChr in barplotLinkageGroupCalls.func. STATUS: OPEN. Probably because they are only called once now, they can be plopped into the body of the code. Will attempt to finish this today unless you have a different interpretation of what reviewer meant.
  11. The 'copy and append' strategy (from clusterContig.R) is expensive. Make list first. STATUS: OPEN. Kieran?
  12. mergeLinkageGroups.func has default value clusNum = 1, so makeCluster(clusNum) makes a cluster of size 1. STATUS- CLOSED
  13. Consider using matrix() instead of `data.frame()' STATUS- OPEN
  14. use BiocParallel (bplapply() rather than parLapply(cl, ) STATUS- OPEN
oneillkza commented 8 years ago

Happy to take on 5, 11 and 14.

Re 6: I'd be inclined to do what the reviewer said -- setClass("ChrTable", contains="GRanges"). The only trick would be ensuring that it stays a ChrTable object whenever it's returned from methods (since, unless we define the various subset operators, they'll turn it back into a GRanges).

Re: 10: This is my reading as well. I think the only reason we had those as separate functions was because they were used more than once, but if they are only called from one place, and we're not exporting them, then there's no reason not to just roll them into the main function.

I can possibly have a go at 13; on the one hand it's pretty simple to just replace data.frame with matrix, but on the other hand it may break things all over the place.

rareaquaticbadger commented 8 years ago

UPDATE from latest pull request. (REMOVED CLOSED ISSUES FROM PREVIOUS MESSAGE)

  1. Inability to build vignette: possibly changes to R caused this (reviewer says: "This is due to a change in R related to the lines") STATUS- OPEN
  2. Import methods in the DESCRIPTION and import(methods) in the NAMESPACE STATUS- CLOSED
  3. After biocLite("BiocCheck"), R CMD BiocCheck contiBAIT_0.99.0.tar.gz suggests import'ing grDevices::pdf, grDevices::dev.off, stats::runif. STATUS- OPEN
  4. Create 'constructors' rather than invoking new() directly. STATUS- OPEN KIERAN TO DO
  5. The ChrTable class looks like a GRanges(); why not re-use it, perhaps as a simple extension setClass("ChrTable", contains="GRanges")? STATUS- OPEN. Will do once constructors (item 5) complete
  6. some of the lines of code are very long and highly nested. STATUS- CLOSED. See latest pull request.
  7. Use seq_len(N) or seq_along(x) instead of 1:N or 1:length(x), to avoid problems when N or length(x) == 0. STATUS- CLOSED. See latest pull request
  8. There is not much value in nested functions like computeBarPlotMatrix and calcOneGroupChr in barplotLinkageGroupCalls.func. STATUS: MOSTLY CLOSED. See latest pull request: Merged in computeBarPlotMatrix, but calcOneGroupChr is called by sapply, so not sure how to integrate that. Is it ok to leave as is?
  9. The 'copy and append' strategy (from clusterContig.R) is expensive. Make list first. STATUS: OPEN. KIERAN TO DO
  10. Consider using matrix() instead of `data.frame()' STATUS- OPEN
  11. use BiocParallel (bplapply() rather than parLapply(cl, ) STATUS- OPEN KIERAN TO DO
rareaquaticbadger commented 8 years ago

Hi Kieran,

Great that you're taking on the constructrs, copy&append of clusterContig list and using biocParallel.

I'm going to try and fix plotIdeogram, then tackle remaking a ChrTable object, and if I have time I'll play around with matrix() forms of our current data.frames. I agree that the behaviour of a matrix may screw up downstream code, but it's probably worth a check. Who knows, R might surprise us and it'll work first time for a change!

I plan to create pull requests for each, so hopefully you won't be inundated. The only potential conflict might be the ChrTable object while you're working on constructors, but I think it'll be fine.

That only leaves the import of grDevices issue. Any ideas about that?

oneillkza commented 8 years ago

Hmmm ... just to warn you, I'm almost done with the BiocParallel thing (11), and it is requiring changes in a lot of files (including NAMESPACE and DESCRIPTION), so may require some merging.

It will also definitely break your analysis code, since I've standardised mergeContigs, clusterContigs and contiBAIT to all accept a parameter called clusterParam, of type BiocParallelParam. This is actually more powerful and allows for logging, so is better in the long run.

rareaquaticbadger commented 8 years ago

No problem. I can handle it.

I'm having issues with ChrTable classes at the moment though. I'm having the opposite problem of what you said. I made a class as suggested (setClass("ChrTable", contains="GRanges")), but now every time something happens in GRanges (for example chrTable <- disjoin(chrTable), or append(chrTable, segs)) I get an invalid class error, because GRanges wants an object of class GRanges, not ChrTable. Any fixes, or shall I just abandon making this new class?

I can't see where the build bug with ideogramPlot is happening. I've removed a line which I think is redundant, but other than that I don't see why it's not building in the reviewers hands. I'm running "Wooden Christmas Tree", which is the latest version of R. I might not have updated all my dependencies though. Will try that and see if I can recreate the error.

oneillkza commented 8 years ago

UPDATE from latest commits. (REMOVED CLOSED ISSUES FROM PREVIOUS MESSAGE)

  1. Inability to build vignette: possibly changes to R caused this (reviewer says: "This is due to a change in R related to the lines") STATUS- OPEN
  2. After biocLite("BiocCheck"), R CMD BiocCheck contiBAIT_0.99.0.tar.gz suggests import'ing grDevices::pdf, grDevices::dev.off, stats::runif. STATUS- CLOSED
  3. Create 'constructors' rather than invoking new() directly. STATUS- OPEN KIERAN TO DO
  4. The ChrTable class looks like a GRanges(); why not re-use it, perhaps as a simple extension setClass("ChrTable", contains="GRanges")? STATUS- OPEN. Will do once constructors (item 5) complete
  5. There is not much value in nested functions like computeBarPlotMatrix and calcOneGroupChr in barplotLinkageGroupCalls.func. STATUS: MOSTLY CLOSED. See latest pull request: Merged in computeBarPlotMatrix, but calcOneGroupChr is called by sapply, so not sure how to integrate that. Is it ok to leave as is?
  6. The 'copy and append' strategy (from clusterContig.R) is expensive. Make list first. STATUS: OPEN. KIERAN TO DO
  7. Consider using matrix() instead of `data.frame()' STATUS- OPEN
  8. use BiocParallel (bplapply() rather than parLapply(cl, ) STATUS- CLOSED
oneillkza commented 8 years ago

Hmm ... I started working on constructors, then realised I should wait until you've converted the classes to using matrix(), since that impacts on all the constructor (and class) code.

oneillkza commented 8 years ago

Or, rather, I can do all of the ones except the two data.frame() based classes that need to change to matrices (StrandStateMatrix and RawReadStrands)

oneillkza commented 8 years ago

UPDATE from latest commits. (REMOVED CLOSED ISSUES FROM PREVIOUS MESSAGE)

  1. Inability to build vignette: possibly changes to R caused this (reviewer says: "This is due to a change in R related to the lines") STATUS- OPEN
  2. Create 'constructors' rather than invoking new() directly. STATUS- HALF CLOSED done for non-data.frame classes, waiting on 13 for remainder
  3. The ChrTable class looks like a GRanges(); why not re-use it, perhaps as a simple extension setClass("ChrTable", contains="GRanges")? STATUS- OPEN. Will do once constructors (item 5) complete
  4. There is not much value in nested functions like computeBarPlotMatrix and calcOneGroupChr in barplotLinkageGroupCalls.func. STATUS: MOSTLY CLOSED. See latest pull request: Merged in computeBarPlotMatrix, but calcOneGroupChr is called by sapply, so not sure how to integrate that. Is it ok to leave as is?
  5. The 'copy and append' strategy (from clusterContig.R) is expensive. Make list first. STATUS: OPEN. KIERAN TO DO
  6. Consider using matrix() instead of `data.frame()' STATUS- OPEN
rareaquaticbadger commented 8 years ago

UPDATE from latest commits (please see pull request). (REMOVED CLOSED ISSUES FROM PREVIOUS MESSAGE)

  1. Inability to build vignette: possibly changes to R caused this (reviewer says: "This is due to a change in R related to the lines") STATUS- CLOSED managed to reproduce the error. table() was occassionally carrying over its names attribute. added as.vector() and seemed to fix.
  2. Create 'constructors' rather than invoking new() directly. STATUS- HALF CLOSED done for non-data.frame classes, waiting on 13 for remainder.
  3. The ChrTable class looks like a GRanges(); why not re-use it, perhaps as a simple extension setClass("ChrTable", contains="GRanges")? STATUS- CLOSED. ChrTable is now a GRanges object
  4. The 'copy and append' strategy (from clusterContig.R) is expensive. Make list first. STATUS: OPEN. KIERAN TO DO
  5. Consider using matrix() instead of `data.frame()' STATUS- CLOSED. All data.frames are now matrices. Had a few issues here. One was with computeConsensus, which used table(), that performed strangely when factors weren't used. Changes this strategy to an sapply grep, which is working much quicker. Also had errors every time daisy was used. coerced matrix back into factored data.frame for these functions.
oneillkza commented 8 years ago

And finished!