hbc / bcbioRNASeq

R package for bcbio RNA-seq analysis.
https://bioinformatics.sph.harvard.edu/bcbioRNASeq
GNU Affero General Public License v3.0
58 stars 21 forks source link

samplename order fix #120

Closed roryk closed 5 years ago

roryk commented 5 years ago

Some folks at AZ noticed that sometimes the sampleNames can get applied to the wrong files, for example:

                                                                               AS_1
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_10/salmon/quant.sf"
                                                                                AS_10
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_11/salmon/quant.sf"
                                                                                AS_11
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_12/salmon/quant.sf"
                                                                                AS_12
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_1/salmon/quant.sf"
                                                                                 AS_2
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_2/salmon/quant.sf"
                                                                                 AS_3
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_3/salmon/quant.sf"
                                                                                 AS_4
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_4/salmon/quant.sf"
                                                                                 AS_5
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_5/salmon/quant.sf"
                                                                                 AS_6
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_6/salmon/quant.sf"
                                                                                 AS_7
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_7/salmon/quant.sf"
                                                                                 AS_8
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_8/salmon/quant.sf"
                                                                                 AS_9
"/bigseq/analysis/20190305_CVRM_TweakFn4_SOUK003803_bcbio/final/AS_9/salmon/quant.sf"

the list.files function doesn't put them in any particular order, so here we sort the sampleNames and then sort the files to put them in the right order.

roryk commented 5 years ago

This fixed it for me in a toy thing I made, but I didn't do all the tests. Opening up a p/r so I can make sure the tests still pass.

mjsteinbaugh commented 5 years ago

Thanks @roryk

mjsteinbaugh commented 5 years ago

I made a couple of tweaks to the AppVeyor config, to include testing against the previous release version.

roryk commented 5 years ago

Did you ever figure out the resolution to that GenomeInfoDbData not getting found by bioconductor issue? We're running into that here with travis. I poked around and saw you had a back and forth about it a few years back.

roryk commented 5 years ago

Thanks. How did you push to this pull request? You did magic.

mjsteinbaugh commented 5 years ago

Let me check my notes about that. Otherwise, take a look at the fork I've been testing: https://github.com/acidgenomics/bcbioRNASeq

mjsteinbaugh commented 5 years ago

I switched to using a Docker image for Travis because I was running into too many configuration and timeout issues.

mjsteinbaugh commented 5 years ago

Yeah you can see here a reworked approach that avoids the need to use list.files(). I think this is a better way to go.

roryk commented 5 years ago

Yeah you can see here a reworked approach that avoids the need to use list.files(). I think this is a better way to go.

Yup, agreed. I'll swap to that.

roryk commented 5 years ago

I'm cool for dropping R 3.4 support-- bioconda is onto 3.5 so we don't have a need to be on 3.4 anymore.

mjsteinbaugh commented 5 years ago

That makes unit testing easier -- let's pin to R 3.5 instead then.

codecov[bot] commented 5 years ago

Codecov Report

Merging #120 into master will increase coverage by 0.21%. The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
+ Coverage   98.15%   98.37%   +0.21%     
==========================================
  Files          45       45              
  Lines        2280     2273       -7     
==========================================
- Hits         2238     2236       -2     
+ Misses         42       37       -5
Impacted Files Coverage Δ
R/tximport-internal.R 96.07% <83.33%> (+8.14%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d346456...5c27e8a. Read the comment docs.

roryk commented 5 years ago

THANK YOU SO MUCH, it would have taken me forever to figure out how to get the tests working.

mjsteinbaugh commented 5 years ago

@roryk I got Docker working for the HBC R packages, so if we run into Travis build time and cache issues, we can switch over to using those images. For now let's stick with testing the code against macOS on R 3.5.

roryk commented 5 years ago

Awesome, thanks so much.

roryk commented 5 years ago

Going to cut a new release and update the bioconda recipe now.