Closed mkuhn closed 5 years ago
With:
sample
├── sample.fq.bz2
├── sample.pair.1.fq.bz2
├── sample.pair.2.fq.bz2
└── sample.singles.fq.bz2
load_mocat_sample gives:
load_mocat_sample found paired-end sample 'sample/sample.pair.1.fq.bz2' - 'sample/sample.pair.2.fq.bz2'
load_mocat_sample found single-end sample 'sample/sample.fq.bz2'
load_mocat_sample found single-end sample 'sample/sample.singles.fq.bz2'
whereas with:
sample
├── sample.fq.bz2
├── sample.pair.1.fq.bz2
├── sample.pair.2.fq.bz2
└── sample.single.fq.bz2 (notice single(s) here)
gives:
load_mocat_sample found paired-end sample 'sample/sample.pair.1.fq.bz2' - 'sample/sample.pair.2.fq.bz2' with singles file 'sample/sample.single.fq.bz2'
load_mocat_sample found single-end sample 'sample/sample.fq.bz2'
So .single.fq.gz would be the correct usage.
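To make the difference between the two listings concrete, here is a minimal Python sketch of the grouping that the log output above suggests. The regular expression and the function name `group_sample_files` are assumptions inferred only from those log lines; this is not the NGLess implementation and the real matching rules may differ.

```python
import re

# Assumed naming rule, inferred from the log output in this thread only:
# "<base>.pair.1.<ext>", "<base>.pair.2.<ext>" and "<base>.single.<ext>"
# belong together; anything else is treated as an independent single-end file.
TAGGED = re.compile(
    r"^(?P<base>.+)\.(?P<tag>pair\.1|pair\.2|single)\.(?P<ext>fq(?:\.gz|\.bz2)?)$"
)


def group_sample_files(filenames):
    """Return (paired, single_end).

    paired is a list of (mate1, mate2, singles_or_None) tuples,
    single_end is a sorted list of file names loaded on their own.
    """
    tagged = {}       # (base, ext) -> {tag: filename}
    single_end = []
    for name in sorted(filenames):
        m = TAGGED.match(name)
        if m:
            tagged.setdefault((m["base"], m["ext"]), {})[m["tag"]] = name
        else:
            single_end.append(name)

    paired = []
    for parts in tagged.values():
        if "pair.1" in parts and "pair.2" in parts:
            paired.append((parts["pair.1"], parts["pair.2"], parts.get("single")))
        else:
            # Incomplete groups fall back to plain single-end files in this sketch.
            single_end.extend(parts.values())
    return paired, sorted(single_end)
```

Under this assumed rule, `sample.singles.fq.bz2` does not match the `single` tag and is therefore reported as a separate single-end file, while `sample.single.fq.bz2` is attached to the pair.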
Yet (and correct me if I'm wrong, @luispedro), this shouldn't make much of a difference in practice.
The only case where this may make a difference is if using load_mocat_sample and, directly after, using map. Here in the first case the mapper (bwa/minimap2) would be called 3 times, and in the second case only 2.
If preprocess() is called after load_mocat_sample, all pairs and singles should be merged into three files, and with ngless 0.11.0 or above the number of mapper calls is actually reduced to 1, thanks to https://github.com/ngless-toolkit/ngless/commit/412531775d15a05e70bc7ffc29f53f3419484af9.
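The call counts mentioned above can be written down as a small model. This is only a sketch of the arithmetic in this comment, assuming one mapper call per independently loaded read set; the function `mapper_calls` and its parameters are hypothetical and not part of NGLess.

```python
def mapper_calls(n_paired_groups, n_single_end_files, streams_interleaved=False):
    """Count mapper invocations for one sample in a simplified model.

    Assumption: without interleaved streaming, every paired group (with or
    without an attached singles file) and every stand-alone single-end file
    triggers its own mapper call. With interleaved streaming, all reads of
    the sample go through a single call.
    """
    total = n_paired_groups + n_single_end_files
    if streams_interleaved:
        return 1 if total else 0
    return total


# First listing: 1 pair + 2 single-end files (sample.fq.bz2, sample.singles.fq.bz2)
first_case = mapper_calls(1, 2)
# Second listing: 1 pair (singles attached) + 1 single-end file (sample.fq.bz2)
second_case = mapper_calls(1, 1)
# Behaviour described for ngless 0.11.0 or above
merged = mapper_calls(1, 2, streams_interleaved=True)
```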
> The only case where this may make a difference is if using load_mocat_sample and, directly after, using map. Here in the first case the mapper (bwa/minimap2) would be called 3 times, and in the second case only 2.
The mapper is now only called once in all cases as NGLess takes care of streaming the reads uncompressed and in interleaved format.
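For readers unfamiliar with the term, "interleaved" means that each record from the first mate file is immediately followed by the matching record from the second mate file in one stream. The following is a minimal Python illustration of that layout, assuming plain 4-line FASTQ records; it is not how NGLess implements it, and the function names are hypothetical.

```python
from itertools import islice


def fastq_records(lines):
    """Yield FASTQ records as 4-tuples of lines (header, sequence, '+', qualities)."""
    it = iter(lines)
    while True:
        rec = tuple(islice(it, 4))
        if not rec:
            return
        if len(rec) != 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def interleave(mate1_lines, mate2_lines):
    """Yield lines alternating one record from each mate stream.

    Note: zip() stops at the shorter input, so this sketch does not
    check that both mates contain the same number of records.
    """
    for rec1, rec2 in zip(fastq_records(mate1_lines), fastq_records(mate2_lines)):
        yield from rec1
        yield from rec2
```

A single mapper process can then consume this one stream instead of being started once per input file.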
The documentation for load_mocat_sample in stdlib.md doesn't explain how single files should be named. Reading the source code, I think it has to be named ... but I'm not sure if this is set in stone. (This might also explain the reason behind #120.)