tseemann / nullarbor

:floppy_disk: :page_with_curl: "Reads to report" for public health and clinical microbiology
GNU General Public License v2.0

Support for multiple lanes PE #245

Closed cmkobel closed 4 years ago

cmkobel commented 4 years ago

Hello.

We often sequence on Illumina instruments across multiple lanes. This means that I have up to 8 paired-end read pairs per sample. They are independent files, but they should be combined, since combining them gives higher coverage.

The samples.tab format seems to support only a single pair per sample.

Is there a straightforward way to supply multiple lanes of PE reads to nullarbor, or should I manually concatenate the lanes?

tseemann commented 4 years ago

It sounds like you are using a NextSeq 500 or similar, which spreads all libraries over 4 lanes? We also use that. It's very annoying, so we changed our bcl2fastq command to automatically create a single R1 file instead of 4 files, L001 to L004.

Concatenating is the easy solution, and it is as simple as cat blah*.fq.gz > blah.fq.gz (no need to uncompress), but it wastes disk space, at least temporarily.
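As a small illustration of why no decompression is needed: a gzip file may hold several compressed members back to back, so joining the raw bytes of two .fq.gz files gives another valid .fq.gz file. The Python sketch below does the same thing as the cat command. The file names and the two toy reads are made up for the demo.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concat_gz(parts, dest):
    """Join gzip files byte for byte, like `cat a.gz b.gz > dest`."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


# Build two tiny per-lane files (hypothetical names and reads).
workdir = Path(tempfile.mkdtemp())
lane1 = workdir / "blah_L001_R1.fq.gz"
lane2 = workdir / "blah_L002_R1.fq.gz"
with gzip.open(lane1, "wt") as fh:
    fh.write("@r1\nACGT\n+\nIIII\n")
with gzip.open(lane2, "wt") as fh:
    fh.write("@r2\nTTGA\n+\nIIII\n")

# Merge without decompressing anything.
merged = workdir / "blah_R1.fq.gz"
concat_gz([lane1, lane2], merged)

# The merged file reads back as one continuous FASTQ stream.
with gzip.open(merged, "rt") as fh:
    merged_text = fh.read()
print(merged_text.count("\n") // 4, "reads in merged file")
```

When merging paired-end data, the R1 files and the R2 files have to be joined in the same lane order, otherwise the read pairs no longer line up.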

Unfortunately, none of the tools that Nullarbor uses support multiple lanes; they need a single R1 and a single R2. So I could add support to Nullarbor, but I would just be forced to make cat copies. I think that is better done by you, as you may want to replace your 8 files with 1 permanently.

cmkobel commented 4 years ago

OK. I think I will write a wrapper that converts a custom-samples.tab with multiple lanes to a nullarbor-samples.tab, automatically merging the lanes and putting them in an adjacent folder.
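A wrapper along these lines could look roughly like the Python sketch below. It is only a sketch under assumptions: the custom-samples.tab layout (one tab-separated row per lane with sample ID, R1 path, R2 path, and the sample ID repeated for each lane) is hypothetical, as are the function name merge_lanes and the output file naming. The output is assumed to be the three-column layout (ID, R1, R2) with one pair per sample.

```python
import csv
import shutil
from collections import defaultdict
from pathlib import Path


def merge_lanes(custom_tab, out_dir, nullarbor_tab):
    """Merge per-lane FASTQ files and write a one-pair-per-sample table.

    Assumed (hypothetical) input: one row per lane, "ID<TAB>R1<TAB>R2".
    Returns a dict mapping each sample ID to its number of lanes.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect (R1, R2) pairs per sample, keeping the row order as lane order.
    lanes = defaultdict(list)
    with open(custom_tab, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            sample, r1, r2 = row[:3]  # raises ValueError on short rows
            lanes[sample].append((r1, r2))

    with open(nullarbor_tab, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for sample, pairs in lanes.items():
            merged = []
            for idx, tag in enumerate(("R1", "R2")):
                dest = out_dir / f"{sample}_{tag}.fq.gz"
                # Byte-wise concatenation; the same lane order is used
                # for R1 and R2 so that read pairs stay in sync.
                with open(dest, "wb") as sink:
                    for pair in pairs:
                        with open(pair[idx], "rb") as src:
                            shutil.copyfileobj(src, sink)
                merged.append(str(dest.resolve()))
            writer.writerow([sample, *merged])

    return {sample: len(pairs) for sample, pairs in lanes.items()}
```

It would be called as merge_lanes("custom-samples.tab", "merged", "nullarbor-samples.tab"), after which nullarbor-samples.tab can be given to Nullarbor in place of the original table. The merged copies live in their own folder, so they can be deleted after the run if disk space is tight.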