unode closed this 5 years ago
Yeah, these "vaguely defined" file formats are a PITA.
In NGLess, though, I always want to err on the side of strict output, even at the cost of computational efficiency (better to wait a few more minutes than to waste a week debugging a weird file format error), so IMHO format variant 1 is better.
Just to clarify, variant 1 actually means:
@A.1
@A.2
@B.1
@B.2
...
and not
@A.1
@B.1
...
@A.2
@B.2
I have seen the last variant (concatenated) but it's a PITA to work with if you actually want to extract information from pairs.
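For illustration, here is a minimal Python sketch of producing the interleaved layout (A.1, A.2, B.1, B.2, ...) from two mate files. It assumes plain 4-line FASTQ records with no blank lines, and the helper names read_fastq and interleave are hypothetical, not part of NGLess:

```python
from itertools import islice, zip_longest


def read_fastq(lines):
    """Yield FASTQ records as (header, sequence, separator, quality) tuples.

    Assumes strict 4-line records with no blank lines in between.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield tuple(record)


def interleave(mates1, mates2):
    """Yield records in the order A.1, A.2, B.1, B.2, ...

    Fails loudly if the two inputs hold different numbers of reads,
    in keeping with preferring strictness over silent corruption.
    """
    missing = object()
    pairs = zip_longest(read_fastq(mates1), read_fastq(mates2), fillvalue=missing)
    for first, second in pairs:
        if first is missing or second is missing:
            raise ValueError("mate files have different numbers of reads")
        yield first
        yield second
```

Both arguments can be open file handles or any iterable of lines, so the reads are streamed rather than loaded into memory.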
bwa now supports this as an input format, so if we used it internally when calling bwa, it could save us from making two calls (which can have IO costs, as it implies that the databases are loaded twice).
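As a rough sketch of what a single call could look like: bwa mem takes a -p flag that tells it the first query file is interleaved paired-end data. The wrapper functions below and their parameter names are hypothetical, and this is not how NGLess actually invokes bwa:

```python
import subprocess


def bwa_mem_interleaved_cmd(index_prefix, interleaved_fastq, threads=1):
    """Build the argv for one bwa mem call on an interleaved FASTQ file.

    -p marks the query file as interleaved paired-end; -t sets the thread count.
    """
    return ["bwa", "mem", "-p", "-t", str(threads), index_prefix, interleaved_fastq]


def run_bwa_mem_interleaved(index_prefix, interleaved_fastq, sam_out, threads=1):
    """Run bwa once and write its SAM output (stdout) to sam_out.

    Requires bwa on PATH and an index already built at index_prefix.
    """
    cmd = bwa_mem_interleaved_cmd(index_prefix, interleaved_fastq, threads)
    with open(sam_out, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

With one call, the index is loaded once instead of once per input file.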
Not closing yet, as I think that fully reaping the benefits would mean using it when calling bwa and when calling external modules as well.
The format is not formally described but is used in the wild. A quick search turned up no mention of how 'singles' are handled. Possibilities include:
.1 followed by .2, and add singles at the end of the file
The second variant is more versatile (e.g. for filter()) as it doesn't require a second file to hold reads as they are being processed.
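Since the format has no formal description, any reader has to pick a convention. Below is one possible sketch in Python, assuming mates are named X.1 and X.2 as in the examples above and that a pair is always two adjacent records. The function names are hypothetical. It accepts singles anywhere in the stream, so a file with singles appended at the end is handled as a special case:

```python
def base_and_mate(header):
    """Split a header such as '@A.1' into ('A', '1').

    Returns (name, None) when the name lacks a .1 or .2 suffix.
    """
    fields = header[1:].split()
    name = fields[0] if fields else ""
    base, dot, mate = name.rpartition(".")
    if dot and mate in ("1", "2"):
        return base, mate
    return name, None


def is_mate_pair(first, second):
    """True if two adjacent records are X.1 followed by X.2."""
    base1, mate1 = base_and_mate(first[0])
    base2, mate2 = base_and_mate(second[0])
    return base1 == base2 and mate1 == "1" and mate2 == "2"


def split_pairs_and_singles(records):
    """Yield ('pair', r1, r2) or ('single', r, None) from an interleaved stream.

    records is an iterable of (header, sequence, separator, quality) tuples.
    Only one record is buffered at a time, so no second file is needed.
    """
    pending = None
    for record in records:
        if pending is None:
            pending = record
        elif is_mate_pair(pending, record):
            yield ("pair", pending, record)
            pending = None
        else:
            yield ("single", pending, None)
            pending = record
    if pending is not None:
        yield ("single", pending, None)
```

A stricter reader could instead reject any single that appears before the last pair, which would match the preference for strict output stated earlier in the thread.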