jts / sga

de novo sequence assembler using string graphs
http://genome.cshlp.org/content/22/3/549

sga preprocess to also output orphaned paired reads #25

Closed nathanhaigh closed 11 years ago

nathanhaigh commented 12 years ago

I wanted sga preprocess to also output orphaned paired reads to a separate file, since these can be used downstream as single-end reads and may account for a reasonable amount of coverage, especially if strict filtering criteria are used.

I don't know C++ (I was just hacking your code), so I only have a partial implementation of this. I've added a new option, --pe-orphans, which accepts a file as its value. Here is its current behaviour:

If --pe-orphans is specified with --pe-mode=0, an error is thrown (untested).

If --pe-mode=1 or --pe-mode=2 and --pe-orphans is not specified, orphans are sent to STDOUT. This may not be the best behaviour, as it isn't backward compatible: STDOUT will contain a mix of interleaved pairs and orphans if --out is not specified. However, my complete lack of C++ knowledge prevented me from coding something with this behaviour:

If --pe-mode=1 or --pe-mode=2 and --pe-orphans is not specified, orphans are discarded. The same as the current behaviour.

If --pe-mode=1 or --pe-mode=2 and --pe-orphans is specified, orphans are sent to the file irrespective of whether or not --out was specified.
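A minimal sketch of the preferred rules above (error with --pe-mode=0, discard when --pe-orphans is absent, write to the orphans file otherwise). The names `OrphanSink` and `chooseOrphanSink` are hypothetical and not taken from the sga source; the sketch also assumes an empty path means the option was not given.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical names for illustration only; not part of the sga code base.
enum class OrphanSink { Discard, OrphanFile };

// Decide where orphaned reads should go.
// peMode:     value of --pe-mode (0, 1 or 2, as discussed in this issue).
// orphanPath: value of --pe-orphans; assumed empty when the option is absent.
OrphanSink chooseOrphanSink(int peMode, const std::string& orphanPath)
{
    const bool haveOrphanFile = !orphanPath.empty();

    if (peMode == 0) {
        // --pe-orphans together with --pe-mode=0 is rejected.
        if (haveOrphanFile)
            throw std::invalid_argument("--pe-orphans requires --pe-mode=1 or --pe-mode=2");
        return OrphanSink::Discard;
    }

    if (peMode != 1 && peMode != 2)
        throw std::invalid_argument("unsupported --pe-mode value");

    // Paired modes: orphans go to their own file if one was given,
    // otherwise they are discarded. Whether --out was given plays no role.
    return haveOrphanFile ? OrphanSink::OrphanFile : OrphanSink::Discard;
}
```

Keeping the decision independent of --out means the interleaved output never contains orphans, wherever it is written.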

jts commented 11 years ago

Hi Nathan,

I implemented this in ae5be3957. I believe it implements the behaviour you are looking for. The orphans file must be different than the --output file. I don't want them to be output to the same file, as some downstream tools assume that the pairs are interleaved. Let me know if there are any problems.

jared