hammerlab / biokepi

Bioinformatics Ketrew Pipelines
Apache License 2.0

OptiType does not use razerS3 to filter to HLA reads prior to running #423

Closed tavinathanson closed 7 years ago

tavinathanson commented 7 years ago

Per the OptiType README (and it sounds like @rleonid has always done this):

Optional step zero: you might want to filter your sequencing data for HLA reads. Should you have to re-run OptiType multiple times on the same sample (different settings, etc.) it could save you time if instead of giving OptiType the full, multi-gigabyte sequencing data multiple times, you would rather give it the relevant reads only, on the order of megabytes.

razerS3 is called internally in https://github.com/FRED-2/OptiType/blob/master/OptiTypePipeline.py, but that call does not appear to do the same thing (note -m 99999, for example).
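For reference, here is a rough sketch of what the manual pre-filtering step might look like, written as a small Python helper that only builds the command lines and does not run them. The razers3 flag values (-i 95 -m 1 -dr 0) and the samtools bam2fq conversion are as I recall them from the README's example and should be checked against the README before use. The helper name and all file names are hypothetical. Note the -m 1 here versus the -m 99999 in the internal call.

```python
import shlex


def prefilter_commands(fastq, hla_reference, out_prefix, max_hits=1):
    """Build (but do not run) commands for the optional HLA pre-filtering step.

    Assumption: flag values follow the example recalled from the OptiType
    README; verify them there before relying on this.
    """
    bam = f"{out_prefix}.bam"
    fished_fastq = f"{out_prefix}.fastq"
    # razers3: keep only reads that map to the HLA reference.
    razers3_cmd = [
        "razers3",
        "-i", "95",            # minimum percent identity
        "-m", str(max_hits),   # max hits reported per read (internal call uses 99999)
        "-dr", "0",            # distance range
        "-o", bam,
        hla_reference,
        fastq,
    ]
    # samtools: convert the fished BAM back to FASTQ (stdout goes to fished_fastq).
    bam2fq_cmd = ["samtools", "bam2fq", bam]
    return razers3_cmd, bam2fq_cmd, fished_fastq


razers3_cmd, bam2fq_cmd, fished_fastq = prefilter_commands(
    "sample_1.fastq", "hla_reference_dna.fasta", "fished_1"
)
print(shlex.join(razers3_cmd))
print(shlex.join(bam2fq_cmd), ">", fished_fastq)
```

The resulting fished FASTQ (megabytes rather than gigabytes) would then be passed to OptiType in place of the full input.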

tavinathanson commented 7 years ago

Re whether this could impact the results, from @rleonid:

Results: (AFAIK) theoretically, no. But practically, I wouldn’t be surprised if numerically there are some issues.

rleonid commented 7 years ago

I'd like to recuse myself from being the judge/authority on constraint solvers. I've implemented toy simplex solvers, but I don't know what kind of heuristics modern solvers might use.

tavinathanson commented 7 years ago

It's possible, though it seems unlikely to me, that this could address the memory issues in #419.

tavinathanson commented 7 years ago

@ihodes I'll try the pre-filtering manually on 30GB RAM and see what happens.

tavinathanson commented 7 years ago

Discussed offline: pre-filtering manually with 30GB of RAM also ran out of memory.

ihodes commented 7 years ago

Closing in favor of https://github.com/hammerlab/biokepi/issues/426