iontorrent / TS

Torrent Suite

Process results for each barcode immediately #28

Closed keithcallenberg closed 4 months ago

keithcallenberg commented 7 years ago

Move process_results() call into same for loop as variant_caller_pipeline() for non-multisample runs.

TVC 5.2 calls process_results() only after all barcodes have gone through the pipeline. This behavior differs from all previous TVC versions and means that XLS files and other report files will not be available until the very end. Clinical labs depend on being able to access resulting XLS files as the barcodes finish.

This change should be relatively benign because variant_caller_pipeline() typically takes 10-20 times longer to execute than process_results().
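As a rough illustration of the reordering described above, here is a minimal sketch. The two functions are hypothetical stand-ins that only record the order of calls; the plugin's real signatures and arguments are not shown in this thread, and the multisample case is left out.

```python
# Hypothetical stand-ins: they only record call order, not real TVC work.
def variant_caller_pipeline(barcode, log):
    log.append(("pipeline", barcode))


def process_results(barcode, log):
    log.append(("results", barcode))


def run_deferred(barcodes):
    """Ordering described for TVC 5.2: results only after all barcodes are called."""
    log = []
    for barcode in barcodes:
        variant_caller_pipeline(barcode, log)
    for barcode in barcodes:
        process_results(barcode, log)
    return log


def run_immediate(barcodes):
    """Proposed ordering: process each barcode's results as soon as it finishes."""
    log = []
    for barcode in barcodes:
        variant_caller_pipeline(barcode, log)
        process_results(barcode, log)
    return log
```

With the immediate ordering, the report files for the first barcode exist before the second barcode starts its pipeline run, which is the behavior the description attributes to earlier TVC versions.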

jveitchv commented 7 years ago

Hi Keith, how are you discovering that the report files have been created? I am on the TVC team, and I ask because if we make this change we might later change something else that breaks your pipeline. TVC can be run standalone, which might be safer than relying on internals.

keithcallenberg commented 7 years ago

Hi Jim,

  1. We have code that looks for these files specifically on the filesystem. But I understand why you raise this point, because I notice that the API no longer provides details like this until the end of the run.

  2. You say you might need to make other changes to make my change. Can you explain what the other changes might be? In general we have no problem making minor adjustments, and to be clear, that is not what we have been complaining about overall -- rather it is about things in TVC and TS that are outside our control.

  3. I also understand why you suggest TVC standalone. That is a natural segue since we are complaining about these things outside our control. But while we are actively discussing this possibility, it is a long-term solution and will require extensive validation. A major barrier is not just TVC standalone, but the upstream (hotspots, other prep work) and downstream scripts (e.g. generate_variant_tables.py). You have made TVC itself standalone, but the plugin and scripts around TVC -- do you officially or unofficially support running them standalone? When I look at the Python imports, they look pretty easy to run on their own as long as we have things like numpy, but are you aware of any caveats to running these outside of TS?

Keith
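For context on point 1, a filesystem check of this kind could look roughly like the sketch below. It is only an assumption about the general approach, not our actual code: the function name, the `*.xls` pattern, and the idea of scanning one results directory recursively are all hypothetical.

```python
from pathlib import Path


def find_new_reports(results_dir, seen, pattern="*.xls"):
    """Return report files under results_dir that are not yet in `seen`.

    `seen` is a set of Path objects that is updated in place, so calling
    this repeatedly (for example on a timer) only reports each file once.
    The "*.xls" pattern and directory layout are assumptions.
    """
    new_files = sorted(
        p for p in Path(results_dir).rglob(pattern) if p not in seen
    )
    seen.update(new_files)
    return new_files
```

A check like this only finds per-barcode files early if they are written as each barcode finishes, which is why the ordering of process_results() matters to us.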

jveitchv commented 7 years ago

On 1 & 2: I was asking about your pipeline because I didn't know what it does, so there may be other dependencies I am unaware of that we might break in the future.

On 3: TVC standalone is supported. We also officially support tvcutils, which is used for the hotspot handling. That is what we broke in 5.2, because we didn't test BAMs with variants that hit duplicate hotspots. It's a very good question about the other components that generate the tables; those we don't officially support. We do support the VCF output, especially the OID tables and HS tags. Happy to talk further.