Restarting a failed run

zstephens / telogator2

A method for measuring allele-specific TL and characterizing telomere variant repeat (TVR) sequences from long reads.

MIT License


Restarting a failed run #1

Closed: tbenavi1 closed this issue 3 months ago

tbenavi1 commented 3 months ago

Hello, thanks for the great tool. It would be great if telogator2 could restart a run that had previously failed by checking which files already exist and skipping those steps. For example, telogator2 failed at the step where it calls minimap2, because I had provided an incorrect filepath that couldn't be accessed. Ideally, I could rerun telogator2 with the same command (fixing only the minimap2 path) and have it pick up where it left off without repeating the previously completed steps. Thanks again for your consideration.

zstephens commented 3 months ago

I haven't documented all the optional input parameters (though I really should), but there are some input flags for doing exactly this! Specifically:

--debug-npy saves all the intermediary data generated in the clustering steps in the temp/ directory and will reuse those files if they are present. (The intermediary files are only generated when this flag is provided, so restarting a run that didn't use this option will still be like starting from scratch.) A few things still get reprocessed each time (like the initial read filtering), but it should dramatically reduce computation time.
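The reuse-if-present behavior described above follows a common checkpointing pattern. Below is a minimal Python sketch of that pattern, not Telogator2's actual code: `load_or_compute`, the `.pkl` filename, and the use of `pickle` (rather than NumPy `.npy` files) are all assumptions for illustration. The `reuse` argument plays the role of a flag like `--debug-npy`: when it is off, nothing is written, so a later run has nothing to resume from.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute, reuse=True):
    """Return the checkpoint at `path` if it exists; otherwise run `compute()`.

    Hypothetical helper: `reuse` stands in for a flag like --debug-npy.
    When `reuse` is False, no checkpoint is read or written.
    """
    if reuse and os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    if reuse:
        # Write to a side file, then rename it into place, so a crash
        # mid-write never leaves a truncated checkpoint a later run would trust.
        partial = path + ".partial"
        with open(partial, "wb") as fh:
            pickle.dump(result, fh)
        os.replace(partial, path)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = os.path.join(tmp, "clusters.pkl")  # hypothetical filename
        calls = []

        def expensive_clustering():
            calls.append(1)
            return {"allele_1": [1, 2, 3]}

        load_or_compute(ckpt, expensive_clustering)  # computes and saves
        load_or_compute(ckpt, expensive_clustering)  # loads from disk
        print("compute() ran", len(calls), "time(s)")  # prints: compute() ran 1 time(s)
```

Any step that is not wrapped this way still reruns on every invocation.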

--debug-noplot is the same, but for all the images generated in the temp/ directory.

--debug-realign won't realign the subtels if a BAM for them already exists.

It probably also wouldn't hurt to test the minimap2 executable at the very beginning.
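A preflight check like the one suggested could look roughly like this minimal Python sketch. It is not part of Telogator2: `executable_works` is a hypothetical helper, and it assumes the tool exits with status 0 when given `--version` (minimap2 does accept that flag). Running the check before any heavy processing means a bad path fails in seconds rather than partway through a run.

```python
import shutil
import subprocess
import sys


def executable_works(path, version_flag="--version"):
    """Return True if `path` resolves to an executable that exits 0 on `version_flag`.

    Hypothetical helper, not a Telogator2 function. `path` may be a bare
    command name (looked up on PATH) or an explicit filepath.
    """
    resolved = shutil.which(path)
    if resolved is None:
        return False  # not found, or not executable
    try:
        proc = subprocess.run(
            [resolved, version_flag], capture_output=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0


if __name__ == "__main__":
    # Check whatever path would be handed to the aligner step.
    mm2 = sys.argv[1] if len(sys.argv) > 1 else "minimap2"
    if not executable_works(mm2):
        sys.exit(f"error: cannot run {mm2!r}; check the minimap2 path before starting")
    print(f"{mm2} looks usable")
```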

tbenavi1 commented 3 months ago

Thank you so much! Another question I have is whether telogator2 could be run on both HiFi and ONT data at the same time. I assume that for most steps (except, of course, alignment), HiFi and ONT data are treated the same way. For now, I was thinking of running telogator2 separately on the HiFi and ONT data and then figuring out a way to combine the results into an overall estimate. Thanks for any insights.

zstephens commented 3 months ago

There are a few other under-the-hood differences when running Telogator2 with -r hifi vs. -r ont, most notably that with ONT data I add a few extra kmers to search for which correspond to some nanopore-specific sequencing artifacts.
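To illustrate how the search set might differ by read type, here is a minimal Python sketch. `TTAGGG`/`CCCTAA` are the canonical vertebrate telomere repeat and its reverse complement; the ONT "artifact" k-mers below are hypothetical placeholders, not the sequences Telogator2 actually adds, and `kmers_for`/`count_kmers` are illustrative names only.

```python
# Canonical telomere repeat and its reverse complement.
BASE_KMERS = ["TTAGGG", "CCCTAA"]

# HYPOTHETICAL placeholders for nanopore-specific artifact k-mers;
# these are not the sequences Telogator2 actually searches for.
ONT_ARTIFACT_KMERS = ["TTAAAA", "TTTTAA"]


def kmers_for(read_type):
    """Return the k-mer list to search for a given read type ('hifi' or 'ont')."""
    if read_type == "hifi":
        return list(BASE_KMERS)
    if read_type == "ont":
        return BASE_KMERS + ONT_ARTIFACT_KMERS
    raise ValueError(f"unknown read type: {read_type!r}")


def count_kmers(seq, kmers):
    """Count occurrences of each k-mer in `seq`, including overlapping hits."""
    seq = seq.upper()
    counts = {}
    for kmer in kmers:
        n = 0
        pos = seq.find(kmer)
        while pos != -1:
            n += 1
            pos = seq.find(kmer, pos + 1)  # step by 1 so overlaps are counted
        counts[kmer] = n
    return counts


print(count_kmers("ttagggTTAGGGttaaaa", kmers_for("ont")))
# prints: {'TTAGGG': 2, 'CCCTAA': 0, 'TTAAAA': 1, 'TTTTAA': 0}
```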

In the past I've processed ONT data as if it were PacBio and it was pretty much fine though. So you could try providing both (e.g. specifying multiple inputs via -i pacbio.fa nanopore.fa) and seeing what happens. There are some limitations for Nanopore data basecalled prior to Dorado, but if the data is recently generated it should be fine.