Time series PTR estimation

brendanwee commented 2 years ago

I have a dataset with two timepoints. I think it is probably best to run the estimation on the full dataset (t0+t1) to give CoPTR the best chance at predicting PTRs (reordering bins). Are there any assumptions in the estimate step that would suggest separating the two timepoints instead?

brendanwee commented 2 years ago

Also, do you have any insight on the amount of memory the estimation step would take with 2000+ coverage maps?

tyjo commented 2 years ago

I would recommend running on all time points together. More samples should allow CoPTR-Ref to better estimate the position of the replication origin, and CoPTR-Contig to better reorder bins along the genome.

> Also, do you have any insight on the amount of memory the estimation step would take with 2000+ coverage maps?

For the data in the paper (~1300 samples), I was able to run CoPTR on a 16 GB machine. I tried to keep memory requirements low: in the estimate step, CoPTR groups coverage maps by species, writes them to disk, then loads them one at a time to estimate PTRs across samples. This way it never needs to hold all coverage maps in memory at once.
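For anyone curious what that pattern looks like, here is a minimal Python sketch of the general "group by species on disk, then process one group at a time" idea. It is not CoPTR's actual code: the function names (`group_by_species`, `load_species`, `estimate_all`), the pickle file layout, and the `estimate_fn` callback are all hypothetical and only there for illustration.

```python
import pickle
from pathlib import Path


def group_by_species(sample_maps, out_dir):
    """Append each sample's coverage to a per-species file on disk.

    `sample_maps` yields (sample_id, {species_id: coverage}) one sample at a
    time, so only a single sample is held in memory while grouping.
    Assumes `out_dir` starts empty and species ids are safe as file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for sample_id, per_species in sample_maps:
        for species_id, coverage in per_species.items():
            # Append mode: each species file collects records across samples.
            with open(out_dir / f"{species_id}.pkl", "ab") as fh:
                pickle.dump((sample_id, coverage), fh)
    return sorted(out_dir.glob("*.pkl"))


def load_species(path):
    """Read back every (sample_id, coverage) record stored for one species."""
    records = []
    with open(path, "rb") as fh:
        while True:
            try:
                records.append(pickle.load(fh))
            except EOFError:
                break
    return records


def estimate_all(sample_maps, out_dir, estimate_fn):
    """Run `estimate_fn` on one species at a time.

    `estimate_fn` is a placeholder for whatever per-species model is fit
    across samples; only one species' records are in memory per iteration.
    """
    results = {}
    for path in group_by_species(sample_maps, out_dir):
        records = load_species(path)
        results[path.stem] = estimate_fn(records)
    return results
```

With this layout, peak memory depends on the largest single species group rather than on the total number of coverage maps, which is why adding more samples (for example a second timepoint) grows memory use much more slowly than loading everything at once would.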

brendanwee commented 2 years ago

Awesome! Thank you Tyler!

tyjo / coptr

Time series PTR estimation #12