makovalab-psu / AmpliCoNE-tool

AmpliCoNE: Ampliconic Copy Number Estimator
Shortcut computation of RepeatMasker and TRF for other genomes/species #2

Open rsharris opened 2 years ago

rsharris commented 2 years ago

This concerns "AmpliCoNE usage with other reference genomes / species". Step 1 there indicates that output from RepeatMasker and Tandem Repeat Finder (TRF) is needed.

I'm using a new assembly, and those files don't exist yet. Computing them on the entire assembly appears to be computationally quite expensive. On my 6G diploid assembly I estimate it would take 2 CPU months to run TRF, and 10 CPU days to run RepeatMasker.

However, looking at the code (bin/parse_Ychr_RepeatMasker.py and bin/parse_Ychr_TRF.py) it is clear that only elements on chromosome Y are used. And since repeat annotations are independent of other chromosomes, I should be able to get by with running TRF and RepeatMasker only on chromosome Y.

I don't think the same is true of the mappability track, since the mappability of a position on chromosome Y depends upon the entire assembly. But it looks like I can create the whole genome mappability track in only a few CPU days.

rahulsimham commented 2 years ago

You are correct. You can run RepeatMasker and Tandem Repeat Finder on chrY alone, but the mappability track has to be computed on the whole genome.
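
For anyone taking the chrY-only route, here is a minimal sketch of pulling a single sequence out of a multi-FASTA assembly so that TRF and RepeatMasker can be run on that file alone. It is not part of AmpliCoNE: the function name `extract_sequence` and the default sequence name `chrY` are assumptions for illustration, and a new assembly may well label its Y chromosome differently.

```python
def extract_sequence(fasta_in, fasta_out, target="chrY"):
    """Copy one record from a multi-FASTA file and return its length in bases.

    `target` is matched against the first whitespace-delimited token of each
    header line. "chrY" is only an assumed name; check the assembly's headers.
    """
    n_bases = 0
    in_target = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if in_target:
                    break  # the target record has ended; stop reading
                fields = line[1:].split()
                in_target = bool(fields) and fields[0] == target
                if in_target:
                    dst.write(line)
            elif in_target:
                dst.write(line)
                n_bases += len(line.strip())
    if n_bases == 0:
        raise ValueError(f"no sequence named {target!r} found in {fasta_in}")
    return n_bases
```

Calling, for example, `extract_sequence("assembly.fa", "chrY.fa")` (both file names hypothetical) writes the chrY record to its own file, which can then be given to TRF and RepeatMasker. The mappability track still needs the full assembly as input.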