Shortcut computation of RepeatMasker and TRF for other genomes/species

Regarding the section "AmpliCoNE usage with other reference genomes / species": step 1 says that output from RepeatMasker and Tandem Repeat Finder (TRF) is needed.

I'm working with a new assembly for which those files don't exist yet, and computing them on the entire assembly looks computationally expensive. For my 6 Gb diploid assembly, I estimate that TRF would take about 2 CPU months and RepeatMasker about 10 CPU days.

However, the code (bin/parse_Ychr_RepeatMasker.py and bin/parse_Ychr_TRF.py) makes it clear that only elements on chromosome Y are used. Since the repeat annotation of one chromosome doesn't depend on the other chromosomes, I should be able to get by with running TRF and RepeatMasker on chromosome Y alone.
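
A minimal sketch of the first step of that shortcut: pulling the chromosome Y record(s) out of the assembly FASTA so TRF and RepeatMasker can be run on the smaller file. The script name `extract_chrY.py` and the record name `chrY` are hypothetical; the Y sequence may be named differently (e.g. `Y`) or split across several scaffolds in a new assembly, in which case all of their names can be passed. For an indexed FASTA, `samtools faidx assembly.fa chrY > chrY.fa` does the same job.

```python
import sys
from typing import Iterable, Iterator, Set


def extract_records(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield the lines of every FASTA record whose ID is in `wanted`.

    The record ID is taken to be the first whitespace-delimited token
    after '>' on the header line.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            keep = record_id in wanted
        if keep:
            # Header and sequence lines of a wanted record are passed through unchanged.
            yield line


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python extract_chrY.py assembly.fa chrY > chrY.fa
    fasta_path, *names = sys.argv[1:]
    with open(fasta_path) as fh:
        sys.stdout.writelines(extract_records(fh, set(names)))
```

The resulting file is read line by line, so memory use stays small even for a multi-gigabase assembly. Whether the downstream parsers expect particular chromosome names in the TRF and RepeatMasker output is not covered here and is worth checking against the scripts.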

I don't think the same shortcut works for the mappability track, since the mappability of a position on chromosome Y depends on the entire assembly. Fortunately, it looks like I can create the whole-genome mappability track in only a few CPU days.
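
A toy illustration of why mappability can't be restricted to chromosome Y: here a position counts as uniquely mappable if the k-mer starting there occurs exactly once in the whole genome. This definition is a simplifying assumption (forward strand only, exact matches, hypothetical function names) and is not how AmpliCoNE or any particular mappability tool computes its track.

```python
from collections import Counter
from typing import Dict, List


def kmer_counts(genome: Dict[str, str], k: int) -> Counter:
    """Count every k-mer (forward strand only) across all sequences in `genome`."""
    counts: Counter = Counter()
    for seq in genome.values():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def unique_kmer_track(genome: Dict[str, str], chrom: str, k: int) -> List[int]:
    """Return 1 for each position of `chrom` whose k-mer is unique in `genome`, else 0."""
    counts = kmer_counts(genome, k)
    seq = genome[chrom]
    return [1 if counts[seq[i:i + k]] == 1 else 0 for i in range(len(seq) - k + 1)]


# chrY scored against itself: every 3-mer looks unique.
print(unique_kmer_track({"chrY": "GGGCCC"}, "chrY", 3))
# chrY scored against the whole genome: "CCC" also occurs on chrX.
print(unique_kmer_track({"chrX": "AAACCC", "chrY": "GGGCCC"}, "chrY", 3))
```

The first call prints `[1, 1, 1, 1]` and the second `[1, 1, 1, 0]`: the same chrY position changes status only because of sequence on another chromosome, which is why the track has to be built from the full assembly.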

makovalab-psu/AmpliCoNE-tool, issue #2