zstephens / neat-genreads

NEAT read simulation tools

Question about coverage #67

Closed: kkamii closed this issue 4 years ago

kkamii commented 4 years ago

Hello!

I want to generate WES data in which each exon has an average coverage that I choose.

The problem is that I don't know how to set the average coverage per exon rather than the total average coverage. For example:

exon 1 coverage --> 200X
exon 2 coverage --> 150X
exon 3 coverage --> 400X
...

Please help me.

zstephens commented 4 years ago

Greetings!

I was originally intending to add a feature like this: if you provided a BED file with some coordinates, you could specify the relative coverage you wanted in each bin. E.g.:

chr1 1000 2000 COV=200
chr1 3000 4000 COV=150

etc.

I ended up not pursuing this feature because it was difficult to integrate with the existing coverage calculations. So if you're interested in generating data with varying coverage, I'd recommend splitting it up into multiple simulations: make a separate BED file for each exon and call genReads.py once for each, with the desired coverage specified via -c. I know that could be really laborious, but it's the only way I could think to do it at the moment!

-Zach
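The one-run-per-exon workaround can be scripted rather than done by hand. Below is a minimal sketch, not part of NEAT: the helper `plan_per_exon_runs` and the `Exon` class are hypothetical. It writes a single-line BED file per exon and builds one genReads.py command per exon with its own -c value. Only -c, -t and --force-coverage come from this thread; -r (reference), -R (read length) and -o (output prefix) are assumed from the NEAT README usage and should be checked against `python genReads.py -h` for your version.

```python
import os
import shlex
import tempfile
from dataclasses import dataclass


@dataclass
class Exon:
    chrom: str
    start: int     # 0-based start (BED convention)
    end: int       # end coordinate, exclusive (BED convention)
    coverage: int  # desired average coverage for this exon, e.g. 200


def plan_per_exon_runs(exons, ref, read_len, work_dir, force_coverage=False):
    """Hypothetical helper: write one single-line BED file per exon and return
    one genReads.py argv list per exon, each carrying that exon's -c value.

    The -r/-R/-o flags are assumed from the NEAT README; verify them locally.
    """
    runs = []
    for i, exon in enumerate(exons, start=1):
        bed_path = os.path.join(work_dir, f"exon{i}.bed")
        with open(bed_path, "w") as fh:
            fh.write(f"{exon.chrom}\t{exon.start}\t{exon.end}\n")
        argv = [
            "python", "genReads.py",
            "-r", ref,                                 # reference FASTA (assumed flag)
            "-R", str(read_len),                       # read length (assumed flag)
            "-o", os.path.join(work_dir, f"exon{i}"),  # output prefix (assumed flag)
            "-t", bed_path,                            # targeted region: only this exon
            "-c", str(exon.coverage),                  # coverage for this run
        ]
        if force_coverage:
            argv.append("--force-coverage")            # option mentioned in this thread
        runs.append(argv)
    return runs


if __name__ == "__main__":
    # Coordinates and coverages taken from the example bins in this thread.
    exons = [
        Exon("chr1", 1000, 2000, 200),
        Exon("chr1", 3000, 4000, 150),
    ]
    work_dir = tempfile.mkdtemp(prefix="neat_exons_")
    for argv in plan_per_exon_runs(exons, "ref.fa", 101, work_dir):
        print(shlex.join(argv))
```

The sketch only prints the commands; each argv list could be passed to `subprocess.run(argv, check=True)` to actually launch the simulations. Each run writes its own outputs under its -o prefix, so combining them into one sample is a separate step.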

zstephens commented 4 years ago

Greetings!

I recently added an input option that partially accomplishes this: --force-coverage. It makes NEAT bypass GC-bias modeling and instead sample exactly (coverage)/(readlen) reads for each window. Getting individual exons to different coverages would still require multiple invocations of the simulator, each with a different target BED file (-t).
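For planning per-exon runs, the generic depth relation may help. The average depth C over a window of W bases covered by N reads of length L is C = N·L/W, so reaching C takes about N = C·W/L reads, or equivalently C/L reads starting at each base on average. This is a minimal sketch of that textbook arithmetic, not NEAT's internal code; NEAT's own per-window rounding and paired-end handling may differ (here each mate of a pair counts as one read). The 1,000 bp exon length and 101 bp read length below are hypothetical placeholders.

```python
import math


def reads_needed(coverage, window_len, read_len):
    """Reads of length read_len whose bases add up to an average depth of
    `coverage` over window_len bases: N = C * W / L, rounded up to a whole read."""
    return math.ceil(coverage * window_len / read_len)


if __name__ == "__main__":
    # Per-exon targets from the question; exon length and read length are placeholders.
    for exon, cov in [("exon 1", 200), ("exon 2", 150), ("exon 3", 400)]:
        print(exon, f"{cov}X", reads_needed(cov, 1000, 101), "reads")
```

With those placeholders, the three exons need 1981, 1486 and 3961 reads respectively.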