ncsa / NEAT


Processing of off-target reference regions using target bed file #114

Closed dani-ture closed 4 hours ago

dani-ture commented 2 weeks ago

Describe the bug

It seems that NEAT processes regions of the reference that I did not include in the target bed file. I want to simulate reads for some cancer-related genes using the full human reference genome and a target bed file, but before using the complete bed file I wanted to test it with just 2 regions of chromosome 2. However, NEAT appears to generate random mutations and sample reads for other chromosomes too. I don't know whether those reads are written to the fastq file in the end, but these unnecessary steps make the process much slower. Additionally, NEAT generated mutations for the whole of chromosome 2 and only afterwards filtered out the ones that landed in regions not included in the bed file.

To Reproduce

Steps to reproduce the behavior:

  1. Download the latest human reference genome: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40/

  2. Make a copy of the provided template config file (I called it test_config.yml) and set the parameters:

    reference:
    target_bed:
    rng_seed: 6386514007882411

    All other parameters are left at the template default, “.”.

  3. Run neat on the command line: neat --log-name test --log-detail HIGH --log-level DEBUG read-simulator -c test_config.yml -o test

Expected behavior

NEAT could process only the targeted regions, which would skip the off-target work and speed up the run.
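To make the request concrete, here is a minimal sketch of the idea in Python. It is not NEAT's code and does not use any NEAT API: the function names (`read_targets`, `contigs_to_process`, `sample_positions`) are hypothetical, and it assumes a standard BED file (0-based, half-open intervals) whose contig names match the reference headers and whose intervals do not overlap. It shows two things: skipping contigs that have no target at all, and drawing mutation positions directly inside the targets instead of mutating a whole chromosome and filtering afterwards.

```python
import random
from collections import defaultdict


def read_targets(bed_lines):
    """Group BED intervals (0-based, half-open) by contig name."""
    targets = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the usual BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        targets[contig].append((start, end))
    for intervals in targets.values():
        intervals.sort()
    return dict(targets)


def contigs_to_process(reference_contigs, targets):
    """Keep only the reference contigs that contain at least one target."""
    return [name for name in reference_contigs if name in targets]


def sample_positions(intervals, count, rng):
    """Draw positions uniformly from the targets (assumes no overlaps)."""
    lengths = [end - start for start, end in intervals]
    positions = []
    for _ in range(count):
        # Pick an interval in proportion to its length, then a base inside it.
        start, end = rng.choices(intervals, weights=lengths, k=1)[0]
        positions.append(rng.randrange(start, end))
    return sorted(positions)


if __name__ == "__main__":
    # Hypothetical two-region test file, similar to the one described above.
    bed = ["chr2\t1000\t2000", "chr2\t5000\t5500"]
    targets = read_targets(bed)
    print(contigs_to_process(["chr1", "chr2", "chr3"], targets))
    print(sample_positions(targets["chr2"], 5, random.Random(6386514007882411)))
```

With something along these lines, contigs without targets would never be loaded or mutated, and no mutation would be generated only to be discarded later. Whether this fits NEAT's mutation model (for example, variants that span a target boundary) is something I can't judge from the outside.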

Additional comments

[Screenshot: 20240614_error]

joshfactorial commented 2 weeks ago

Thanks for submitting this bug. I will take a look, hopefully this weekend.

joshfactorial commented 4 hours ago

I think this ticket is fixed now. Please reopen if you have further issues.