Describe the bug
It seems that neat processes regions of the reference that I did not include in the target bed file. I want to simulate reads for some cancer-related genes using the full human reference genome and a target bed file, but before using the complete bed file I wanted to test it with just 2 regions of chromosome 2. However, neat appears to generate random mutations and sample reads for the other chromosomes too. I don't know whether those reads are written to the fastq file in the end, but these unnecessary steps make the process much slower. Additionally, neat generated mutations for the whole of chromosome 2 and only afterwards filtered out the ones that landed outside the regions listed in the bed file.
To Reproduce
Steps to reproduce the behavior:
Download the latest human reference genome: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40/
Make a copy of the provided template config file (I called it test_config.yml) and set the parameters:
```
reference:
target_bed:
rng_seed: 6386514007882411
```
All other parameters are left at their default value, ".".
Run neat on the command line:
neat --log-name test --log-detail HIGH --log-level DEBUG read-simulator -c test_config.yml -o test
Expected behavior
Ideally, neat would process only the targeted regions (and skip contigs with no targets entirely), which would speed up the run.
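To illustrate what I mean, here is a minimal sketch of the kind of pre-filtering I had in mind. It is not neat's actual code: the function names `load_targets` and `contigs_to_process` and the contig names and coordinates are made up for the example. The idea is to read the bed file first and skip any contig that has no targeted interval before generating mutations or sampling reads.

```python
from collections import defaultdict


def load_targets(bed_lines):
    """Group BED intervals by contig name: {contig: [(start, end), ...]}."""
    targets = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the header/comment lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = line.split()[:3]
        targets[contig].append((int(start), int(end)))
    return dict(targets)


def contigs_to_process(reference_contigs, targets):
    """Keep only the reference contigs that have at least one targeted interval."""
    return [name for name in reference_contigs if name in targets]


# Hypothetical test data: two regions on chromosome 2, nothing elsewhere.
bed = ["chr2\t1000\t2000", "chr2\t5000\t6500"]
targets = load_targets(bed)
selected = contigs_to_process(["chr1", "chr2", "chr3"], targets)
print(selected)
```

With a filter like this, only chromosome 2 would go through mutation generation and read sampling, and within it the work could be limited to the listed intervals.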
Additional comments
Desktop: