Closed. JakubBartoszewicz closed this issue 5 years ago.
Hello,
I've also just noticed this genome sequence concatenation behavior in DeepSimulator. If DeepSimulator still tracks the chromosome/scaffold/contig boundaries internally, that is fine. If not, it will produce chimeric reads (as Jakub suggested), which is a critical problem for many use cases. It would be great if the developers could look into this.
Best, Jia-Xing
Hi, thank you all very much for raising this point. We will add a patch for this specific situation as soon as possible and keep you updated.
Hi, I have added a file 'separate_contig.py' to the util folder. If the user wishes, it can split the FASTA file into separate files, with one contig per file. Users can then run the simulation on each contig separately.
python ./utils/separate_contigs.py -i input.fasta -p output_folder
This should work around the problem for now. We will incorporate it into the pipeline and add a parameter for this option in the near future.
Thanks for the quick response! Yes, it would be a great help if a future version of DeepSimulator could handle multi-FASTA genomes internally and automatically.
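For readers who want to see what such a split involves, here is a minimal sketch in Python. It is not the script from the repository; the function names `split_multi_fasta` and `write_contigs` and the output naming scheme `contig_<index>.fasta` are assumptions made for illustration only.

```python
from pathlib import Path


def split_multi_fasta(fasta_text):
    """Parse multi-FASTA text into a list of (header, sequence) pairs.

    Hypothetical helper for illustration; not DeepSimulator's own code.
    """
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_contigs(records, out_dir):
    """Write each record to its own FASTA file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (header, seq) in enumerate(records):
        # Files are named by index (an assumed scheme) so that unusual
        # characters in a header cannot produce an invalid file name.
        path = out / f"contig_{index}.fasta"
        path.write_text(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Each resulting file then holds a single contig, so a simulation run on it cannot produce reads that span two contigs.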
Best, Jia-Xing
It's done.
You can try:
git pull
git checkout discontinuous-fasta
As for a test:
./deep_simulator.sh -i example/multi.fasta
Hi,
thanks for developing this! I have a question: in the paper you write that it is possible to simulate reads from contigs. However, it seems that if one uses a multi-FASTA file as input, all the contigs are joined together sequentially when the reference is loaded. This can produce chimeric reads spanning contig boundaries. Is there an intended way of running DeepSimulator for multiple contigs while avoiding the chimeric reads (other than performing the simulation for every contig separately)?
Best, Jakub
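To make the concern concrete, the following Python sketch checks whether a read sampled from a concatenated reference would cross a contig boundary. It illustrates the problem under stated assumptions and does not reflect DeepSimulator's internals: the function names are hypothetical, coordinates are 0-based, and `read_len` is assumed to be at least 1 with the read lying inside the concatenated sequence.

```python
from bisect import bisect_right
from itertools import accumulate


def contig_offsets(lengths):
    """Start offset of each contig within the concatenated reference."""
    return [0] + list(accumulate(lengths))[:-1]


def is_chimeric(start, read_len, lengths):
    """Return True if the read [start, start + read_len) spans two or more contigs.

    Hypothetical helper: `lengths` lists contig lengths in the order in
    which the contigs were concatenated.
    """
    offsets = contig_offsets(lengths)
    end = start + read_len - 1  # last base covered by the read
    # Index of the contig that contains a given position.
    first = bisect_right(offsets, start) - 1
    last = bisect_right(offsets, end) - 1
    return first != last


# Three contigs of 100, 50 and 200 bases, concatenated in that order.
lengths = [100, 50, 200]
print(is_chimeric(90, 20, lengths))   # read covers bases 90..109
print(is_chimeric(100, 50, lengths))  # read covers bases 100..149
```

A simulator that keeps the offsets could reject or truncate any read for which such a check is true, which is one way to avoid chimeric reads without splitting the input file.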