liyu95 / DeepSimulator

The first deep learning based Nanopore simulator which can simulate the process of Nanopore sequencing.

Simulation from multiple contigs in Multi-FASTA #30

Closed JakubBartoszewicz closed 5 years ago

JakubBartoszewicz commented 5 years ago

Hi,

thanks for developing this! I have a question: in the paper you write that it is possible to simulate reads from contigs. However, it seems that if one uses a multi-FASTA file as input, all the contigs get (sequentially) joined together when the reference is loaded. This can produce chimeric reads spanning across contigs. Is there an intended way of running DeepSimulator for multiple contigs while avoiding the chimeric reads (other than performing simulation for every contig separately)?

Best, Jakub

yjx1217 commented 5 years ago

Hello,

I've also just noticed that DeepSimulator concatenates the genome sequences. If DeepSimulator still tracks the chromosome/scaffold/contig boundaries internally, that is fine. If not, this will lead to chimeric reads (as Jakub suggested), which is a critical problem for many use cases. It would be great if the developers could look into this.

Best, Jia-Xing
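
For illustration, here is a minimal Python sketch of what "tracking contig boundaries" could look like when sampling reads. It is not DeepSimulator's code; the function name `sample_read`, the toy contigs, and the fixed read length are all assumptions made for the example. The idea is to pick a contig first and then pick a start position inside it, so a read can never span two contigs.

```python
import random

def sample_read(contigs, read_len, rng):
    """Pick one read of length read_len that lies entirely inside a single contig.

    contigs is a list of (name, sequence) pairs. Hypothetical helper, not part
    of DeepSimulator.
    """
    # Only contigs long enough to hold a full read are eligible.
    eligible = [(name, seq) for name, seq in contigs if len(seq) >= read_len]
    if not eligible:
        raise ValueError("no contig is at least read_len long")
    # Weight each contig by its number of valid start positions, so that
    # every valid start position in the genome is equally likely.
    weights = [len(seq) - read_len + 1 for _, seq in eligible]
    name, seq = rng.choices(eligible, weights=weights, k=1)[0]
    start = rng.randrange(len(seq) - read_len + 1)
    return name, start, seq[start:start + read_len]

# Toy input standing in for a parsed multi-FASTA file (made-up sequences).
contigs = [("contig1", "ACGTACGTAC"), ("contig2", "TTTTGGGGCC")]
rng = random.Random(0)
reads = [sample_read(contigs, 6, rng) for _ in range(100)]
```

Sampling from the plain concatenation `"ACGTACGTAC" + "TTTTGGGGCC"` instead would allow start positions 5 to 9, which produce reads mixing both contigs; those are the chimeric reads discussed above.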

liyu95 commented 5 years ago

Hi, thank you all very much for raising this point. We will add a patch for this situation as soon as possible and keep you updated.

liyu95 commented 5 years ago

Hi, I have added a file 'separate_contig.py' into the util folder. It splits a multi-FASTA file into separate FASTA files, one contig per file. Users can then run the simulation for each contig separately.

python ./utils/separate_contigs.py -i input.fasta -p output_folder

This should make per-contig simulation feasible for now. We will incorporate it into the pipeline and add a parameter for this option in the near future.
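
For readers who want to see what such a split involves, the following is a rough Python sketch. It is not the script added to the repository; the function name `split_fasta` and the output naming scheme (a running index plus the first word of the header) are assumptions chosen for the example.

```python
import os
import re

def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file; return the paths.

    Hypothetical helper for illustration, not the repository's script.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A header line starts a new record: close the previous file.
                if out is not None:
                    out.close()
                # Use the first word of the header as the name, made filesystem-safe.
                fields = line[1:].split()
                name = fields[0] if fields else "contig"
                safe = re.sub(r"[^A-Za-z0-9._-]", "_", name)
                path = os.path.join(out_dir, f"{len(paths):04d}_{safe}.fasta")
                out = open(path, "w")
                paths.append(path)
            # Lines before the first header are skipped; everything else is copied.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return paths
```

Calling `split_fasta("input.fasta", "output_folder")` would return one path per contig, and the simulator can then be run on each file in turn. The running index in the file name keeps the output unique even if two headers share the same first word.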

yjx1217 commented 5 years ago

Thanks for the quick response! Yes, it would be a great help if a future version of DeepSimulator could handle multi-FASTA genomes internally and automatically.

Best, Jia-Xing

liyu95 commented 5 years ago

It's done.

You can try:

git pull
git checkout discontinuous-fasta

To test it:

./deep_simulator.sh -i example/multi.fasta