yukiteruono / pbsim3

PBSIM3: a simulator for all types of PacBio and ONT long reads
GNU General Public License v2.0

Simulating hifi read for variant calling benchmarking #31

Open lok27395 opened 5 days ago

lok27395 commented 5 days ago

Hi all, I am benchmarking variant calling tools and am simulating low- and high-coverage HiFi reads for this purpose. However, I am not sure whether my workflow is correct, because I sometimes receive a message saying that my input is in .ccs rather than HiFi format.

Current workflow

1. Pbsim3 - simulate WGS raw data from fasta:
   pbsim --strategy wgs --method errhmm --errhmm ~/miniconda3/pkgs/pbsim3-3.0.4-h4ac6f70_0/data/ERRHMM-SEQUEL.model --depth 20 --genome ~/HG002_BCM/hg002v1.1.fasta --pass-num 10
2. samtools - convert raw .sam to raw .bam
3. pbccs - convert raw .bam into .fastq
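The conversion steps above can be sketched as a small shell function. This is a minimal sketch, not a verified pipeline: it assumes PBSIM3 writes one multi-pass SAM per reference sequence under its default `sd` output prefix (for example `sd_0001.sam`), that `samtools` and `ccs` are on `PATH`, and that `ccs` accepts a `.fastq.gz` output name. The function name `sam_to_hifi_fastq` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert one PBSIM3 multi-pass SAM into consensus FASTQ.
# Assumes samtools and ccs (pbccs) are on PATH.
sam_to_hifi_fastq() {
  local sam="$1"
  local base="${sam%.sam}"                                # sd_0001.sam -> sd_0001
  samtools view -b -o "${base}.bam" "$sam" || return 1    # SAM -> unaligned subread BAM
  ccs "${base}.bam" "${base}.fastq.gz" || return 1        # subreads -> consensus reads
}

# Usage (in the directory that holds the pbsim output):
#   for sam in sd_*.sam; do sam_to_hifi_fastq "$sam"; done
```

Running the helper once per SAM file keeps each reference sequence separate, which is why several BAM or FASTQ files end up needing to be merged afterwards.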

New workflow (I just found that pbtk can merge multiple .bam files and extract HiFi reads from a .bam)

1. Pbsim3 - as above
2. samtools - as above
3. pbtk - merge multiple .bam into one
4. pbccs - convert merged.bam into ccs.bam
5. pbtk - extract hifi from ccs.bam (from above output)
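The merge, consensus, and extraction steps of this workflow might look roughly like the following. This is a sketch under assumptions: `pbmerge` and `extracthifi` are the pbtk tools for merging and HiFi extraction, `ccs` comes from pbccs, and the function name `merged_bams_to_hifi` and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge per-sequence subread BAMs, call consensus, keep HiFi reads.
# Assumes pbmerge and extracthifi (pbtk) and ccs (pbccs) are on PATH.
merged_bams_to_hifi() {
  local out_prefix="$1"; shift                                            # remaining args: input BAMs
  pbmerge -o "${out_prefix}.subreads.bam" "$@" || return 1                # many BAMs -> one BAM
  ccs "${out_prefix}.subreads.bam" "${out_prefix}.ccs.bam" || return 1    # subreads -> CCS reads
  extracthifi "${out_prefix}.ccs.bam" "${out_prefix}.hifi.bam"            # keep HiFi reads only
}

# Usage:
#   merged_bams_to_hifi sim sd_*.bam
```

As far as I know, `ccs` with default settings already drops reads below a predicted accuracy of 0.99, so the `extracthifi` step mostly matters when `ccs` is run with `--all`; it is worth checking this against the version in use.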

Question

Many thanks in advance for correcting any misunderstanding.

yukiteruono commented 5 days ago

Thank you for using PBSIM3. In PBSIM, you can change the ID prefix by using --id-prefix. If you merge simulation data without changing the prefix, IDs will be duplicated. Is your workflow managing IDs properly?
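To illustrate the point about ID prefixes, here is a minimal sketch of running several simulation batches so that each batch gets its own read-ID prefix. The options `--prefix` (output file prefix) and `--id-prefix` come from PBSIM3; the function name `simulate_batch`, the `batchN` and `BN_` naming scheme, and the fixed depth and pass number (copied from the command in the question) are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one PBSIM3 batch with its own output and read-ID prefixes.
# Assumes pbsim is on PATH; the trailing underscore in the ID prefix is an assumed
# separator so that batch number and sequence number cannot run together.
simulate_batch() {
  local batch="$1" model="$2" genome="$3"
  pbsim --strategy wgs --method errhmm --errhmm "$model" \
        --depth 20 --genome "$genome" --pass-num 10 \
        --prefix "batch${batch}" --id-prefix "B${batch}_"
}

# Usage:
#   simulate_batch 1 ERRHMM-SEQUEL.model hg002v1.1.fasta
#   simulate_batch 2 ERRHMM-SEQUEL.model hg002v1.1.fasta
```

With distinct prefixes per batch, reads from different runs should no longer share IDs when their files are merged.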

lok27395 commented 5 days ago


Hi Yuki, in the previous workflow I used cat to merge the .fastq files and there seemed to be no problems. Now I have tried using pbtk to merge multiple .bam files, and that also seems fine, at least in terms of file size. I am currently running ccs to produce the .ccs.bam, so I cannot yet say anything about the downstream analysis.
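File size alone does not show whether read IDs collide after a merge. One way to check a concatenated FASTQ is sketched below; it assumes uncompressed, four-line FASTQ records (no wrapped sequence lines), and the function name `duplicate_fastq_ids` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each read ID that occurs more than once in a FASTQ file.
# Assumes four-line records; the ID is the first word of the header line without '@'.
duplicate_fastq_ids() {
  awk 'NR % 4 == 1 { id = substr($1, 2); if (++seen[id] == 2) print id }' "$1"
}

# Usage:
#   duplicate_fastq_ids merged.fastq | head
```

Empty output means no ID was seen twice; any printed ID points to reads that downstream tools may treat as the same read.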

Thank you for your help, but for now my focus is on generating simulated HiFi reads with those tools.

Thanks anyway!