stat-lab / EvalSVcallers

Evaluate the performances (precision and recall) of structural variation (SV) callers

Dissimilar data points in re-analysis #13

Closed · Mkddb closed this issue 3 years ago

Mkddb commented 3 years ago

Dear team,

I am checking a few tools with the raw data and scripts as given/mentioned in your analysis.

There are only minor fractional differences with the simulated data, but I am getting significant differences with the real datasets.

I am using Real dataset 4 (NA12878) downloaded from ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA245/SRA245681/SRX950495 (files SRR1910373_1.fastq.bz2 and SRR1910373_2.fastq.bz2) and unzipped them.

I followed these commands to generate the BAM and index files:

bwa mem hs37d5.fa SRR1910373_1.fastq SRR1910373_2.fastq > SRR1910373.sam
samtools view -S -b SRR1910373.sam > SRR1910373.bam
samtools sort SRR1910373.bam -o SRR1910373_sorted.bam
samtools index SRR1910373_sorted.bam.bai

bwa version: 0.7.17-r1188
samtools version: 1.7 (using htslib 1.7-2)

Reference data used: hs37d5.fa (link: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz)

Then I installed Manta 0.29.5 using the default commands in installation.md.

I executed it to call SVs on Real dataset 4:

configManta.py --bam SRR1910373_sorted.bam --referenceFasta hs37d5.fa --runDir /media/divakar/Thirddisk/RealData_Kosugianalysis/new1/

./runWorkflow.py -m local -j 6 -g 100

Conversion and evaluation of the called VCFs:

perl convert_Manta_vcf.pl diploidSV.vcf > diploidSVconverted.vcf

perl evaluate_SV_callers.pl -r N diploidSVconverted.vcf

Obtained this result:

Ref-DEL: 9176  tinny (<= 100 bp): 1192  short (<=1.0 kp): 5558  middle (<= 100 kb): 2330  large (> 100 kb): 96
Ref-INS: 13669  short (<=1.0 kp): 12660  middle (<= 100 kb): 1008  large (> 100 kb): 1
Ref-DUP: 2604  short (<=1.0 kp): 873  middle (<= 100 kb): 1585  large (> 100 kb): 146
Ref-INV: 274  short (<=1.0 kp): 26  middle (<= 100 kb): 200  large (> 100 kb): 48

<< NA12878_DGV-2016_LR-assembly >>

DEL

        <Number of supporting reads>
        2   3   4   5   6   7   8   9   10  12

Call (A) 5991 5945 5890 5823 5732 5628 5498 5368 5216 4902
Recall (A) 41.1 41 41 40.9 40.6 40.3 39.9 39.5 38.8 37.3
Precis (A) 63 63.4 63.8 64.4 65.1 65.8 66.7 67.6 68.3 69.9
Call (SS) 2291 2268 2241 2210 2171 2118 2054 1983 1912 1765
Recall (SS) 60.4 60.4 60.4 60.2 59.9 59.4 59.1 58.1 57.1 55.1
Precis (SS) 40.3 40.7 41.2 41.6 42.1 42.8 43.7 44.5 45 46.6
Call (S) 2887 2866 2842 2806 2756 2707 2648 2598 2529 2401
Recall (S) 42.1 42 41.9 41.8 41.5 41.1 40.6 40.3 39.5 38
Precis (S) 74.1 74.5 75 75.8 76.7 77.4 78.3 79.2 79.9 81.3
Call (M) 771 769 766 766 765 763 756 748 736 697
Recall (M) 30.1 30.1 30 30 30 29.9 29.7 29.5 29.1 27.7
Precis (M) 90.6 90.8 90.8 90.8 90.8 90.8 91 91.1 91.5 91.9
Call (L) 42 42 41 41 40 40 40 39 39 39
Recall (L) 12.5 12.5 11.4 11.4 10.4 10.4 10.4 10.4 10.4 10.4
Precis (L) 26.1 26.1 24.3 24.3 22.5 22.5 22.5 23 23 23

DUP

        <Number of supporting reads>
        2   3   4   5   6   7   8   9   10  12

Call (A) 466 466 463 456 444 431 408 389 369 339
Recall (A) 8 8 7.9 7.7 7.6 7.4 6.9 6.4 6 5.3
Precis (A) 47.2 47.2 47.3 46.7 47.2 46.8 46 44.9 44.4 42.1
Call (S) 312 312 309 302 292 279 260 241 223 195
Recall (S) 19.6 19.6 19.5 19 18.7 18 16.8 15.7 14.7 12.9
Precis (S) 63.4 63.4 63.7 63.2 64.3 64.5 64.2 63.9 64.1 63.5
Call (M) 126 126 126 126 125 125 123 123 122 120
Recall (M) 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1
Precis (M) 16.6 16.6 16.6 16.6 16.8 16.8 16.2 16.2 16.3 15
Call (L) 28 28 28 28 27 27 25 25 24 24
Recall (L) 2 2 2 2 2 2 1.3 1.3 1.3 1.3
Precis (L) 3.5 3.5 3.5 3.5 3.7 3.7 4 4 4.1 4.1

INS

        <Number of supporting reads>
        2   3   4   5   6   7   8   9   10  12

Call (A) 2546 2544 2544 2542 2529 2517 2503 2489 2465 2416
Recall (A) 14.3 14.3 14.3 14.3 14.3 14.2 14.2 14.1 14 13.8
Precis (A) 77.2 77.2 77.2 77.3 77.4 77.5 77.7 77.9 78.1 78.4

INV

        <Number of supporting reads>
        2   3   4   5   6   7   8   9   10  12

Call (A) 259 259 258 257 255 254 253 251 247 238
Recall (A) 24.4 24.4 24.4 24 24 24 24 24 24 22.9
Precis (A) 25.8 25.8 25.9 25.6 25.8 25.9 26 26.2 26.7 26.4

Meanwhile, the data in Table S9 of Additional file 3 is as follows:

Manta

        <Number of supporting reads>
        2   3   4   5   6   7   8   9   10  12

DEL

Call (A) 5012 4966 4888 4788 4604 4405 4177 3908 3610 2953
Recall (A) 37.2 37.2 37 36.7 35.9 34.9 33.5 31.8 29.8 25.1
Precis (A) 68.2 68.7 69.6 70.3 71.6 72.7 73.7 74.6 75.9 78
MIER (A) 30.9 30.4 29.6 28.8 27.8 27.1 26 25.1 24.3 21.8
Call (SS) 1870 1849 1810 1753 1658 1579 1469 1363 1235 978
Recall (SS) 55.9 55.8 55.7 55.1 54.1 53.3 51 48.4 44.7 37
Precis (SS) 44.3 44.7 45.5 46.4 48 49.3 50.8 51.4 52.9 54.8
MIER (SS) 45.8 45.4 44.8 43.9 43.1 42.6 41.9 41.1 40.7 38.8
Call (S) 2439 2417 2380 2342 2267 2172 2080 1954 1826 1520
Recall (S) 37.7 37.6 37.5 37.1 36.2 35.1 33.9 32 30.4 25.7
Precis (S) 79.6 80.2 81.2 81.6 82.4 83.6 84.2 85.1 86.2 88.3
MIER (S) 25.3 24.8 23.9 23.2 22 21 20 19 18.1 15.6
Call (M) 676 674 672 667 653 630 604 570 528 436
Recall (M) 27.6 27.5 27.5 27.3 26.8 25.9 24.8 23.6 21.9 18.1
Precis (M) 94.6 94.8 94.7 94.9 95.2 95.2 95.1 95.7 95.8 96.3
MIER (M) 9.4 9.3 9.2 8.9 8.5 8.7 8.4 7.8 7.7 5.7
Call (L) 27 26 26 26 26 24 24 21 21 19
Recall (L) 11.4 10.4 10.4 10.4 10.4 9.3 8.3 7.2 7.2 7.2
Precis (L) 37 34.6 34.6 34.6 34.6 33.3 33.3 33.3 33.3 36.8
MIER (L) 33.3 34.6 34.6 34.6 34.6 29.1 29.1 19 19 15.7

DUP

Call (A) 324 322 313 301 284 260 237 216 187 138
Recall (A) 5.9 5.9 5.8 5.5 5.3 4.7 4.3 3.9 3.4 2.4
Precis (A) 50 50.3 50.4 50.1 50.3 48.4 48.5 48.1 48.6 47.1
MIER (A) 45.6 45.3 45 44.1 42.6 41.5 40.5 41.6 42.2 43.4
Call (S) 224 222 213 201 185 164 144 126 114 83
Recall (S) 14.6 14.6 14.3 13.6 12.9 11.4 10.3 9.2 8.5 6.1
Precis (S) 63.8 64.4 65.2 65.6 67 65.8 67.3 68.2 69.2 67.4
MIER (S) 56.2 55.8 55.8 55.2 53.5 52.4 51.3 53.9 51.7 50.6
Call (M) 85 85 85 85 84 81 79 77 63 46
Recall (M) 1.1 1.1 1.1 1.1 1.1 1 1 1 0.7 0.5
Precis (M) 21.1 21.1 21.1 21.1 21.4 20.9 21.5 22 19 19.5
MIER (M) 23.5 23.5 23.5 23.5 23.8 24.6 25.3 25.9 28.5 34.7
Call (L) 15 15 15 15 15 15 14 13 10 9
Recall (L) 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0 0
Precis (L) 6.6 6.6 6.6 6.6 6.6 6.6 7.1 7.6 0 0
MIER (L) 13.3 13.3 13.3 13.3 13.3 13.3 14.2 15.3 20 22.2

INS

Call (A) 1819 1816 1809 1803 1782 1754 1727 1680 1641 1538
Recall (A) 11 11 11 10.9 10.8 10.5 10.4 10.1 9.9 9.3
Precis (A) 77.7 77.7 77.8 77.9 78.1 78.5 78.8 79.4 79.7 80.8
MIER (A) 55.5 55.5 55.5 55.4 55.2 54.7 54.7 54.5 54.2 53.5

INV

Call (A) 180 180 178 175 173 166 158 148 137 120
Recall (A) 22.2 22.2 21.8 21.5 21.1 19.7 19.7 18.6 17.1 15.3
Precis (A) 33.8 33.8 33.7 33.7 33.5 32.5 34.1 34.4 34.3 35
MIER (A) 16.6 16.6 16.2 16 15.6 15 15.1 14.8 14.5 13.3

There are significant differences in the recall and precision values. Do you see anything erroneous that would explain this difference?

I tried a few other tools and obtained similar differences for Real dataset 4, but not for SIM-A.

Can you help me understand and fix this?

stat-lab commented 3 years ago

We did not use all of the SRR1910373 reads; we used only 30x coverage worth of reads from SRR1910373. Check Table S3 and read our manuscript carefully.
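A rough way to check how much coverage actually went into the BAM (a minimal sketch, assuming the sorted BAM from the commands above and samtools on the PATH; a mean depth well above 30 would account for the extra calls):

# estimate mean depth across all positions; this scans the whole BAM and can take a while
samtools depth -a SRR1910373_sorted.bam | awk '{sum += $3; n++} END {if (n > 0) printf "mean depth: %.1f\n", sum/n}'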


Mkddb commented 3 years ago

Thanks for highlighting it.

Yes, I read the manuscript carefully again. It mentions "Table S3. Whole genome sequencing reads used in this study", and the genome coverage listed for NA12878 (data4) is 30x. I hope that is the same thing being highlighted here.

But I am not able to understand "We have not used all the SRR1910373 reads but used only 30x coverage reads from SRR1910373". Does it mean only the reads amounting to 30x coverage from the SRR1910373 FASTQ files, or that the whole-genome coverage (which is actually an overall average) is 30x? Also, are you using only these files?
ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA245/SRA245681/SRX950495/SRR1910373_1.fastq.bz2
ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA245/SRA245681/SRX950495/SRR1910373_2.fastq.bz2

What about the third FASTQ file? ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA245/SRA245681/SRX950495/SRR1910373.fastq.bz2

Can you please elaborate on how to use only the 30x coverage reads from the above data? It would be really helpful if the scripts for FASTQ-to-BAM conversion, or the final BAM files, could be shared for NA12878 (data4).

This raises an additional doubt: how would one obtain an RSS of 2 to 12 if only the 30x coverage reads were used for the alignment?

stat-lab commented 3 years ago

The SRR1910373 files contain 251 bp paired-end reads, so the number of read pairs corresponding to 30x coverage is about 3,000,000,000 x 30 / (251 x 2). That number of read pairs can be extracted from SRR1910373_1.fastq.bz2 and SRR1910373_2.fastq.bz2.
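A minimal sketch of how such a subset could be extracted (the 30x output file names are made up for illustration; taking the first N pairs is only one option, and something like seqtk sample could be used for random sampling instead):

# ~30x worth of read pairs: 3,000,000,000 x 30 / (251 x 2) ≈ 179 million pairs
PAIRS=$((3000000000 * 30 / (251 * 2)))
# each FASTQ record is 4 lines, so keep PAIRS * 4 lines from each mate file
bzcat SRR1910373_1.fastq.bz2 | head -n $((PAIRS * 4)) > SRR1910373_30x_1.fastq
bzcat SRR1910373_2.fastq.bz2 | head -n $((PAIRS * 4)) > SRR1910373_30x_2.fastq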


Mkddb commented 3 years ago

Are these commands correct to obtain the same result, or do I need to apply additional parameters?

bzip2 -d SRR1910373_1.fastq.bz2
bzip2 -d SRR1910373_2.fastq.bz2

bwa mem hs37d5.fa SRR1910373_1.fastq SRR1910373_2.fastq > SRR1910373.sam

samtools view -S -b SRR1910373.sam > SRR1910373.bam

samtools sort SRR1910373.bam -o SRR1910373_sorted.bam

samtools index SRR1910373_sorted.bam.bai
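
One note on the last command (standard samtools usage, not anything specific to this pipeline): samtools index takes the sorted BAM itself as input and writes the .bai index next to it, e.g.:

# index the sorted BAM; this creates SRR1910373_sorted.bam.bai automatically
samtools index SRR1910373_sorted.bam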