Closed by gbggrant 4 years ago
Is there a documentation update that goes with this change? Otherwise, very exciting to see the flexibility of input types!
@alimanfoo I have run two pairs of tests for this. One (AV0148-C) used the small data files provided by @tnguyensanger (a cram and a pair of fastqs), and the other (AB0252-C) used a large bam and the corresponding pair of fastqs. The outputs of samtools idxstats and flagstat are below. They look okay to me; let me know if you have any concerns.
Short read alignment pipeline
Small cram: AV0148-C faab4017-18fe-4a78-ba26-96a610ec666a
Idxstats:
2R 61545105 13366 4104
3R 53200684 9177 2829
2L 49364325 8838 2649
UNKN 42389979 4551 1281
3L 41963435 9289 2419
X 24393108 8056 2166
Y_unplaced 237045 25 12
Mt 15363 46 9
* 0 0 23022
Flagstat:
91839 + 0 in total (QC-passed reads + QC-failed reads)
11513 + 0 secondary
0 + 0 supplementary
4078 + 0 duplicates
53348 + 0 mapped (58.09% : N/A)
80326 + 0 paired in sequencing
40163 + 0 read1
40163 + 0 read2
3230 + 0 properly paired (4.02% : N/A)
26366 + 0 with itself and mate mapped
15469 + 0 singletons (19.26% : N/A)
18962 + 0 with mate mapped to a different chr
3537 + 0 with mate mapped to a different chr (mapQ>=5)
Small fastqs: AV0148-C 5d511047-bed2-4c8d-9507-54c5439144e7
Idxstats:
2R 61545105 13377 4023
3R 53200684 9086 2825
2L 49364325 8919 2719
UNKN 42389979 4523 1294
3L 41963435 9231 2424
X 24393108 8136 2168
Y_unplaced 237045 21 8
Mt 15363 46 8
* 0 0 23022
Flagstat:
91830 + 0 in total (QC-passed reads + QC-failed reads)
11504 + 0 secondary
0 + 0 supplementary
3993 + 0 duplicates
53339 + 0 mapped (58.08% : N/A)
80326 + 0 paired in sequencing
40163 + 0 read1
40163 + 0 read2
3232 + 0 properly paired (4.02% : N/A)
26366 + 0 with itself and mate mapped
15469 + 0 singletons (19.26% : N/A)
18862 + 0 with mate mapped to a different chr
3518 + 0 with mate mapped to a different chr (mapQ>=5)
Large bam: AB0252-C 77f099d1-425b-4021-8dcc-77f4760d23d1
Idxstats:
2R 61545105 10376172 47262
3R 53200684 9047935 48973
2L 49364325 8584117 51320
UNKN 42389979 6957095 24980
3L 41963435 7154619 41619
X 24393108 4673674 47236
Y_unplaced 237045 39852 549
Mt 15363 429553 167
* 0 0 108628
Flagstat:
47633751 + 0 in total (QC-passed reads + QC-failed reads)
1986127 + 0 secondary
0 + 0 supplementary
638900 + 0 duplicates
47263017 + 0 mapped (99.22% : N/A)
45647624 + 0 paired in sequencing
22823812 + 0 read1
22823812 + 0 read2
41954304 + 0 properly paired (91.91% : N/A)
45014784 + 0 with itself and mate mapped
262106 + 0 singletons (0.57% : N/A)
2051380 + 0 with mate mapped to a different chr
932381 + 0 with mate mapped to a different chr (mapQ>=5)
Large fastqs: AB0252-C 55ca8744-0738-48c1-830d-40b677f2dcad
Idxstats:
2R 61545105 10375873 47355
3R 53200684 9046717 49025
2L 49364325 8584887 51411
UNKN 42389979 6957552 24774
3L 41963435 7155320 41603
X 24393108 4673483 47190
Y_unplaced 237045 39865 585
Mt 15363 429562 165
* 0 0 108628
Flagstat:
47633995 + 0 in total (QC-passed reads + QC-failed reads)
1986371 + 0 secondary
0 + 0 supplementary
639268 + 0 duplicates
47263259 + 0 mapped (99.22% : N/A)
45647624 + 0 paired in sequencing
22823812 + 0 read1
22823812 + 0 read2
41953716 + 0 properly paired (91.91% : N/A)
45014780 + 0 with itself and mate mapped
262108 + 0 singletons (0.57% : N/A)
2051784 + 0 with mate mapped to a different chr
932352 + 0 with mate mapped to a different chr (mapQ>=5)
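For anyone who wants to compare paired runs like these without eyeballing the numbers, here is a minimal sketch of one way to do it. It assumes the four-column `samtools idxstats` layout shown above (contig, length, mapped, unmapped). The helper names `parse_idxstats` and `mapped_deltas` are hypothetical and not part of the pipeline, and the sample rows are a subset copied from the small cram and small fastq runs.

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` text into {contig: (length, mapped, unmapped)}."""
    stats = {}
    for line in text.strip().splitlines():
        # Columns are whitespace-separated: contig, length, mapped, unmapped.
        name, length, mapped, unmapped = line.split()
        stats[name] = (int(length), int(mapped), int(unmapped))
    return stats


def mapped_deltas(a_text, b_text):
    """Return {contig: mapped_b - mapped_a} for contigs present in both runs."""
    a = parse_idxstats(a_text)
    b = parse_idxstats(b_text)
    return {name: b[name][1] - a[name][1] for name in a if name in b}


# A few rows from the small cram run (AV0148-C) ...
cram_run = """2R 61545105 13366 4104
3R 53200684 9177 2829
Mt 15363 46 9
* 0 0 23022"""

# ... and the same contigs from the small fastq run.
fastq_run = """2R 61545105 13377 4023
3R 53200684 9086 2825
Mt 15363 46 8
* 0 0 23022"""

deltas = mapped_deltas(cram_run, fastq_run)
for contig, delta in deltas.items():
    print(contig, delta)
```

Small per-contig differences such as these are what the comparison above is checking for; what counts as an acceptable difference is a judgment call that this sketch does not make.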
Thanks @gbggrant, just to say these look good to me too.
Add support for cram, bam, and fastq inputs to the ShortReadAlignment.wdl