zzhang526 / MosaicHunter

A tool to detect postzygotic single-nucleotide mosaicism from unpaired, trio, or paired samples.
http://mosaichunter.cbi.pku.edu.cn/
MIT License

Ratio NaN% #3

Open PhilPalmer opened 5 years ago

PhilPalmer commented 5 years ago

Hi,

I would like to run the pipeline in paired/trio mode. Is any test data available?

I have tried with my own data, but much of the output produced is empty and the standard output shows a ratio of NaN% for many filters.

I think this is similar to issue #1; however, I have already removed the chr prefix from my BAM file and am using the FASTA provided as test data.

To get my BAM file:

When I run the following command:

java -Xmx10G -jar /usr/local/bin/mosaichunter.jar genome -P input_file=son_subsample_18_mapping_sorted_header.bam -P reference_file=demo/hg37_chr18.fa -P mosaic_filter.sex=M -P mosaic_filter.mode=single -P output_dir=son_single_test_out

I get the following output:

Mon Mar 18 15:05:31 UTC 2019 Initializing...
Mon Mar 18 15:05:33 UTC 2019 Reading reference from file: demo/hg37_chr18.fa
Mon Mar 18 15:05:34 UTC 2019 Initializing filters...
Mon Mar 18 15:05:34 UTC 2019 Scanning...
Mon Mar 18 15:05:34 UTC 2019 - Time(s):0 Reads:0 Sites:0/78077248 Progress:0.00%
Mon Mar 18 15:05:44 UTC 2019 - Time(s):9 Reads:563534 Sites:78077248/78077248 Progress:100.00%
filter name                                          pass/all   ratio
base_number_filter                                  9/6077767   0.00%
repetitive_region_filter                                  9/9 100.00%
homopolymers_filter                                       5/9  55.56%
indel_region_filter                                       5/5 100.00%
depth_filter                                              0/5   0.00%
common_site_filter                                        0/0    NaN%
strand_bias_filter                                        0/0    NaN%
within_read_position_filter                               0/0    NaN%
mosaic_filter                                             0/0    NaN%
complete_linkage_filter                                   0/0    NaN%
mosaic_like_filter                                        5/5 100.00%
near_mosaic_filter                                        0/5   0.00%
misaligned_reads_filter                                   0/0    NaN%
clustered_filter                                          0/0    NaN%
final                                                     0/0    NaN%

Do you have any idea what the problem may be and how I can resolve it? Is the problem that the reference FASTA and BAM file do not correspond?

Thanks in advance. Any help would be much appreciated.

AugustHuang commented 5 years ago

Hi Phil Palmer,

The output suggests that MosaicHunter is not reading the reads from the BAM file correctly (only 9 of 6077767 sites passed the very first filter, base_number_filter). It seems that you are using reads generated by Complete Genomics. Can you send me a few reads from your input in SAM format? I suspect that the read format and flags in Complete Genomics data are not labeled the same way as in Illumina data. Please keep in mind that our pipeline (especially the error filters) was designed for Illumina platforms, so we cannot guarantee its performance on other platforms.

Best, August
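
To make the filter summary above easier to read, here is a minimal Python sketch that parses the pass/all column and reports the first filter that received sites but passed none. It is an illustration only, not part of MosaicHunter; the function names `parse_summary` and `first_empty_filter` are hypothetical. A ratio of NaN% simply means the filter received 0 sites (0/0), so the filter worth investigating is the earliest one that drops the count to zero.

```python
import math

# Rows copied from the log in this issue (filter name, pass/all, ratio).
SUMMARY = """\
filter name pass/all ratio
base_number_filter 9/6077767 0.00%
repetitive_region_filter 9/9 100.00%
homopolymers_filter 5/9 55.56%
indel_region_filter 5/5 100.00%
depth_filter 0/5 0.00%
common_site_filter 0/0 NaN%
strand_bias_filter 0/0 NaN%
"""

def parse_summary(text):
    """Return (name, passed, total, ratio_percent) for each filter row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and any line without a pass/all field.
        if len(parts) < 2 or "/" not in parts[1]:
            continue
        passed, total = (int(x) for x in parts[1].split("/"))
        # 0/0 has no defined ratio, which the log prints as NaN%.
        ratio = passed / total * 100 if total else math.nan
        rows.append((parts[0], passed, total, ratio))
    return rows

def first_empty_filter(rows):
    """Name of the first filter that saw sites but passed none, else None."""
    for name, passed, total, _ in rows:
        if total > 0 and passed == 0:
            return name
    return None

rows = parse_summary(SUMMARY)
print(first_empty_filter(rows))
```

For this log the sketch points at depth_filter, after base_number_filter had already reduced 6077767 sites to 9.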

PhilPalmer commented 5 years ago

Hi @AugustHuang,

Thanks for your prompt response.

Do you know where I might be able to find some small test data for trio or paired BAM files?

Here are the first 10 reads from the BAM file I was using:

GS78791-FS3-L05-16:21308201 409 18  10005   0   25M *   0   0   AACCCTAACCCTAACCCCTAACCCT   :<:;4<=<<<=/<;2;:5.556554   GC:Z:9S1G5S4G6S GS:Z:CNCCTTCCCT GQ:Z:<!;:,&555. R2:Z:TTGGCAGTAATTATTCATTNTTTACTTCAA Q2:Z:6666%)66663#6&:;;;<!:;;<;<<5:8 RG:Z:18_mapping_GS78791-FS3-L05_016_sorted
GS78791-FS3-L04-13:19196522 435 18  10010   0   27M =   10447   437 TCACCCTCACCCTCACCCCTCACCCTC :;<<<<===<=<<<<5;;656555554 GC:Z:9S1G7S2G8S GS:Z:CNCCCC GQ:Z:<!;'56 RG:Z:18_mapping_GS78791-FS3-L04_013_sorted
GS78790-FS3-L01-4:25749251  329 18  10015   0   28M *   0   0   CTAACCCTAACCCCTAACCCTAACCCTA    5666666669';;;;;:<;<;<<<<<<7    GC:Z:9S1G8S1G9S GS:Z:AANC   GQ:Z:59!;   R2:Z:AACCCTAACCCCCTAACCCNTAACCCTAAC Q2:Z:4556555566:;<<<<<<<!:<==<<:;<: RG:Z:18_mapping_GS78790-FS3-L01_004_sorted
GS78791-FS3-L03-8:27775956  329 18  10015   0   28M *   0   0   CTAACCCTAACCCCTAACCCTAACCCTA    5666655668.<;;;;:<<<<<<;<<<6    GC:Z:9S1G8S1G9S GS:Z:AANC   GQ:Z:68!<   R2:Z:AACCCTAACCCCCTAACCCNTAACCCTAAC Q2:Z:4555555555:;;9<;<==!:<<=<=<<<: RG:Z:18_mapping_GS78791-FS3-L03_008_sorted
GS78791-FS3-L03-12:11572173 329 18  10015   0   28M *   0   0   CTAACCCTAACCCCTAACCCTAACCCTA    56666666692;;;;;9<<<<<<=<<<8    GC:Z:9S1G8S1G9S GS:Z:AANC   GQ:Z:69!<   R2:Z:ACCCTAACCCCCCTAACCCNCTAACCCTAA Q2:Z:4555655556:;<:<<8==!======<<<: RG:Z:18_mapping_GS78791-FS3-L03_012_sorted
GS78791-FS3-L03-12:26327256 329 18  10015   0   28M *   0   0   CTAACCCTAACCCCTAACCCTAACCCTA    6666666628*:<;;;:<<<<<<<<<;8    GC:Z:9S1G8S1G9S GS:Z:AANC   GQ:Z:68!<   R2:Z:ACCCTAACCCACCCTAACCNCTAACCCTAA Q2:Z:45555455558;<<<<<==!=:====<<;: RG:Z:18_mapping_GS78791-FS3-L03_012_sorted
GS78791-FS3-L04-6:4649719   329 18  10015   0   28M *   0   0   CTAACCCTAACCCCTAACCCTAACCCTA    5556555668)<;<;<:<<=<<<===<9    GC:Z:9S1G8S1G9S GS:Z:AANC   GQ:Z:48!<   R2:Z:ACCCTAACCCTAACCCTAANCCCTAACCCT Q2:Z:4555555665&;.<<<<=<!=====<<<<: RG:Z:18_mapping_GS78791-FS3-L04_006_sorted
GS78791-FS3-L05-5:5840971   329 18  10015   0   28M *   0   0   CTAACCCTAACCCCTAACCCTAACCCTA    5666665668+<<;;;;<==<<<<==:9    GC:Z:9S1G8S1G9S GS:Z:AANC   GQ:Z:48!=   R2:Z:CCTAACCCTACCTAACCCTNAACCCTAACC Q2:Z:4455566655:;2<<<<=<!=:===8<<<: RG:Z:18_mapping_GS78791-FS3-L05_005_sorted
GS78791-FS3-L06-10:22112671 329 18  10015   0   28M *   0   0   CTAACCCTAACCCCTAACCCTAACCCTA    5566666669&;;;;;;<<=<<<<<<<8    GC:Z:9S1G8S1G9S GS:Z:AANC   GQ:Z:69!<   R2:Z:ACCCTAACCCCTAACCCTANACCCTAACCC Q2:Z:4555555666:3:8<<<<=!======<<<: RG:Z:18_mapping_GS78791-FS3-L06_010_sorted
GS78791-FS3-L07-2:22480995  329 18  10015   0   28M *   0   0   CTAACCCTAACCCCTAACCCTAACCCTA    6666666669'<;;;;;<<<<<<=<<<8    GC:Z:9S1G8S1G9S GS:Z:AANC   GQ:Z:69!<   R2:Z:AACCCTAACCAACCCTAACNCCTAACCCTA Q2:Z:4555555566:+<<<<<<=!======<<<: RG:Z:18_mapping_GS78791-FS3-L07_002_sorted

AugustHuang commented 5 years ago

Hi Phil Palmer,

You can download the trio data (90X, Illumina platform) from the FTP site of the 1000 Genomes Project. See the URLs listed in Supplementary Table 2 of our NAR paper on MosaicHunter.

The first 10 reads you provided are all flagged as secondary alignments, probably because of the very short read length of Complete Genomics data. I also noticed that you down-sampled the input BAM file to 10%, which might be another reason that too few sites passed base_number_filter. I suggest at least 50X average depth for the input BAM of MosaicHunter. For trio calling, you should also specify the paths to the paternal and maternal sequencing data on the command line with "-P father_bam_file= -P mother_bam_file=", and change the mode to trio with "-P mosaic_filter.mode=trio".

Best, August
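
The secondary-alignment observation can be checked by decoding the FLAG column (second field) of the SAM records posted above. The following is a minimal Python sketch based on the bit definitions in the SAM specification; the label strings and the `decode_flag` helper are my own naming, and whether MosaicHunter actually skips secondary alignments is an assumption drawn from the maintainer's comment, not something confirmed here.

```python
# FLAG bits as defined in the SAM specification; labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]

# The three distinct FLAG values seen in the ten reads from this issue.
for flag in (409, 435, 329):
    print(flag, decode_flag(flag))
```

All three values have bit 0x100 set, which is consistent with every posted read being marked as a secondary alignment.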
