guoweilong / cgmaptools

toolbox for analysing BS-seq data, advance features in SNV, ASM and DMR
https://cgmaptools.github.io

ASM step takes very long on a subsetted bam ~59 million reads #65

Open nmfad opened 9 months ago

nmfad commented 9 months ago

I am running CGmapTools to get ASM calls from WGBS data on my sample. I ran the CGmap step to convert to the ATCGmap format, performed SNV calling (bayes mode), and then submitted the ASM job using the 'asr' mode. Everything worked well up to this point, but the sample has now been processing this step for more than 11 hours. Do you have an estimate of how long the ASM step should take for a VCF and BAM file (WGBS) of the size below? The log file does not show any error; it only shows "Loading htSNPs ..." and nothing after.

Number of variants in the WGBS VCF file = 32,844,420

Number of reads in the BAM file = 59,505,823

For reference, this BAM file is subsetted to chromosome X only, and I am running the ASM step on a powerful compute node under SLURM with 10 GB of memory. Previously I ran the ASM step on my sample using only 300 SNVs as a test, and the job completed in roughly 16 minutes. Can you please provide a rough estimate of the time it would take? I am wondering whether something is wrong or it is just taking long. No output file has been generated yet. Your insight would be very helpful.
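For a rough sense of scale, the 300-SNV test run can be extrapolated to the full VCF. This is only a sketch: it assumes the ASM runtime grows linearly with the number of SNVs, which nothing in this thread confirms (fixed costs such as loading the BAM would make the real figure differ), and the function name is illustrative.

```python
# Naive linear extrapolation of ASM runtime from the 300-SNV test run.
# Assumption: runtime scales proportionally with SNV count (it may not).
TEST_SNVS = 300          # SNVs in the test run reported above
TEST_MINUTES = 16        # approximate runtime of that test
FULL_SNVS = 32_844_420   # variants in the full chrX VCF


def extrapolate_minutes(n_snvs, ref_snvs=TEST_SNVS, ref_minutes=TEST_MINUTES):
    """Scale the reference runtime proportionally to the number of SNVs."""
    return n_snvs / ref_snvs * ref_minutes


minutes = extrapolate_minutes(FULL_SNVS)
print(f"{minutes / 60:,.0f} hours (~{minutes / 60 / 24 / 365:.1f} years)")
```

Under that assumption the full run would take years rather than hours, which suggests the 11+ hours seen so far is slow progress rather than a hang, and that the SNV list has to shrink before the ASM step is practical.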

guoweilong commented 8 months ago

What about splitting the huge VCF and BAM files into smaller ones (by chromosome or by region)?

Weilong
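One way to picture the region-based split suggested above is to bucket VCF records into fixed-size windows. The sketch below is an illustration only: the function name and the 1 Mb window size are assumptions, not part of CGmapTools, and it relies only on the standard VCF layout (chromosome in column 1, 1-based position in column 2).

```python
from collections import defaultdict


def split_vcf_by_window(lines, window=1_000_000):
    """Group VCF records by (chromosome, window index).

    Header lines (starting with '#') are returned separately so they can be
    written at the top of every per-window file.
    """
    header, chunks = [], defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        chrom, pos = line.split("\t", 2)[:2]
        # VCF positions are 1-based; subtract 1 so position `window` stays in window 0.
        chunks[(chrom, (int(pos) - 1) // window)].append(line)
    return header, chunks


example = [
    "##fileformat=VCFv4.2",
    "chrX\t500\t.\tG\tA\t50\tPASS\t.",
    "chrX\t1000000\t.\tC\tT\t50\tPASS\t.",
    "chrX\t1000001\t.\tA\tG\t50\tPASS\t.",
]
header, chunks = split_vcf_by_window(example)
for (chrom, idx), records in sorted(chunks.items()):
    print(chrom, idx, len(records))
```

The matching slice of reads can usually be taken from an indexed BAM with `samtools view` and a region string, so each ASM job sees only one window of SNVs and reads.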

nmfad commented 8 months ago

Hi Weilong,

The WGBS alignment file is already subsetted to the X chromosome only. I am currently running it gene by gene to make it more efficient.

I had another question. I am using paired-end data, and the VCF file contains many more G>A SNVs than expected. For example, almost 32 million SNVs are detected along the X chromosome even after applying quality thresholds. Could these simply be C and converted cytosines (T) from the complementary strand rather than true variants? Does the ASM computation take into account that many of these may not be SNVs, or is there a way to deal with this when computing ASM across a region or within a sample?

Thank you for any insight you can provide.

Numrah



guoweilong commented 8 months ago

The possibility of C-to-T conversion by bisulfite sequencing is taken into account. You can inspect the corresponding sites in the ATCGmap file for the called SNP sites, to see whether the calls meet your expectations.

In addition, it is suggested to select the called SNP sites using a strict p-value threshold.

Weilong
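The check described above, reading the ATCGmap rows at the called SNP sites, could be sketched as follows. The column layout is an assumption taken from the CGmapTools format description as I understand it (chromosome, reference base, position, context, dinucleotide, then A/T/C/G/N counts for the Watson strand and A/T/C/G/N counts for the Crick strand); verify it against your own file before relying on it. The function names and the toy records are hypothetical.

```python
def load_snp_positions(vcf_lines):
    """Collect (chromosome, position) pairs from VCF records."""
    positions = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        positions.add((fields[0], int(fields[1])))
    return positions


def atcgmap_counts_at(atcgmap_lines, positions):
    """Yield per-strand base counts for ATCGmap rows located at SNP positions.

    Assumed columns: 1 chr, 2 ref base, 3 pos, 4 context, 5 dinucleotide,
    6-10 Watson A/T/C/G/N counts, 11-15 Crick A/T/C/G/N counts.
    """
    bases = "ATCGN"
    for line in atcgmap_lines:
        f = line.rstrip("\n").split("\t")
        if (f[0], int(f[2])) in positions:
            watson = dict(zip(bases, map(int, f[5:10])))
            crick = dict(zip(bases, map(int, f[10:15])))
            yield f[0], int(f[2]), f[1], watson, crick


# Toy records, invented for illustration only.
vcf = ["chrX\t1000\t.\tG\tA\t50\tPASS\t.\tGT\t0/1"]
atcgmap = ["chrX\tG\t1000\t--\t--\t0\t0\t0\t12\t0\t9\t0\t0\t3\t0\tna"]
rows = list(atcgmap_counts_at(atcgmap, load_snp_positions(vcf)))
for chrom, pos, ref, watson, crick in rows:
    print(chrom, pos, ref, "Watson:", watson, "Crick:", crick)
```

In this invented row the A counts appear on one strand only, which is the pattern worth looking for when asking whether a G>A call reflects bisulfite conversion on the opposite strand rather than a real variant.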


nmfad commented 8 months ago

Thank you Weilong.

On average there are roughly 5-10 million SNVs in the human genome.

Using the tool on a chromosome X BAM file, I have roughly 32.84 million SNVs called from the ATCGmap file using the bayes mode and bayes-dynamicP options. I further filtered the VCF file for depth > 10 and still have 32.82 million SNVs. Do you suggest that I retain this many SNVs, since they are required for sufficient ASM calls in a given gene region, and then filter the ASR/ASS calls in the final output by depth or by a lower p-value? Or do you think I will incur many false positive calls if I retain this many SNVs prior to the ASM calling step?

I have been following your online tutorial as closely as possible for my ASM analysis.

Thank you again for your insight.

Best, Numrah
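The numbers above can be turned into a quick plausibility check by dividing the SNV count by the chromosome length. This is a sketch under a stated assumption: the chrX length used here is the GRCh38 value (156,040,895 bp), and the reference actually used in this analysis is not given in the thread, so substitute your own.

```python
# Fraction of chrX positions reported as SNVs.
# Assumption: chrX length from GRCh38; replace with your reference's value.
CHRX_LENGTH_BP = 156_040_895
CALLED_SNVS = 32_844_420


def variant_fraction(n_variants, region_length_bp):
    """Return the share of positions in a region that carry a called variant."""
    return n_variants / region_length_bp


frac = variant_fraction(CALLED_SNVS, CHRX_LENGTH_BP)
print(f"{frac:.1%} of chrX positions called as SNVs")
```

Roughly one position in five being called as a variant is far above what the genome-wide figure quoted above would imply, which points to a systematic artifact rather than a filtering threshold that is slightly too loose.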


guoweilong commented 8 months ago

Hi Numrah,

It is strange that ~32.84 million SNVs were called on chromosome X.

(1) Did you also find a similar issue on other chromosomes (especially the autosomes)?

(2) You can check whether the called SNPs (output file) have been filtered or not. If not, you may need to filter the sites yourself.

(3) For ASM analysis, it is suggested to provide high-quality heterozygous SNV sites.

Best, Weilong
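Point (3) can be illustrated with a small filter that keeps only heterozygous records marked PASS. This is a sketch that assumes a standard VCF layout (FILTER in column 7, a `GT` key in the FORMAT column, one sample column); whether the VCF written by `cgmaptools snv` carries these fields in exactly this form is not confirmed in the thread, and the function names are hypothetical.

```python
def is_het_pass(vcf_line):
    """True if a VCF record has FILTER == PASS and a heterozygous diploid genotype."""
    f = vcf_line.rstrip("\n").split("\t")
    if len(f) < 10 or f[6] != "PASS":
        return False
    keys = f[8].split(":")
    if "GT" not in keys:
        return False
    genotype = f[9].split(":")[keys.index("GT")].replace("|", "/")
    alleles = genotype.split("/")
    # Heterozygous: two called alleles that differ from each other.
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def filter_het_pass(lines):
    """Keep header lines plus heterozygous PASS records."""
    return [line for line in lines if line.startswith("#") or is_het_pass(line)]


example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chrX\t100\t.\tG\tA\t50\tPASS\t.\tGT:DP\t0/1:30",
    "chrX\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:25",
    "chrX\t300\t.\tA\tG\t10\tLowQual\t.\tGT:DP\t0/1:12",
]
kept = filter_het_pass(example)
print(len(kept) - 1, "heterozygous PASS records kept")
```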


nmfad commented 8 months ago

Hi Weilong ,

See my responses to your questions below.

(1) Did you also find a similar issue on other chromosomes (especially the autosomes)? - Yes, that is correct. This seems to be the case.

(2) You can check whether the called SNPs (output file) have been filtered or not. If not, you may need to filter the sites yourself. - Are you suggesting that I filter the SNV VCF files to include only C>T sites and remove all G>A variants?

(3) For ASM analysis, it is suggested to provide high-quality heterozygous SNV sites. - Thank you, this is helpful to know. The filters I applied to the SNVs retain only heterozygous, PASS variants, and yet there are still very many of them. I am analyzing WGBS data.

Best, Numrah


guoweilong commented 8 months ago

(1) If the autosomes look fine, with far fewer SNVs detected, you need to check whether this is an issue specific to chromosome X. If the autosomes also show a large number of SNVs, similar to chrX, you may need to check the commands or the reference genome. By the way, is the sample you analyzed from a male or a female?

(2) Actually, if the command and reference genome are correct and the coverage is high enough, "cgmaptools snp" will detect heterozygous SNVs with high accuracy. You do not need to remove T>C and A>G sites.

Weilong
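The comparison in (1) amounts to counting called SNVs per chromosome. A minimal sketch, assuming only that the first tab-separated column of each VCF record is the chromosome name (the function name and toy records are illustrative):

```python
from collections import Counter


def snvs_per_chromosome(vcf_lines):
    """Count VCF records per chromosome, skipping headers and blank lines."""
    return Counter(
        line.split("\t", 1)[0]
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    )


example = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tG\tA\t50\tPASS\t.",
    "chr1\t250\t.\tC\tT\t50\tPASS\t.",
    "chrX\t300\t.\tA\tG\t50\tPASS\t.",
]
counts = snvs_per_chromosome(example)
for chrom, n in sorted(counts.items()):
    print(chrom, n)
```

If the counts are inflated on every chromosome, the cause is more likely the command, the reference genome, or the input alignments than anything specific to chrX.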


nmfad commented 8 months ago

Hi Weilong,

Yes, the autosomes also have roughly the same number of SNVs. The sample I am analyzing is female. Thank you for confirming point #2. I am working with very high coverage WGBS data.

The samples are aligned with BSMAP. I understand your tutorial mentions that the ATCGmap conversion may not work perfectly with BSMAP and that Bismark is preferable, but I was able to run the BSMAP-aligned BAMs through the ATCGmap format conversion without any issues.

Numrah


guoweilong commented 8 months ago

Got it.

Bismark and BSMAP define their BAM formats differently from BS-Seeker2.

Thus you would get many so-called SNVs when feeding in BAM files generated by Bismark or BSMAP. I would suggest running BS-Seeker2 for alignment, which is compatible with CGmapTools.

Weilong


nmfad commented 8 months ago

Thank you, Weilong, for all your input thus far. I believe that explains it. I will definitely consider this option.

My hiccup is that I am currently under a time crunch, as I have already aligned >200 samples using BSMAP. Do you have any recommendations for filtering ASM results from BSMAP-aligned BAMs to get the cleanest ASM calls? I have already applied a threshold of -d = 10 (the minimum number of reads for each allele-linked site to call ASS); the recommended default was only 1 read. Is there anything else you would suggest?

-- Numrah


guoweilong commented 8 months ago

"cgmaptools snv" actually relies on a correct ATCGmap file. "cgmaptools convert" can convert correctly from BAM to an ATCGmap file for BS-Seeker2's output.

If you can convert the BSMAP-aligned BAM files to correct ATCGmap files, you can then use cgmaptools for SNP calling.

--

Weilong
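
One way to judge whether the converted ATCGmap files led to sensible calls is to tally the substitution types in the resulting VCF. The sketch below is not part of CGmapTools. It assumes the VCF uses the standard tab-separated CHROM, POS, ID, REF, ALT columns, and the function names are made up for illustration. If C>T and G>A make up most of the calls, that would be consistent with bisulfite conversion being read as variation. The thread does not say what fraction should count as suspicious, so no cut-off is given.

```python
import io
from collections import Counter


def tally_substitutions(vcf_lines):
    """Count REF>ALT pairs for single-nucleotide records in VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed record; ignored in this sketch
        ref = fields[3].upper()
        # ALT may list several alleles separated by commas.
        for alt in fields[4].upper().split(","):
            if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
                counts[f"{ref}>{alt}"] += 1
    return counts


def bisulfite_like_fraction(counts):
    """Fraction of tallied substitutions that are C>T or G>A."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["C>T"] + counts["G>A"]) / total


# Tiny made-up VCF used only to demonstrate the functions.
DEMO_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chrX\t100\t.\tC\tT\t50\tPASS\t.
chrX\t200\t.\tG\tA\t50\tPASS\t.
chrX\t300\t.\tA\tC\t50\tPASS\t.
chrX\t400\t.\tG\tA,T\t50\tPASS\t.
"""

if __name__ == "__main__":
    demo_counts = tally_substitutions(io.StringIO(DEMO_VCF))
    for pair, n in sorted(demo_counts.items()):
        print(pair, n)
    print("C>T + G>A fraction:", bisulfite_like_fraction(demo_counts))
```

For a real file, pass an open file handle (for example `open("sample.vcf")`, a hypothetical path) in place of the `io.StringIO` demo. Comparing the fraction between a BS-Seeker2-aligned sample and a BSMAP-aligned one, if both are available, would show whether the excess is tied to the aligner.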


- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868206698, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4F6VDVRXUY5F4IELXDYKZQGPAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGIYDMNRZHA. You are receiving this because you authored the thread.Message ID: @.**@.>>

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

nmfad commented 8 months ago

Hi Weilong,

"If you could convert BSMAP aligned BAM file to correct ATCGmap files, you can then use cgmaptools for SNP calling." This is the part I would like to understand better: what exactly do I need to change in the process? I ran the CGmapFromBAM command on the BSMAP-aligned BAM files and generated the .ATCGmap.gz and .CGmap.gz files. I used the latest version of CGmapTools from the website. The BSMAP version used on these files was v2.73.

-- Numrah

From: Weilong Guo Sent: Saturday, December 23, 2023 2:10 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

"cgmaptools snv" actually relies on a correct ATCGmap file. "cgmaptools convert" converts correctly from BAM to ATCGmap for BS-Seeker2's output.

If you can convert the BSMAP-aligned BAM file to a correct ATCGmap file, you can then use cgmaptools for SNP calling.

--

Weilong

At 2023-12-23 13:18:05, "nmfad" wrote:

Thank you Weilong for all your inputs thus far. I believe that explains it. I will definitely consider this option.

My hiccup is that I am under a time crunch, as I have already aligned >200 samples using BSMAP. Do you have any recommendations for filtering ASM results from BSMAP-aligned BAMs to get the cleanest ASM calls? I already applied a threshold of -d = 10 (the minimum number of reads for each allele-linked site to call ASS); the recommended default was only 1 read. Is there anything else you would suggest?

-- Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:58 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Got it.

Bismark and BSMAP define their BAM formats differently from BS-Seeker2.

Thus you would get many so-called SNVs when feeding in BAM files generated by Bismark or BSMAP. I would suggest running BS-Seeker2 for alignment, which is compatible with CGmapTools.

Weilong

At 2023-12-23 12:52:21, "nmfad" wrote:

Hi Weilong,

Yes, the autosomes also have roughly the same number of SNVs. The sample I am analyzing is female. Thank you for confirming point #2 below. I am working with very high coverage WGBS data.

The samples are aligned with BSMAP. I understand your tutorial mentions that generation of the ATCGmap format may not work perfectly with BSMAP and that Bismark is preferable, but I was able to run the BSMAP-aligned BAMs through the ATCGmap format conversion without any issues.

Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:48 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

(1) If the autosomes are OK, with far fewer SNVs detected, you need to check whether this is an issue specific to chromosome X. If the autosomes also report lots of SNVs, similar to chrX, you may need to check the commands or the reference genome. By the way, is the sample you analyzed from a male or a female?

(2) Actually, if the command and reference genome are OK and the coverage is high enough, "cgmaptools snv" will detect heterozygous SNVs with high accuracy. You do not need to remove T>C and A>G sites.

Weilong

At 2023-12-23 12:30:26, "nmfad" wrote:

Hi Weilong ,

See my responses to your questions below.

(1) Did you also find a similar issue reported on other chromosomes (especially the autosomes)? - Yes, that is correct. This seems to be the case.

(2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. - Are you suggesting that I filter the SNV VCF files to include only C>T sites and remove all G>A variants?

(3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites. - Thank you, this is helpful to know. The filters I applied to the SNVs keep only het and PASS variants, and yet there are still very many of them. I am analyzing WGBS data.

Best Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 9:04 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Hi Numrah,

It's weird that chromosome X was called with ~32.84 million SNVs. (1) Did you also find a similar issue reported on other chromosomes (especially the autosomes)? (2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. (3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites.

Best, Weilong
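
A minimal Python sketch of the filtering suggested in point (3). It assumes a standard single-sample VCF with a FILTER column and a GT entry in the FORMAT column; whether the VCF written by cgmaptools carries exactly these fields is an assumption to check against your own file. The function names and the `min_qual` parameter are hypothetical and not part of CGmapTools.

```python
def is_het_pass(vcf_line, min_qual=0.0):
    """Return True for a PASS record whose first sample has a heterozygous GT."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no FORMAT/sample columns, cannot read a genotype
    qual, flt, fmt, sample = fields[5], fields[6], fields[8], fields[9]
    if flt != "PASS":
        return False
    try:
        if float(qual) < min_qual:
            return False
    except ValueError:
        return False  # QUAL is "." or otherwise not numeric
    keys = fmt.split(":")
    if "GT" not in keys:
        return False
    genotype = sample.split(":")[keys.index("GT")]
    alleles = genotype.replace("|", "/").split("/")
    # Heterozygous: two called alleles that differ from each other.
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def filter_het_pass(vcf_lines, min_qual=0.0):
    """Yield header lines unchanged plus the records accepted by is_het_pass."""
    for line in vcf_lines:
        if line.startswith("#") or is_het_pass(line, min_qual):
            yield line
```

Passing an open file handle as `vcf_lines` and writing the yielded lines to a new file gives a smaller VCF to feed into the ASM step. A filter like this only removes low-confidence records; it cannot repair calls that come from an incorrect ATCGmap file.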

At 2023-12-23 06:35:43, "nmfad" wrote:

Thank you Weilong.

On average there are roughly 5-10 million SNVs in the human genome.

Using the tool on a chromosome X BAM file, I have ~32.84 million SNVs called from the ATCGmap file using the bayes mode and bayes-dynamicP options. I further filtered the VCF file for depth > 10 and still have 32.82 million SNVs. Do you suggest that I retain this many SNVs, since they are required for sufficient ASM calls in a given gene region, and filter the ASR/ASS calls in the final output by depth or a lower p-value? Or do you think I will incur a lot of false positive calls if I retain this many SNVs prior to the ASM calling step?

I've been following your online tutorial as closely as possible for my ASM analysis.

Thank you again for your insight.

Best Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 4:34 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

The possibility of C-to-T conversion by bisulfite sequencing is considered. You can look up the called SNP sites in the ATCGmap file to see whether the calls meet your expectations.

Moreover, selecting the called SNP sites with a strict p-value threshold is also suggested.

Weilong
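
The lookup described above could be sketched in Python as follows. The column layout is an assumption based on the ATCGmap description in the CGmapTools documentation (chromosome, reference nucleotide, position, context, dinucleotide, then A/T/C/G/N read counts for the Watson strand and for the Crick strand, then methylation level); please verify it against your own files. The function name `atcgmap_counts` is hypothetical.

```python
def atcgmap_counts(atcgmap_lines, sites):
    """Collect per-strand base counts for the requested (chrom, pos) sites.

    ASSUMED layout: tab-separated, 16 columns, with columns 6-10 holding
    Watson-strand counts and columns 11-15 holding Crick-strand counts,
    each in the order A, T, C, G, N.
    """
    wanted = set(sites)
    found = {}
    for line in atcgmap_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 16:
            continue  # skip short or malformed lines
        try:
            pos = int(fields[2])
        except ValueError:
            continue  # skip a header-like line with a non-numeric position
        key = (fields[0], pos)
        if key not in wanted:
            continue
        counts = [int(x) for x in fields[5:15]]
        found[key] = {
            "ref": fields[1],
            "watson": dict(zip("ATCGN", counts[:5])),
            "crick": dict(zip("ATCGN", counts[5:])),
        }
    return found
```

For a gzip-compressed file, `gzip.open(path, "rt")` from the standard library can supply the lines. Comparing the two strands at a called site is a quick way to see whether the supporting reads look like a true variant or like bisulfite conversion on one strand.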


guoweilong commented 8 months ago

I have no idea about BSMAP's or Bismark's BAM files. What I know is that their BAM formats are defined slightly differently from the BAM format that we defined for BS-Seeker2. I developed CGmapTools to be compatible with BS-Seeker2's alignment results, but I cannot guarantee that it is compatible with BAM files generated by other tools. Thus I guess that using "cgmaptools convert" to generate ATCGmap files from BAM files that were not generated by BS-Seeker2 is the major reason for the unexpectedly high number of SNPs.

Weilong
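
One way to probe this guess is to tally the substitution classes in the called VCF. The Python sketch below is a generic diagnostic, not part of CGmapTools, and relies only on the standard REF and ALT columns of a VCF; the function names are hypothetical. If C>T and G>A calls make up most of the records, that would be consistent with bisulfite conversion being misread as variation, though it does not prove it.

```python
from collections import Counter


def tally_substitutions(vcf_lines):
    """Count REF>ALT classes over single-base, biallelic VCF records."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        ref, alt = fields[3].upper(), fields[4].upper()
        # Keep only simple SNVs; indels and multi-allelic records are ignored.
        if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
            counts[f"{ref}>{alt}"] += 1
    return counts


def bisulfite_like_fraction(counts):
    """Fraction of tallied SNVs that are C>T or G>A (0.0 if nothing was tallied)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["C>T"] + counts["G>A"]) / total
```

Running the same tally on a VCF produced from a BS-Seeker2 alignment of one sample would give a baseline for comparison.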

At 2023-12-23 16:24:44, "nmfad" @.***> wrote:

Hi Weilong,

If you could convert BSMAP aligned BAM file to correct ATCGmap files, you can then use cgmaptools for SNP callling. - This is the part I would like to understand better, what exactly do I need to change in the process ?? I used the CGmapFromBAM command on the BSMAP aligned bam files and generated the .ATCGmap.gz & .CGmap.gz files. I used the latest version of CGMAP from the website. The BSMAP version used on these files was v2.73.

-- Numrah

From: Weilong Guo @.> Sent: Saturday, December 23, 2023 2:10 AM To: guoweilong/cgmaptools @.> Cc: Fadra, Numrah @.>; Author @.> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

"cgmaptools snv" acturally relies on correct ATCGmap file. "cgmaptools convert" could convert correctly from BAM to ATCGmap file for BS-Seeker2's output.

If you could convert BSMAP aligned BAM file to correct ATCGmap files, you can then use cgmaptools for SNP callling.

--

Weilong

At 2023-12-23 13:18:05, "nmfad" @.<mailto:@.>> wrote:

Thank you Weilong for all your inputs thus far. I believe that explains it. I will definitely consider this option.

My hiccup is I am currently under a time crunch as I have already aligned >200 samples using BSMAP. Do you have any recommendations for filtering ASM results using BSMAP aligned bams for cleanest ASM calls ? I already applied thresholds of (-d = 10, #min number of reads for each allele linked site to call ASS) , the recommended default was only 1 read. Anything else you would suggest ?

-- Numrah

From: Weilong Guo @.<mailto:@.>> Sent: Friday, December 22, 2023 10:58 PM To: guoweilong/cgmaptools @.<mailto:@.>> Cc: Fadra, Numrah @.<mailto:@.>>; Author @.<mailto:@.>> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Got it.

Bismark and BSmap defined different with BAM formats with BS-Seeker2.

Thus you would got many so-called SNVs by feeding with bam files generated by Bismark and BSmap. I would suggest you to run BS-Seeeker2 for alignment, which is compatibable with CGmapTools.

Weillong

At 2023-12-23 12:52:21, "nmfad" @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>> wrote:

Hi Weilong,

Yes the autosomes also have roughly the same number of SNVs. The sample I am analyzing is female. Thank you for confirming point #2 below. I am working with very high coverage WGBS data.

The samples are aligned with BSMAP. I understand your tutorial mentioned the generation of ATGC bam format may not generate perfectly with BSMAP and BISMARCK is preferable, but I was able to run the BSMAP aligned bams through the ATCG map format conversion without any issues.

Numrah

From: Weilong Guo @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>> Sent: Friday, December 22, 2023 10:48 PM To: guoweilong/cgmaptools @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>> Cc: Fadra, Numrah @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>>; Author @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

(1) If you find the autochromosome is OK with much fewer SNVs deteced, you need to check whehter it is specific issue for chromosome X. If you find the autochromosome also reported with lots of SNVs, similar with chrX, you may need to check the commands or reference genomes. By the way, is the sample you used for analysis from male or female?

(2) Acturally, if the command and reference genome is OK, and the coverage is high enough, "cgmaptools snp" will detect the heterozygous SNVs with high accuracy. You need not to remove T>C and A>G sites.

Weilong

At 2023-12-23 12:30:26, "nmfad" @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>> wrote:

Hi Weilong ,

See my response to our questions below

(1) Did you also found similar issue reported on other chromosome (especially for autochromosome)? - Yes that is correct. This seems to be the case

(2) You can check whether the called SNP (output file) are filtered or not. If not, you may need to filter the sites by yourself. - Are you suggesting that I filter the SNV VCF files to include only C>T sites and remove all G>A variants ??

(3) For ASM analysis, it is suggested to feed with high-quality heterozygous SNV sites - Thank you, this is helpful to know. The filters I applied on the SNV include all het and PASS variants only and yet they are very many in number. I am analyzing WGBS data.

Best Numrah

From: Weilong Guo @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>> Sent: Friday, December 22, 2023 9:04 PM To: guoweilong/cgmaptools @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>> Cc: Fadra, Numrah @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>; Author @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Hi Humrah,

It's weird that you found the chromosome X was called with ~32.84 milllion SNVs. (1) Did you also found similar issue reported on other chromosome (especially for autochromosome)? (2) You can check whether the called SNP (output file) are filtered or not. If not, you may need to filter the sites by yourself. (3) For ASM analysis, it is suggested to feed with high-quality heterozygous SNV sites.

Best, Weilong

At 2023-12-23 06:35:43, "nmfad" @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>> wrote:

Thank you Weilong.

On average there roughly 5-10 million SNVs in the human genome.

Using the tool on a chromosome X bam file, I have roughly ~32.84 million SNVs called from the ATGCmap file using the bayes mode and bayes-dynamicP options. I further filtered the VCF file for depth> 10 and have 32.82 million SNVs. Do you suggest that I retain this many number of SNVs as they are required for sufficient ASM calls in a given gene region and filter the ASR/ASS calls in the final output by depth or lower pvalue? OR do you think I will incur a lot of erroneous false positive calls in the data if I retain this many SNVs prior to the ASM calling step ?

I ve been following your tutorial guide online as much as possible for my possible on ASM data.

Thank you again for your insight.

Best Numrah

From: Weilong Guo @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>> Sent: Friday, December 22, 2023 4:34 AM To: guoweilong/cgmaptools @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>> Cc: Fadra, Numrah @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>>; Author @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

The possibility of C-to-T conversion by bisulfite sequencing is considered. You can try to read the corresponding sites of ATCGmap file for the called SNP sites, to see wheather the calling meet with your expectation.

Moreover, select the called SNP sites with strict threshold for p-value is also suggested.

Weilong

At 2023-12-22 14:24:20, "nmfad" @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>>> wrote:

Hi Weilong,

The WGBS alignment file is already subsetted to the X chromosome only. Currently am running it gene by gene to make it efficient.

I had another question. I am using paired end data, and the VCF file consists of many (more than expected) G>A SNVs. Example there are almost 32 million SNVs detected along the X chromosome even after applying quality thresholds. Could these possibly just be C and methylation cytosines (T) from the complementary strand? Does the ASM computation take into account that a lot of these may not be SNVs or is there a way to deal with it when computing ASM across a region or within a sample ??

Thank you for any insight you can provide.

Numrah


From: Weilong Guo Sent: Thursday, December 21, 2023 11:36 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

What about splitting the huge VCF and BAM files into smaller ones (by chromosome or by region)?

Weilong
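One way to act on this suggestion for the VCF side is to bucket records into fixed-size windows and run ASM per window. The sketch below is a hypothetical helper, not a CGmapTools feature. It assumes a standard tab-separated VCF with CHROM and POS in the first two columns, and the 1 Mb default window is an arbitrary choice. The matching BAM slice for each window would have to be extracted separately, for example with a region query on an indexed BAM.

```python
from collections import defaultdict


def split_vcf_by_window(lines, window=1_000_000):
    """Group VCF records into (chrom, start, end) windows of `window` bases.

    Returns the header lines and a dict mapping each window to its records,
    so every chunk can be written out with the full header and processed alone.
    """
    header = []
    chunks = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        if not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        # 1-based, inclusive window that contains this position
        start = (int(pos) - 1) // window * window + 1
        chunks[(chrom, start, start + window - 1)].append(line)
    return header, dict(chunks)
```

Because each chunk keeps the original header, the per-window files remain valid inputs on their own, and windows can be submitted as independent SLURM jobs.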


nmfad commented 8 months ago

Thank you Weilong for patiently answering my queries. This is very helpful to know.

I had one final question about the ASM calculation step. I do see that the SNV calls are located at CG sites in the BAM, and the total read numbers in the ASM output match those in the BAM when I visualize it in IGV to cross-check. Do you think an incorrect ATCGmap file could affect the reads detected for allele1 and allele2 and the corresponding methylation levels estimated for both?


From: Weilong Guo Sent: Saturday, December 23, 2023 2:34 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

I am not familiar with BSMAP's or Bismark's BAM files. What I do know is that their BAM formats are defined slightly differently from the BAM format we defined for BS-Seeker2. I developed CGmapTools to be compatible with BS-Seeker2's alignment results, but I cannot guarantee that it is compatible with BAM files generated by other tools. So my guess is that using "cgmaptools convert" to generate ATCGmap files from BAM files that were not produced by BS-Seeker2 is the main reason for the unexpectedly high number of SNPs.

Weilong

At 2023-12-23 16:24:44, "nmfad" wrote:

Hi Weilong,

"If you could convert BSMAP aligned BAM file to correct ATCGmap files, you can then use cgmaptools for SNP calling." - This is the part I would like to understand better: what exactly do I need to change in the process? I used the CGmapFromBAM command on the BSMAP-aligned BAM files and generated the .ATCGmap.gz and .CGmap.gz files. I used the latest version of CGMAP from the website. The BSMAP version used on these files was v2.73.

-- Numrah

From: Weilong Guo Sent: Saturday, December 23, 2023 2:10 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

"cgmaptools snv" actually relies on a correct ATCGmap file. "cgmaptools convert" can convert correctly from BAM to ATCGmap for BS-Seeker2's output.

If you can convert the BSMAP-aligned BAM file to a correct ATCGmap file, you can then use cgmaptools for SNP calling.

--

Weilong

At 2023-12-23 13:18:05, "nmfad" wrote:

Thank you Weilong for all your inputs thus far. I believe that explains it. I will definitely consider this option.

My hiccup is that I am currently under a time crunch, as I have already aligned >200 samples using BSMAP. Do you have any recommendations for filtering ASM results from BSMAP-aligned BAMs to get the cleanest ASM calls? I already applied a threshold of -d = 10 (minimum number of reads for each allele-linked site to call ASS); the recommended default was only 1 read. Anything else you would suggest?

-- Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:58 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Got it.

Bismark and BSMAP define their BAM formats differently from BS-Seeker2.

Thus you would get many so-called SNVs by feeding in BAM files generated by Bismark or BSMAP. I would suggest running BS-Seeker2 for the alignment, which is compatible with CGmapTools.

Weilong

At 2023-12-23 12:52:21, "nmfad" wrote:

Hi Weilong,

Yes, the autosomes also have roughly the same number of SNVs. The sample I am analyzing is female. Thank you for confirming point #2 below. I am working with very high coverage WGBS data.

The samples are aligned with BSMAP. I understand your tutorial mentions that the ATCG format may not be generated perfectly with BSMAP and that Bismark is preferable, but I was able to run the BSMAP-aligned BAMs through the ATCGmap format conversion without any issues.

Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:48 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

(1) If you find that the autosomes are fine, with far fewer SNVs detected, you need to check whether this is an issue specific to chromosome X. If the autosomes are also reported with lots of SNVs, similar to chrX, you may need to check the commands or the reference genome. By the way, is the sample you used for the analysis from a male or a female?

(2) Actually, if the commands and reference genome are OK and the coverage is high enough, "cgmaptools snv" will detect heterozygous SNVs with high accuracy. You do not need to remove T>C and A>G sites.

Weilong

At 2023-12-23 12:30:26, "nmfad" wrote:

Hi Weilong ,

See my responses to your questions below.

(1) Did you also find a similar issue on other chromosomes (especially autosomes)? - Yes, that is correct. This seems to be the case.

(2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. - Are you suggesting that I filter the SNV VCF files to include only C>T sites and remove all G>A variants?

(3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites. - Thank you, this is helpful to know. The filters I applied to the SNVs keep only het and PASS variants, and yet they are still very numerous. I am analyzing WGBS data.

Best Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 9:04 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Hi Numrah,

It's odd that chromosome X was called with ~32.84 million SNVs. (1) Did you also find a similar issue on other chromosomes (especially autosomes)? (2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. (3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites.

Best, Weilong
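Points (2) and (3) can be checked with a short script before running ASM. The following is a sketch under stated assumptions, not the CGmapTools filter itself: it assumes a standard single-sample VCF whose FORMAT column includes a GT field and whose FILTER column uses PASS, and all function names are hypothetical. It keeps biallelic heterozygous PASS SNVs and tallies the substitution types, so an excess of bisulfite-like changes (C>T, G>A) becomes visible at a glance.

```python
from collections import Counter


def parse_record(line):
    """Parse one VCF data line into a small dict (assumes standard column order)."""
    f = line.rstrip("\n").split("\t")
    rec = {"chrom": f[0], "pos": int(f[1]), "ref": f[3], "alt": f[4],
           "filter": f[6], "gt": None}
    if len(f) > 9:
        # FORMAT keys (column 9) paired with the first sample's values (column 10)
        rec["gt"] = dict(zip(f[8].split(":"), f[9].split(":"))).get("GT")
    return rec


def is_het(gt):
    """True for a diploid genotype with two different, non-missing alleles."""
    if gt is None:
        return False
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def het_pass_snvs(lines):
    """Yield biallelic single-base records that are PASS and heterozygous."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rec = parse_record(line)
        single_base = len(rec["ref"]) == 1 and len(rec["alt"]) == 1
        if rec["filter"] == "PASS" and single_base and is_het(rec["gt"]):
            yield rec


def substitution_spectrum(records):
    """Count records per REF>ALT substitution type."""
    return Counter(f"{r['ref']}>{r['alt']}" for r in records)
```

If the spectrum of the retained sites is dominated by C>T and G>A, that would be consistent with conversion artifacts leaking into the calls rather than with true heterozygous variation.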

At 2023-12-23 06:35:43, "nmfad" wrote:

Thank you Weilong.

On average there are roughly 5-10 million SNVs in the human genome.

Using the tool on a chromosome X BAM file, I have roughly 32.84 million SNVs called from the ATCGmap file using the bayes mode and bayes-dynamicP options. I further filtered the VCF file for depth > 10 and still have 32.82 million SNVs. Do you suggest that I retain this many SNVs, since they are required for sufficient ASM calls in a given gene region, and filter the ASR/ASS calls in the final output by depth or a lower p-value? Or do you think I will incur many false positive calls if I retain this many SNVs prior to the ASM calling step?

I've been following your online tutorial as closely as possible for my ASM analysis.

Thank you again for your insight.

Best Numrah


guoweilong commented 8 months ago

I guess so. You may try to align one sample with BS-Seeker2 to see if the issue is fixed.

--

Weilong

At 2023-12-23 16:55:10, "nmfad" @.***> wrote:

Thank you Weilong for patiently answering my queries. This is very helpful to know.

I had one final question in the ASM calculation step. I do see the SNV calls are located at the CG sites in the bam and totoal read numbers in the asm output match those in the bam when i visualize it in IGV to cross check. Do you think an incorrect ATCG Map file could impact the detection read of allele1 and allele2 and the corresponding methylation levels estimated for both?


From: Weilong Guo @.> Sent: Saturday, December 23, 2023 2:34 AM To: guoweilong/cgmaptools @.> Cc: Fadra, Numrah @.>; Author @.> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

I have no idea about BSMAP or Bismark's BAM files. What I know is, their BAM formats were defined slightly different with the BAM file that we defined for BS-Seeker2. I developed CGmapTools, which is compatible with BS-Seeker2's alignment result, but could not guarente it would be compatible with BAM files generated by other tools. Thus I guess that used "cgmaptools convert" to generate ATCGmap from BAM files that were not generated by BS-Seeker2, is the major reason that would cause the unexpected high SNPs.

Weilong

At 2023-12-23 16:24:44, "nmfad" @.***> wrote:

Hi Weilong,

If you could convert BSMAP aligned BAM file to correct ATCGmap files, you can then use cgmaptools for SNP callling. - This is the part I would like to understand better, what exactly do I need to change in the process ?? I used the CGmapFromBAM command on the BSMAP aligned bam files and generated the .ATCGmap.gz & .CGmap.gz files. I used the latest version of CGMAP from the website. The BSMAP version used on these files was v2.73.

-- Numrah

From: Weilong Guo @.> Sent: Saturday, December 23, 2023 2:10 AM To: guoweilong/cgmaptools @.> Cc: Fadra, Numrah @.>; Author @.> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

"cgmaptools snv" acturally relies on correct ATCGmap file. "cgmaptools convert" could convert correctly from BAM to ATCGmap file for BS-Seeker2's output.

If you could convert BSMAP aligned BAM file to correct ATCGmap files, you can then use cgmaptools for SNP callling.

--

Weilong

At 2023-12-23 13:18:05, "nmfad" @.<mailto:@.>> wrote:

Thank you Weilong for all your inputs thus far. I believe that explains it. I will definitely consider this option.

My hiccup is I am currently under a time crunch as I have already aligned >200 samples using BSMAP. Do you have any recommendations for filtering ASM results using BSMAP aligned bams for cleanest ASM calls ? I already applied thresholds of (-d = 10, #min number of reads for each allele linked site to call ASS) , the recommended default was only 1 read. Anything else you would suggest ?

-- Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:58 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Got it.

Bismark and BSMAP define their BAM formats differently from BS-Seeker2.

Thus you would get many so-called SNVs when feeding in BAM files generated by Bismark or BSMAP. I would suggest running BS-Seeker2 for alignment, which is compatible with CGmapTools.

Weilong

At 2023-12-23 12:52:21, "nmfad" wrote:

Hi Weilong,

Yes the autosomes also have roughly the same number of SNVs. The sample I am analyzing is female. Thank you for confirming point #2 below. I am working with very high coverage WGBS data.

The samples are aligned with BSMAP. I understand your tutorial mentions that ATCGmap generation may not work perfectly with BSMAP and that BISMARCK is preferable, but I was able to run the BSMAP-aligned BAMs through the ATCGmap format conversion without any issues.

Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:48 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

(1) If you find that the autosomes are fine, with far fewer SNVs detected, you need to check whether this is an issue specific to chromosome X. If the autosomes are also reported with many SNVs, similar to chrX, you may need to check the commands or the reference genome. By the way, is the sample you analyzed from a male or a female?

(2) Actually, if the command and reference genome are OK and the coverage is high enough, "cgmaptools snv" will detect heterozygous SNVs with high accuracy. You do not need to remove T>C and A>G sites.

Weilong

At 2023-12-23 12:30:26, "nmfad" wrote:

Hi Weilong ,

See my responses to your questions below.

(1) Did you also find a similar issue on other chromosomes (especially autosomes)? - Yes, that is correct. This seems to be the case.

(2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. - Are you suggesting that I filter the SNV VCF files to include only C>T sites and remove all G>A variants?

(3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites. - Thank you, this is helpful to know. The filters I applied to the SNVs keep only het and PASS variants, and yet there are still very many of them. I am analyzing WGBS data.

Best Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 9:04 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Hi Numrah,

It's weird that chromosome X was called with ~32.84 million SNVs. (1) Did you also find a similar issue on other chromosomes (especially autosomes)? (2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. (3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites.

Best, Weilong
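One way to act on points (2) and (3) above is to tally the substitution types among the heterozygous PASS calls; a tally dominated by C>T and G>A would point to bisulfite conversion rather than real variation. The following is only a sketch in Python, not part of CGmapTools: it assumes the VCF uses the standard tab-separated layout with a FILTER column and a GT field in the first sample column, and the function name `summarize_het_pass` and the example records are made up for illustration.

```python
from collections import Counter

def summarize_het_pass(vcf_lines):
    """Count REF>ALT substitution types among heterozygous PASS SNVs.

    Assumes standard VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE.
    """
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no genotype column, cannot decide het/hom
        ref, alt, flt = fields[3], fields[4], fields[6]
        if flt != "PASS" or len(ref) != 1 or len(alt) != 1:
            continue  # keep only single-base, single-ALT PASS records
        fmt_keys = fields[8].split(":")
        sample = fields[9].split(":")
        if "GT" not in fmt_keys:
            continue
        gt = sample[fmt_keys.index("GT")].replace("|", "/")
        alleles = gt.split("/")
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            counts[f"{ref}>{alt}"] += 1
    return counts

# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "chrX\t100\t.\tG\tA\t.\tPASS\t.\tGT:DP\t0/1:20",
    "chrX\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t1/1:15",
    "chrX\t300\t.\tG\tA\t.\tLowQual\t.\tGT:DP\t0/1:12",
    "chrX\t400\t.\tA\tC\t.\tPASS\t.\tGT:DP\t0/1:30",
]
print(summarize_het_pass(example))
```

For a real file, the same function could be fed the lines of an opened VCF (for a gzipped file, `gzip.open(path, "rt")`).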

At 2023-12-23 06:35:43, "nmfad" wrote:

Thank you Weilong.

On average there are roughly 5-10 million SNVs in the human genome.

Using the tool on a chromosome X BAM file, I have ~32.84 million SNVs called from the ATCGmap file using the bayes mode and bayes-dynamicP options. I further filtered the VCF file for depth > 10 and still have 32.82 million SNVs. Do you suggest that I retain this many SNVs, since they are required for sufficient ASM calls in a given gene region, and then filter the ASR/ASS calls in the final output by depth or a lower p-value? Or do you think I will incur a lot of false positive calls if I retain this many SNVs prior to the ASM calling step?

I've been following your online tutorial as closely as possible for my ASM analysis.

Thank you again for your insight.

Best Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 4:34 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

The possibility of C-to-T conversion by bisulfite treatment is taken into account. You can look up the called SNP sites in the ATCGmap file to see whether the calls meet your expectation.

Moreover, selecting the called SNP sites with a strict p-value threshold is also suggested.

Weilong
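The lookup suggested above (checking the ATCGmap rows at the called SNP positions) can be sketched in Python as follows. This is not CGmapTools code. It assumes the ATCGmap file is tab-separated with 16 columns in the order chromosome, reference base, position, context, dinucleotide context, A/T/C/G/N counts on the Watson strand, A/T/C/G/N counts on the Crick strand, and methylation level; the column names, function names and example rows are hypothetical.

```python
# Assumed ATCGmap column order (an assumption, check it against your own file).
ATCG_COLUMNS = [
    "chr", "nuc", "pos", "context", "dinuc",
    "A_w", "T_w", "C_w", "G_w", "N_w",   # read counts on the Watson (+) strand
    "A_c", "T_c", "C_c", "G_c", "N_c",   # read counts on the Crick (-) strand
    "meth",
]

def parse_atcgmap_line(line):
    """Turn one ATCGmap line into a dict, or return None if it is malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(ATCG_COLUMNS):
        return None
    rec = dict(zip(ATCG_COLUMNS, fields))
    rec["pos"] = int(rec["pos"])
    for key in ATCG_COLUMNS[5:15]:
        rec[key] = int(rec[key])
    return rec  # "meth" stays a string because it can be "na"

def fetch_sites(lines, wanted):
    """Return the records whose (chromosome, position) is in the set `wanted`."""
    hits = []
    for line in lines:
        rec = parse_atcgmap_line(line)
        if rec is not None and (rec["chr"], rec["pos"]) in wanted:
            hits.append(rec)
    return hits

# Hypothetical rows, for illustration only.
example = [
    "chrX\tG\t100\t--\t--\t0\t0\t0\t9\t0\t6\t0\t0\t5\t0\tna",
    "chrX\tC\t101\tCG\tCG\t0\t4\t6\t0\t0\t0\t0\t10\t0\t0\t0.60",
]
for rec in fetch_sites(example, {("chrX", 100)}):
    print(rec["chr"], rec["pos"], rec["nuc"], rec["G_w"], rec["A_c"], rec["G_c"])
```

Looking at the per-strand counts side by side is the point of the exercise: if an apparent G>A call is supported by reads from only one strand, that is worth a closer look before the site is used for ASM.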

At 2023-12-22 14:24:20, "nmfad" wrote:

Hi Weilong,

The WGBS alignment file is already subsetted to the X chromosome only. I am currently running it gene by gene to make it more efficient.

I had another question. I am using paired-end data, and the VCF file contains many (more than expected) G>A SNVs. For example, there are almost 32 million SNVs detected along the X chromosome even after applying quality thresholds. Could these possibly just be C and methylation cytosines (T) from the complementary strand? Does the ASM computation take into account that many of these may not be SNVs, or is there a way to deal with this when computing ASM across a region or within a sample?

Thank you for any insight you can provide.

Numrah


From: Weilong Guo Sent: Thursday, December 21, 2023 11:36 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

What about splitting the huge VCF and BAM files into smaller ones (by chromosome or by region)?

Weilong
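For the VCF side of that suggestion, a rough Python sketch of splitting records into fixed-size windows is shown below. It is an illustration only: the function name `split_by_window`, the window size and the example records are assumptions, and the matching BAM subsets would have to be produced separately (for instance with a region query such as `samtools view -b in.bam chrX:1-1000000` on an indexed BAM).

```python
from collections import defaultdict

def split_by_window(vcf_lines, window=1_000_000):
    """Group VCF records into (chrom, start, end) windows of `window` bases.

    Returns (header_lines, {(chrom, start, end): [record_lines]}),
    with 1-based inclusive window coordinates.
    """
    header = []
    chunks = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)  # keep header lines to prepend to every chunk
            continue
        if not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        start = (int(pos) - 1) // window * window + 1
        chunks[(chrom, start, start + window - 1)].append(line)
    return header, dict(chunks)

# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "chrX\t100\t.\tG\tA\t.\tPASS\t.\tGT\t0/1",
    "chrX\t1000\t.\tC\tT\t.\tPASS\t.\tGT\t0/1",
    "chrX\t1001\t.\tA\tC\t.\tPASS\t.\tGT\t0/1",
]
header, chunks = split_by_window(example, window=1000)
for region, records in sorted(chunks.items()):
    print(region, len(records))
```

Each chunk, written out with the header lines in front, could then be submitted as its own job so that the ASM step works on a small VCF and a matching BAM region at a time.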

nmfad commented 8 months ago

Sounds good. I am going to give it a quick try now. Do you have a ballpark estimate of the time it would take to align ~1.2 billion reads? I will be using an HPC environment.


From: Weilong Guo Sent: Saturday, December 23, 2023 3:44 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

I guess so. You may try to align one sample with BS-Seeker2 to see if the issue is fixed.

--

Weilong

At 2023-12-23 16:55:10, "nmfad" wrote:

Thank you Weilong for patiently answering my queries. This is very helpful to know.

I had one final question about the ASM calculation step. I do see that the SNV calls are located at the CG sites in the BAM, and the total read numbers in the ASM output match those in the BAM when I visualize it in IGV to cross-check. Do you think an incorrect ATCGmap file could affect which reads are assigned to allele1 and allele2, and the corresponding methylation levels estimated for both?


From: Weilong Guo @.> Sent: Saturday, December 23, 2023 2:34 AM To: guoweilong/cgmaptools @.> Cc: Fadra, Numrah @.>; Author @.> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

I have no idea about BSMAP or Bismark's BAM files. What I know is, their BAM formats were defined slightly different with the BAM file that we defined for BS-Seeker2. I developed CGmapTools, which is compatible with BS-Seeker2's alignment result, but could not guarente it would be compatible with BAM files generated by other tools. Thus I guess that used "cgmaptools convert" to generate ATCGmap from BAM files that were not generated by BS-Seeker2, is the major reason that would cause the unexpected high SNPs.

Weilong

At 2023-12-23 16:24:44, "nmfad" @.***> wrote:

Hi Weilong,

If you could convert BSMAP aligned BAM file to correct ATCGmap files, you can then use cgmaptools for SNP callling. - This is the part I would like to understand better, what exactly do I need to change in the process ?? I used the CGmapFromBAM command on the BSMAP aligned bam files and generated the .ATCGmap.gz & .CGmap.gz files. I used the latest version of CGMAP from the website. The BSMAP version used on these files was v2.73.

-- Numrah

From: Weilong Guo @.> Sent: Saturday, December 23, 2023 2:10 AM To: guoweilong/cgmaptools @.> Cc: Fadra, Numrah @.>; Author @.> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

"cgmaptools snv" acturally relies on correct ATCGmap file. "cgmaptools convert" could convert correctly from BAM to ATCGmap file for BS-Seeker2's output.

If you could convert BSMAP aligned BAM file to correct ATCGmap files, you can then use cgmaptools for SNP callling.

--

Weilong

At 2023-12-23 13:18:05, "nmfad" @.<mailto:@.>> wrote:

Thank you Weilong for all your inputs thus far. I believe that explains it. I will definitely consider this option.

My hiccup is I am currently under a time crunch as I have already aligned >200 samples using BSMAP. Do you have any recommendations for filtering ASM results using BSMAP aligned bams for cleanest ASM calls ? I already applied thresholds of (-d = 10, #min number of reads for each allele linked site to call ASS) , the recommended default was only 1 read. Anything else you would suggest ?

-- Numrah

From: Weilong Guo @.<mailto:@.>> Sent: Friday, December 22, 2023 10:58 PM To: guoweilong/cgmaptools @.<mailto:@.>> Cc: Fadra, Numrah @.<mailto:@.>>; Author @.<mailto:@.>> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Got it.

Bismark and BSmap defined different with BAM formats with BS-Seeker2.

Thus you would got many so-called SNVs by feeding with bam files generated by Bismark and BSmap. I would suggest you to run BS-Seeeker2 for alignment, which is compatibable with CGmapTools.

Weillong

At 2023-12-23 12:52:21, "nmfad" @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>> wrote:

Hi Weilong,

Yes the autosomes also have roughly the same number of SNVs. The sample I am analyzing is female. Thank you for confirming point #2 below. I am working with very high coverage WGBS data.

The samples are aligned with BSMAP. I understand your tutorial mentioned the generation of ATGC bam format may not generate perfectly with BSMAP and BISMARCK is preferable, but I was able to run the BSMAP aligned bams through the ATCG map format conversion without any issues.

Numrah

From: Weilong Guo @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>> Sent: Friday, December 22, 2023 10:48 PM To: guoweilong/cgmaptools @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>> Cc: Fadra, Numrah @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>>; Author @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

(1) If you find the autochromosome is OK with much fewer SNVs deteced, you need to check whehter it is specific issue for chromosome X. If you find the autochromosome also reported with lots of SNVs, similar with chrX, you may need to check the commands or reference genomes. By the way, is the sample you used for analysis from male or female?

(2) Acturally, if the command and reference genome is OK, and the coverage is high enough, "cgmaptools snp" will detect the heterozygous SNVs with high accuracy. You need not to remove T>C and A>G sites.

Weilong

At 2023-12-23 12:30:26, "nmfad" @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>> wrote:

Hi Weilong ,

See my response to our questions below

(1) Did you also found similar issue reported on other chromosome (especially for autochromosome)? - Yes that is correct. This seems to be the case

(2) You can check whether the called SNP (output file) are filtered or not. If not, you may need to filter the sites by yourself. - Are you suggesting that I filter the SNV VCF files to include only C>T sites and remove all G>A variants ??

(3) For ASM analysis, it is suggested to feed with high-quality heterozygous SNV sites - Thank you, this is helpful to know. The filters I applied on the SNV include all het and PASS variants only and yet they are very many in number. I am analyzing WGBS data.

Best Numrah

From: Weilong Guo @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>> Sent: Friday, December 22, 2023 9:04 PM To: guoweilong/cgmaptools @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>> Cc: Fadra, Numrah @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>; Author @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Hi Humrah,

It's weird that you found the chromosome X was called with ~32.84 milllion SNVs. (1) Did you also found similar issue reported on other chromosome (especially for autochromosome)? (2) You can check whether the called SNP (output file) are filtered or not. If not, you may need to filter the sites by yourself. (3) For ASM analysis, it is suggested to feed with high-quality heterozygous SNV sites.

Best, Weilong

At 2023-12-23 06:35:43, "nmfad" @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>> wrote:

Thank you Weilong.

On average there roughly 5-10 million SNVs in the human genome.

Using the tool on a chromosome X bam file, I have roughly ~32.84 million SNVs called from the ATGCmap file using the bayes mode and bayes-dynamicP options. I further filtered the VCF file for depth> 10 and have 32.82 million SNVs. Do you suggest that I retain this many number of SNVs as they are required for sufficient ASM calls in a given gene region and filter the ASR/ASS calls in the final output by depth or lower pvalue? OR do you think I will incur a lot of erroneous false positive calls in the data if I retain this many SNVs prior to the ASM calling step ?

I ve been following your tutorial guide online as much as possible for my possible on ASM data.

Thank you again for your insight.

Best Numrah

From: Weilong Guo @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>> Sent: Friday, December 22, 2023 4:34 AM To: guoweilong/cgmaptools @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>> Cc: Fadra, Numrah @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>>; Author @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>> Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

The possibility of C-to-T conversion by bisulfite sequencing is considered. You can try to read the corresponding sites of ATCGmap file for the called SNP sites, to see wheather the calling meet with your expectation.

Moreover, select the called SNP sites with strict threshold for p-value is also suggested.

Weilong

At 2023-12-22 14:24:20, "nmfad" @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>>> wrote:

Hi Weilong,

The WGBS alignment file is already subsetted to the X chromosome only. Currently am running it gene by gene to make it efficient.

I had another question. I am using paired end data, and the VCF file consists of many (more than expected) G>A SNVs. Example there are almost 32 million SNVs detected along the X chromosome even after applying quality thresholds. Could these possibly just be C and methylation cytosines (T) from the complementary strand? Does the ASM computation take into account that a lot of these may not be SNVs or is there a way to deal with it when computing ASM across a region or within a sample ??

Thank you for any insight you can provide.

Numrah




nmfad commented 8 months ago

Hi Weilong,

Based on your suggestion, I am redoing the alignment with BS-Seeker2 on reads subsetted to chromosome X only. I gave the job 10G of memory, and there are roughly 50 million PE reads per sample. I used the default bowtie2 configuration in BS-Seeker2, which I believe runs the alignment using 2 processes or 4 threads. The job (for 1 sample) has been running for over 38 hours in total. It was submitted early in the morning on the 24th (US Central time).

How long do you anticipate the BS-Seeker2 alignment will take to complete? I believe my library is directional. The log file shows that 2 jobs were submitted, one for the Watson strand and one for the Crick strand, and those completed in about 24 hours. The next set of 2 jobs was submitted last night and has been running for over 15 hours.

Do these jobs take roughly 3 to 4 days? My time limit is 200 hours per sample, after which the jobs will be killed automatically. In your experience, do you anticipate these will complete within 200 hours? I am just looking for a ballpark estimate of the time. I am using an HPC compute environment.

Best Numrah

From: Weilong Guo Sent: Saturday, December 23, 2023 3:45 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

I guess so. You may try to align one sample with BS-Seeker2 to see if the issue is fixed.

--

Weilong

At 2023-12-23 16:55:10, "nmfad" wrote:

Thank you Weilong for patiently answering my queries. This is very helpful to know.

I have one final question about the ASM calculation step. I do see that the SNV calls are located at CG sites in the BAM, and the total read numbers in the ASM output match those in the BAM when I visualize it in IGV to cross-check. Do you think an incorrect ATCGmap file could affect the detection of reads for allele1 and allele2 and the corresponding methylation levels estimated for both?


From: Weilong Guo Sent: Saturday, December 23, 2023 2:34 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

I am not familiar with the BAM files produced by BSMAP or Bismark. What I do know is that their BAM formats are defined slightly differently from the one we defined for BS-Seeker2. I developed CGmapTools to be compatible with BS-Seeker2's alignment results, but I cannot guarantee that it is compatible with BAM files generated by other tools. So my guess is that using "cgmaptools convert" to generate the ATCGmap from BAM files not produced by BS-Seeker2 is the main reason for the unexpectedly high number of SNPs.

Weilong

At 2023-12-23 16:24:44, "nmfad" wrote:

Hi Weilong,

"If you could convert the BSMAP-aligned BAM file to a correct ATCGmap file, you can then use cgmaptools for SNP calling." This is the part I would like to understand better: what exactly do I need to change in the process? I used the CGmapFromBAM command on the BSMAP-aligned BAM files and generated the .ATCGmap.gz and .CGmap.gz files. I used the latest version of CGmapTools from the website. The BSMAP version used on these files was v2.73.

-- Numrah

From: Weilong Guo Sent: Saturday, December 23, 2023 2:10 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

"cgmaptools snv" actually relies on a correct ATCGmap file. "cgmaptools convert" converts correctly from BAM to ATCGmap for BS-Seeker2's output.

If you can convert the BSMAP-aligned BAM file to a correct ATCGmap file, you can then use cgmaptools for SNP calling.

--

Weilong

At 2023-12-23 13:18:05, "nmfad" wrote:

Thank you Weilong for all your inputs thus far. I believe that explains it. I will definitely consider this option.

My difficulty is that I am under a time crunch, as I have already aligned >200 samples using BSMAP. Do you have any recommendations for filtering ASM results from BSMAP-aligned BAMs to get the cleanest ASM calls? I already applied a threshold of -d = 10 (the minimum number of reads for each allele-linked site to call ASS); the recommended default was only 1 read. Is there anything else you would suggest?

-- Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:58 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Got it.

Bismark and BSMAP define their BAM formats differently from BS-Seeker2.

As a result, you will get many so-called SNVs when feeding in BAM files generated by Bismark or BSMAP. I would suggest running BS-Seeker2 for the alignment, since it is compatible with CGmapTools.

Weilong

At 2023-12-23 12:52:21, "nmfad" wrote:

Hi Weilong,

Yes, the autosomes also have roughly the same number of SNVs. The sample I am analyzing is female. Thank you for confirming point #2 below. I am working with very high coverage WGBS data.

The samples are aligned with BSMAP. I understand your tutorial mentions that the ATCGmap format may not be generated perfectly from BSMAP alignments and that Bismark is preferable, but I was able to run the BSMAP-aligned BAMs through the ATCGmap format conversion without any issues.

Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:48 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

(1) If you find that the autosomes are OK, with far fewer SNVs detected, you need to check whether this is an issue specific to chromosome X. If you find that the autosomes are also reported with lots of SNVs, similar to chrX, you may need to check the commands or the reference genome. By the way, is the sample you used for the analysis from a male or a female?

(2) Actually, if the command and reference genome are OK and the coverage is high enough, "cgmaptools snv" will detect heterozygous SNVs with high accuracy. You do not need to remove T>C and A>G sites.

Weilong
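Check (1) above can be done by tallying the called SNVs per chromosome and per substitution type. The following is a minimal Python sketch, assuming the calls are stored as a standard tab-separated VCF whose first five columns are CHROM, POS, ID, REF and ALT. `tally_snvs` is a hypothetical helper written for illustration and is not part of CGmapTools.

```python
from collections import Counter


def tally_snvs(lines):
    """Count VCF records per chromosome and single-base changes per REF>ALT type.

    Assumes standard VCF column order; header lines start with '#'.
    """
    per_chrom = Counter()
    per_subst = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed records
        chrom, ref, alt = fields[0], fields[3], fields[4]
        per_chrom[chrom] += 1
        # ALT may list several alleles separated by commas
        for allele in alt.split(","):
            if len(ref) == 1 and len(allele) == 1:
                per_subst[f"{ref}>{allele}"] += 1
    return per_chrom, per_subst
```

A strong excess of one substitution class on every chromosome would point to the conversion step or the reference genome, whereas an excess confined to chrX would point to something chromosome-specific.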

At 2023-12-23 12:30:26, "nmfad" wrote:

Hi Weilong ,

See my responses to your questions below.

(1) Did you also find a similar issue on other chromosomes (especially the autosomes)? - Yes, that is correct. This seems to be the case.

(2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. - Are you suggesting that I filter the SNV VCF files to include only C>T sites and remove all G>A variants?

(3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites. - Thank you, this is helpful to know. The filters I applied to the SNVs keep only het and PASS variants, and yet there are still a great many of them. I am analyzing WGBS data.

Best Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 9:04 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Hi Numrah,

It's odd that chromosome X was called with ~32.84 million SNVs. (1) Did you also find a similar issue on other chromosomes (especially the autosomes)? (2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. (3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites.

Best, Weilong
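Suggestion (3) can be sketched as a filter over VCF records. This is a minimal Python example, assuming the SNV calls follow the standard VCF layout (FILTER in column 7, FORMAT in column 9, the first sample in column 10, with a `GT` key in FORMAT). Whether the CGmapTools output carries exactly these fields is an assumption to verify against the actual file, and `is_het_pass` and `filter_het_pass` are hypothetical helper names.

```python
def is_het_pass(vcf_line):
    """Return True for a PASS record whose first-sample genotype is heterozygous."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no FORMAT/sample columns to inspect
    if fields[6] != "PASS":
        return False
    format_keys = fields[8].split(":")
    sample_values = fields[9].split(":")
    if "GT" not in format_keys:
        return False
    gt_index = format_keys.index("GT")
    if gt_index >= len(sample_values):
        return False
    # Treat phased (|) and unphased (/) genotypes the same way
    alleles = sample_values[gt_index].replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def filter_het_pass(lines):
    """Keep header lines plus PASS heterozygous records."""
    return [ln for ln in lines if ln.startswith("#") or is_het_pass(ln)]
```

Header lines are passed through unchanged so that the filtered output remains a valid VCF.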

At 2023-12-23 06:35:43, "nmfad" wrote:

Thank you Weilong.

On average there are roughly 5-10 million SNVs in a human genome.

Using the tool on a chromosome X BAM file, I have roughly 32.84 million SNVs called from the ATCGmap file using the bayes mode and bayes-dynamicP options. I further filtered the VCF file for depth > 10 and still have 32.82 million SNVs. Do you suggest that I retain this many SNVs, since they are required for sufficient ASM calls in a given gene region, and filter the ASR/ASS calls in the final output by depth or a lower p-value? Or do you think I will incur a lot of false positive calls if I retain this many SNVs prior to the ASM calling step?

I've been following your online tutorial as closely as possible for my ASM analysis.

Thank you again for your insight.

Best Numrah
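The figures quoted above can be put in proportion with a quick calculation. This Python sketch assumes a chromosome X length of roughly 156 Mb, which is an approximate value not stated in the thread; the SNV counts come from the message above, and `snv_density` is a hypothetical helper.

```python
def snv_density(n_snvs, region_length_bp):
    """Fraction of positions in a region that were called as SNVs."""
    return n_snvs / region_length_bp


N_SNVS_CHRX = 32_840_000          # SNVs called on chrX, from the thread
CHRX_LENGTH_BP = 156_000_000      # assumption: approximate human chrX length
GENOME_SNVS_UPPER = 10_000_000    # upper end of the 5-10 million quoted for a whole genome

density = snv_density(N_SNVS_CHRX, CHRX_LENGTH_BP)
ratio_to_genome = N_SNVS_CHRX / GENOME_SNVS_UPPER

print(f"Fraction of chrX positions called as SNVs: {density:.1%}")
print(f"chrX calls relative to a whole-genome upper estimate: {ratio_to_genome:.1f}x")
```

Under these assumptions, about one in five chrX positions would be a variant, and chrX alone would carry several times more SNVs than expected for the entire genome, which suggests a systematic artifact more than real variation.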

From: Weilong Guo Sent: Friday, December 22, 2023 4:34 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

The possibility of C-to-T conversion by bisulfite sequencing is taken into account. You can look up the called SNP sites in the ATCGmap file to see whether the calls meet your expectations.

Moreover, selecting the called SNP sites with a strict p-value threshold is also suggested.

Weilong



- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868183863, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4EX37IRRVVAEWBGREDYKZC3RAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGE4DGOBWGM. You are receiving this because you authored the thread.Message ID: @.**@.mailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***>>>>>

- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868204941, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4F5CTNFIQCX7BB7HCLYKZO6LAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGIYDIOJUGE. You are receiving this because you authored the thread.Message ID: @.**@.mailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***>>>>

- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868206698, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4F6VDVRXUY5F4IELXDYKZQGPAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGIYDMNRZHA. You are receiving this because you authored the thread.Message ID: @.**@.mailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***>>>

- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868237737, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4FAFMIQBYPFSSHMTVDYK2GW5AVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGIZTONZTG4. You are receiving this because you authored the thread.Message ID: @.**@.mailto:***@***.******@***.***>>

- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868242047, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4D3NP6KF3LRAIXZPADYK2JQLAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGI2DEMBUG4. You are receiving this because you authored the thread.Message ID: @.<mailto:@.>>

- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868254835, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4AD6ZGIBSCLZKRFTATYK2RYJAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGI2TIOBTGU. You are receiving this because you authored the thread.Message ID: @.**@.>>

guoweilong commented 8 months ago

Here you can find suggestions for speeding up the run of BS-Seeker2.

https://github.com/BSSeeker/BSseeker2#1-performance

--

Weilong

At 2023-12-26 07:57:43, "nmfad" wrote:

Hi Weilong,

Based on your suggestion, I am redoing the alignment with BS-Seeker2, using only the reads subsetted to chromosome X. I allocated 10G of memory, and each sample has roughly 50 million PE reads. I used the default bowtie2 configuration in the BS-Seeker2 alignment, which I believe runs the alignment using 2 processes or 4 threads. The job (for 1 sample) has been running for over 38 hours in total. The jobs were submitted early in the morning on the 24th (US Central Time).

How long do you anticipate the BS-Seeker2 alignment will take to complete? I believe my library is directional. The log file shows that 2 jobs were submitted, one for the Watson strand and one for the Crick strand, and those completed in about 24 hours. The next set of 2 jobs was submitted last night and has been running for over 15 hours.

Do these jobs take roughly 3 to 4 days? My time limit is 200 hours per sample, after which the jobs are automatically killed. In your experience, do you anticipate they will complete within 200 hours? I am just looking for a ballpark estimate. I am using an HPC compute environment.

Best, Numrah

From: Weilong Guo Sent: Saturday, December 23, 2023 3:45 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

I guess so. You may try to align one sample with BS-Seeker2 to see if the issue is fixed.

--

Weilong

At 2023-12-23 16:55:10, "nmfad" wrote:

Thank you, Weilong, for patiently answering my queries. This is very helpful to know.

I have one final question about the ASM calculation step. When I cross-check in IGV, I do see that the SNV calls are located at the CG sites in the BAM, and the total read numbers in the ASM output match those in the BAM. Do you think an incorrect ATCGmap file could affect which reads are assigned to allele1 and allele2, and the methylation levels estimated for each?


From: Weilong Guo Sent: Saturday, December 23, 2023 2:34 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

I have no experience with BAM files from BSMAP or Bismark. What I do know is that their BAM formats are defined slightly differently from the BAM format we defined for BS-Seeker2. I developed CGmapTools to be compatible with BS-Seeker2's alignment results, but I cannot guarantee that it is compatible with BAM files generated by other tools. So my guess is that using "cgmaptools convert" to generate ATCGmap files from BAM files that were not produced by BS-Seeker2 is the main cause of the unexpectedly high number of SNPs.

Weilong

At 2023-12-23 16:24:44, "nmfad" wrote:

Hi Weilong,

"If you could convert BSMAP aligned BAM file to correct ATCGmap files, you can then use cgmaptools for SNP calling." - This is the part I would like to understand better: what exactly do I need to change in the process? I used the CGmapFromBAM command on the BSMAP-aligned BAM files and generated the .ATCGmap.gz and .CGmap.gz files. I used the latest version of CGmapTools from the website. The BSMAP version used on these files was v2.73.

-- Numrah

From: Weilong Guo Sent: Saturday, December 23, 2023 2:10 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

"cgmaptools snv" actually relies on a correct ATCGmap file. "cgmaptools convert" converts correctly from BAM to ATCGmap for BS-Seeker2's output.

If you can convert the BSMAP-aligned BAM file to a correct ATCGmap file, you can then use cgmaptools for SNP calling.

--

Weilong

At 2023-12-23 13:18:05, "nmfad" wrote:

Thank you, Weilong, for all your input so far. I believe that explains it. I will definitely consider this option.

My difficulty is that I am under a time crunch, as I have already aligned >200 samples using BSMAP. Do you have any recommendations for filtering ASM results from BSMAP-aligned BAMs to get the cleanest ASM calls? I already applied a threshold of -d = 10 (minimum number of reads for each allele-linked site to call ASS); the default was only 1 read. Is there anything else you would suggest?

-- Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:58 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Got it.

Bismark and BSMAP define their BAM formats differently from BS-Seeker2.

As a result, you will get many so-called SNVs when feeding in BAM files generated by Bismark or BSMAP. I suggest running BS-Seeker2 for the alignment, since it is compatible with CGmapTools.

Weilong

At 2023-12-23 12:52:21, "nmfad" wrote:

Hi Weilong,

Yes, the autosomes also have roughly the same number of SNVs. The sample I am analyzing is female. Thank you for confirming point #2 below. I am working with very high coverage WGBS data.

The samples are aligned with BSMAP. I understand your tutorial mentions that the ATCGmap format may not be generated perfectly from BSMAP output and that Bismark is preferable, but I was able to run the BSMAP-aligned BAMs through the ATCGmap format conversion without any issues.

Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:48 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

(1) If you find that the autosomes are fine, with far fewer SNVs detected, you need to check whether the issue is specific to chromosome X. If the autosomes are also reported with lots of SNVs, similar to chrX, you may need to check the commands or the reference genome. By the way, is the sample you used for analysis from a male or a female?

(2) Actually, if the command and reference genome are correct and the coverage is high enough, "cgmaptools snv" will detect heterozygous SNVs with high accuracy. You do not need to remove T>C and A>G sites.

Weilong

At 2023-12-23 12:30:26, "nmfad" wrote:

Hi Weilong ,

See my responses to your questions below.

(1) Did you also find a similar issue on other chromosomes (especially the autosomes)? - Yes, that is correct. This seems to be the case.

(2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. - Are you suggesting that I filter the SNV VCF files to include only C>T sites and remove all G>A variants?

(3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites. - Thank you, this is helpful to know. The filters I applied to the SNVs keep only het and PASS variants, and yet there are still very many of them. I am analyzing WGBS data.

Best Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 9:04 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Hi Numrah,

It's odd that chromosome X was called with ~32.84 million SNVs. (1) Did you also find a similar issue on other chromosomes (especially the autosomes)? (2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. (3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites.

Best, Weilong
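
One way to act on points (2) and (3) is to tally the substitution classes in the VCF and keep only PASS heterozygous records. The sketch below is a minimal illustration, not part of CGmapTools. It assumes a standard single-sample VCF layout (REF in column 4, ALT in column 5, FILTER in column 7, a `GT` key in the FORMAT column); whether the VCF written by cgmaptools matches this layout exactly is an assumption to verify against your own file. The helper names `substitution_counts` and `pass_het_records` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def substitution_counts(vcf_lines: Iterable[str]) -> Counter:
    """Count REF>ALT classes among single-base, single-ALT records."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3].upper(), fields[4].upper()
        if len(ref) == 1 and len(alt) == 1:
            counts[f"{ref}>{alt}"] += 1
    return counts


def pass_het_records(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield records with FILTER == PASS and a heterozygous diploid GT."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[6] != "PASS":
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        genotype = fields[9].split(":")[format_keys.index("GT")]
        alleles = genotype.replace("|", "/").split("/")
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            yield line


# Toy records for illustration only; these are not data from this thread.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n",
    "chrX\t100\t.\tG\tA\t50\tPASS\tDP=30\tGT\t0/1\n",
    "chrX\t200\t.\tC\tT\t50\tPASS\tDP=30\tGT\t1/1\n",
    "chrX\t300\t.\tG\tA\t10\tLowQual\tDP=12\tGT\t0/1\n",
]
demo_counts = substitution_counts(demo)
demo_het = list(pass_het_records(demo))
print(dict(demo_counts), len(demo_het))
```

If one or two substitution classes dominate the tally, that points to a systematic problem upstream of SNV calling rather than to genuine variation.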

At 2023-12-23 06:35:43, "nmfad" wrote:

Thank you Weilong.

On average there are roughly 5-10 million SNVs in the human genome.

Using the tool on a chromosome X BAM file, I get roughly 32.84 million SNVs called from the ATCGmap file using the bayes mode and the bayes-dynamicP option. After further filtering the VCF file for depth > 10, I still have 32.82 million SNVs. Do you suggest that I retain all of these SNVs, since they are needed for sufficient ASM calls in a given gene region, and filter the ASR/ASS calls in the final output by depth or a lower p-value? Or do you think I will incur a lot of false positive calls if I retain this many SNVs prior to the ASM calling step?

I have been following your online tutorial as closely as possible for my ASM analysis.

Thank you again for your insight.

Best Numrah
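
To put the numbers in this message side by side, the arithmetic below compares the call density on chrX with the density implied by 5-10 million SNVs genome-wide. It is only a rough sketch: the chrX length (about 156 Mb) and the genome length (about 3.1 Gb) are approximate values assumed here, not figures from this thread.

```python
# Approximate sizes; both are assumptions, not values from this thread.
CHRX_LENGTH_BP = 156_000_000       # human chrX, roughly 156 Mb
GENOME_LENGTH_BP = 3_100_000_000   # human genome, roughly 3.1 Gb


def bases_per_snv(n_snvs: int, region_bp: int) -> float:
    """Average number of bases per SNV call in a region."""
    return region_bp / n_snvs


called = bases_per_snv(32_840_000, CHRX_LENGTH_BP)
expected_dense = bases_per_snv(10_000_000, GENOME_LENGTH_BP)
expected_sparse = bases_per_snv(5_000_000, GENOME_LENGTH_BP)

print(f"called on chrX: one SNV every {called:.2f} bp")
print(f"expected genome-wide: one SNV every "
      f"{expected_dense:.0f} to {expected_sparse:.0f} bp")
```

Under these assumptions the call set has about one SNV every 5 bp, against one every 310 to 620 bp expected genome-wide, which is why a count this high suggests an artifact rather than real variation.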

From: Weilong Guo Sent: Friday, December 22, 2023 4:34 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

The possibility of C-to-T conversion by bisulfite sequencing is taken into account. You can look up the called SNP sites in the ATCGmap file to see whether the calls match your expectation.

Moreover, selecting the called SNP sites with a strict p-value threshold is also suggested.

Weilong
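
The lookup suggested above can be sketched as follows. This assumes the 16-column tab-separated ATCGmap layout (chromosome, reference base, position, context, dinucleotide context, five Watson-strand counts for A/T/C/G/N, five Crick-strand counts in the same order, methylation level); check this against your own file before relying on it. The function names are hypothetical and not part of CGmapTools.

```python
import gzip
from typing import Dict, Iterable, Iterator, Set, Tuple

BASES = ("A", "T", "C", "G", "N")


def parse_atcgmap_line(line: str) -> Dict:
    """Split one ATCGmap row into per-strand base counts (assumed layout)."""
    f = line.rstrip("\n").split("\t")
    return {
        "chrom": f[0],
        "ref": f[1],
        "pos": int(f[2]),
        "context": f[3],
        "watson": dict(zip(BASES, map(int, f[5:10]))),
        "crick": dict(zip(BASES, map(int, f[10:15]))),
    }


def rows_at_sites(lines: Iterable[str],
                  sites: Set[Tuple[str, int]]) -> Iterator[Dict]:
    """Yield parsed rows whose (chrom, pos) is in the set of called sites."""
    for line in lines:
        if not line.strip():
            continue
        row = parse_atcgmap_line(line)
        if (row["chrom"], row["pos"]) in sites:
            yield row


# Toy row for illustration only; not data from this thread.
demo_lines = ["chrX\tG\t1000\t--\t--\t0\t0\t0\t12\t0\t9\t0\t0\t3\t0\tna\n"]
demo_rows = list(rows_at_sites(demo_lines, {("chrX", 1000)}))
for row in demo_rows:
    print(row["chrom"], row["pos"], row["ref"], row["watson"], row["crick"])

# For a real file, stream it instead of loading it into memory, e.g.:
# with gzip.open("sample.ATCGmap.gz", "rt") as fh:   # hypothetical path
#     for row in rows_at_sites(fh, called_sites):
#         ...
```

When an apparent G>A or C>T call is supported by reads from only one strand, that is a hint that it reflects bisulfite conversion rather than a genuine variant.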



- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868183863, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4EX37IRRVVAEWBGREDYKZC3RAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGE4DGOBWGM. You are receiving this because you authored the thread.Message ID: @.**@.mailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***>>>>>

- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868204941, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4F5CTNFIQCX7BB7HCLYKZO6LAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGIYDIOJUGE. You are receiving this because you authored the thread.Message ID: @.**@.mailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***%3cmailto:***@***.******@***.***>>>>

- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***<mailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***%3cmailto:***@***.***>>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868206698, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4F6VDVRXUY5F4IELXDYKZQGPAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGIYDMNRZHA. You are receiving this because you authored the thread.Message ID: @.**@.mailto:***@***.******@***.***<mailto:***@***.******@***.***%3cmailto:***@***.******@***.***>>>

- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.mailto:***@***.***%3cmailto:***@***.***>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868237737, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4FAFMIQBYPFSSHMTVDYK2GW5AVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGIZTONZTG4. You are receiving this because you authored the thread.Message ID: @.**@.mailto:***@***.******@***.***>>

- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868242047, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4D3NP6KF3LRAIXZPADYK2JQLAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGI2DEMBUG4. You are receiving this because you authored the thread.Message ID: @.<mailto:@.>>

- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.<mailto:@.>>

- Reply to this email directly, view it on GitHubhttps://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1868254835, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD4KG4AD6ZGIBSCLZKRFTATYK2RYJAVCNFSM6AAAAABAQI5HHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGI2TIOBTGU. You are receiving this because you authored the thread.Message ID: @.**@.>>

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

nmfad commented 8 months ago

Thank you, Weilong, for your response. I am trying to trace the progress using the log file and the intermediate files generated by bs_seeker2-align.py. (I have not split the fastq.gz files into multiple files myself: it has been over 72 hours since my original jobs were submitted, so I am assuming the alignment will complete soon.)

Inside the tmp directory, I see the following files:

  1. The R1 and R2 fastq.gz files were split into multiple smaller files (29 fastq.gz files each for R1 and R2, with 1 million reads each).
  2. There are 4 pairs of files (8 in total) generated so far in SAM format, such as
    • C_C2T_fr_m4.mapping.tmp-7379570 and W_C2T_fr_m4.mapping.tmp-7379570 (I am guessing one file for R1 and the other for R2).
    • Each of these files is in SAM format and has roughly 2 million reads.
  3. How do I track how much of the alignment is remaining? That is, how do I know how many steps remain before the alignment completes? In the BS-Seeker2 log file I see 8 instances of bowtie2 launched and finished, corresponding to the 8 files above. Will more such SAM files be generated?

Thanks again for sharing all your insight thus far.

Best Numrah

From: Weilong Guo Sent: Monday, December 25, 2023 7:47 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Here you can find suggestions for speeding up the run of BS-Seeker2.

https://github.com/BSSeeker/BSseeker2#1-performance

--

Weilong

At 2023-12-26 07:57:43, "nmfad" wrote:

Hi Weilong,

Based on your suggestion, I am redoing the alignment with BS-Seeker2, using reads subsetted to chromosome X only. I gave it 10G of memory, and there are roughly 50 million PE reads per sample. I used the default bowtie2 configuration in the BS-Seeker2 alignment, which I believe runs the alignment using 2 processes or 4 threads. The job (for 1 sample) has been running for over 38 hours in total. The jobs were submitted early in the morning on the 24th (US Central time).

How long do you anticipate the BS-Seeker2 alignment will take to complete? I believe the library I have is a directional library. I see in the log file that 2 jobs were submitted, one for the Watson strand and one for the Crick strand, and those completed in about 24 hours. The next set of 2 jobs was submitted last night and has been running for over 15 hours.

Do these jobs take roughly 3 to 4 days? My time limit is 200 hours per sample, after which the jobs are automatically killed. In your experience, do you anticipate they will complete within 200 hours? I am just looking for a ballpark estimate. I am using an HPC compute environment.

Best Numrah

From: Weilong Guo Sent: Saturday, December 23, 2023 3:45 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

I guess so. You may try to align one sample with BS-Seeker2 to see if the issue is fixed.

--

Weilong

At 2023-12-23 16:55:10, "nmfad" wrote:

Thank you Weilong for patiently answering my queries. This is very helpful to know.

I had one final question about the ASM calculation step. I do see that the SNV calls are located at the CG sites in the BAM, and the total read numbers in the ASM output match those in the BAM when I visualize it in IGV to cross-check. Do you think an incorrect ATCGmap file could affect the reads detected for allele1 and allele2 and the methylation levels estimated for each?


From: Weilong Guo Sent: Saturday, December 23, 2023 2:34 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

I have no idea about BSMAP's or Bismark's BAM files. What I know is that their BAM formats are defined slightly differently from the BAM format we defined for BS-Seeker2. I developed CGmapTools to be compatible with BS-Seeker2's alignment results, but I cannot guarantee that it is compatible with BAM files generated by other tools. So my guess is that using "cgmaptools convert" to generate ATCGmap files from BAM files not produced by BS-Seeker2 is the main cause of the unexpectedly high number of SNPs.

Weilong

At 2023-12-23 16:24:44, "nmfad" wrote:

Hi Weilong,

"If you could convert BSMAP aligned BAM file to correct ATCGmap files, you can then use cgmaptools for SNP calling." This is the part I would like to understand better: what exactly do I need to change in the process? I used the CGmapFromBAM command on the BSMAP-aligned BAM files and generated the .ATCGmap.gz and .CGmap.gz files. I used the latest version of CGmapTools from the website. The BSMAP version used on these files was v2.73.

-- Numrah

From: Weilong Guo Sent: Saturday, December 23, 2023 2:10 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

"cgmaptools snv" actually relies on a correct ATCGmap file. "cgmaptools convert" converts correctly from BAM to ATCGmap for BS-Seeker2's output.

If you can convert the BSMAP-aligned BAM files to correct ATCGmap files, you can then use cgmaptools for SNP calling.

--

Weilong

At 2023-12-23 13:18:05, "nmfad" wrote:

Thank you Weilong for all your inputs thus far. I believe that explains it. I will definitely consider this option.

My difficulty is that I am under a time crunch, as I have already aligned more than 200 samples using BSMAP. Do you have any recommendations for filtering ASM results from BSMAP-aligned BAMs to get the cleanest ASM calls? I have already applied a threshold of -d 10 (minimum number of reads for each allele-linked site to call ASS); the default was only 1 read. Is there anything else you would suggest?

-- Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:58 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Got it.

Bismark and BSMAP define their BAM formats differently from BS-Seeker2.

As a result, you will get many so-called SNVs when feeding in BAM files generated by Bismark or BSMAP. I would suggest running BS-Seeker2 for the alignment, which is compatible with CGmapTools.

Weilong

At 2023-12-23 12:52:21, "nmfad" wrote:

Hi Weilong,

Yes the autosomes also have roughly the same number of SNVs. The sample I am analyzing is female. Thank you for confirming point #2 below. I am working with very high coverage WGBS data.

The samples are aligned with BSMAP. I understand your tutorial mentions that the ATCGmap format may not be generated perfectly from BSMAP output and that Bismark is preferable, but I was able to run the BSMAP-aligned BAMs through the ATCGmap conversion without any issues.

Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 10:48 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

(1) If you find that the autosomes are fine, with far fewer SNVs detected, you need to check whether this is an issue specific to chromosome X. If the autosomes are also reported with many SNVs, similar to chrX, you may need to check the commands or the reference genome. By the way, is the sample you analyzed from a male or a female?

(2) Actually, if the command and reference genome are fine and the coverage is high enough, "cgmaptools snv" will detect heterozygous SNVs with high accuracy. You do not need to remove T>C and A>G sites.
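One way to carry out the check in (1) is to tally the called SNVs per chromosome and per substitution type, so that an excess of bisulfite-like changes (C>T, G>A) stands out. The following is a minimal Python sketch, assuming a standard tab-delimited VCF whose first five columns are CHROM, POS, ID, REF, ALT. The function name `tally_snvs` and the example records are made up for illustration and are not part of CGmapTools.

```python
from collections import Counter


def tally_snvs(vcf_lines):
    """Count single-base substitutions per chromosome and per REF>ALT type.

    Each ALT allele is counted separately, so a multi-allelic site
    contributes more than one count.
    """
    per_chrom = Counter()
    per_subst = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, ref, alt = fields[0], fields[3], fields[4]
        for allele in alt.split(","):
            # keep single-base substitutions only (no indels)
            if len(ref) == 1 and len(allele) == 1:
                per_chrom[chrom] += 1
                per_subst[f"{ref}>{allele}"] += 1
    return per_chrom, per_subst


# Hypothetical records, not taken from the real data set.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrX\t100\t.\tG\tA\t30\tPASS\tDP=20",
    "chrX\t250\t.\tC\tT\t25\tPASS\tDP=15",
    "chr1\t500\t.\tA\tC\t40\tPASS\tDP=18",
]
per_chrom, per_subst = tally_snvs(example)
print(dict(per_chrom))
print(dict(per_subst))
```

For a real file, an open text handle can be passed instead of the list, for example `gzip.open(path, "rt")` for a compressed VCF.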

Weilong

At 2023-12-23 12:30:26, "nmfad" wrote:

Hi Weilong ,

See my responses to your questions below.

(1) Did you also find a similar issue on other chromosomes (especially autosomes)? - Yes, that is correct. This seems to be the case.

(2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. - Are you suggesting that I filter the SNV VCF files to include only C>T sites and remove all G>A variants?

(3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites. - Thank you, this is helpful to know. The filter I applied to the SNVs keeps only heterozygous PASS variants, and yet there are still very many of them. I am analyzing WGBS data.
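The filter described in (3) (heterozygous, PASS, minimum depth) can be sketched in Python as follows. This assumes a standard VCF layout in which FILTER is the 7th column, the depth is stored as `DP=` in INFO, and the genotype is the `GT` entry of the first sample column. The fields actually written by cgmaptools may differ, so check a few records first. The function name `is_het_pass`, the depth cutoff, and the example records are hypothetical.

```python
def is_het_pass(line, min_depth=10):
    """Return True for a PASS, heterozygous record with INFO DP >= min_depth."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 10:
        return False  # header line, or no FORMAT/sample columns
    filt, info, fmt, sample = fields[6], fields[7], fields[8], fields[9]
    if filt != "PASS":
        return False

    # Read the depth from the INFO column (assumed to be stored as DP=<int>).
    depth = None
    for entry in info.split(";"):
        if entry.startswith("DP=") and entry[3:].isdigit():
            depth = int(entry[3:])
    if depth is None or depth < min_depth:
        return False

    # Read the genotype from the sample column using the FORMAT keys.
    keys = fmt.split(":")
    values = sample.split(":")
    if "GT" not in keys or keys.index("GT") >= len(values):
        return False
    alleles = values[keys.index("GT")].replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


# Hypothetical records for illustration only.
het_ok = "chrX\t100\t.\tG\tA\t30\tPASS\tNS=1;DP=20\tGT:DP\t0/1:20"
hom_alt = "chrX\t200\t.\tC\tT\t30\tPASS\tNS=1;DP=20\tGT:DP\t1/1:20"
low_depth = "chrX\t300\t.\tA\tG\t30\tPASS\tNS=1;DP=5\tGT:DP\t0/1:5"
print(is_het_pass(het_ok), is_het_pass(hom_alt), is_het_pass(low_depth))
```

A filter of this kind only removes records. It cannot repair calls that are wrong because the input ATCGmap file itself was built incorrectly.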

Best Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 9:04 PM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

Hi Numrah,

It's odd that chromosome X was called with ~32.84 million SNVs. (1) Did you also find a similar issue on other chromosomes (especially autosomes)? (2) You can check whether the called SNPs (output file) are filtered or not. If not, you may need to filter the sites yourself. (3) For ASM analysis, it is suggested to feed in high-quality heterozygous SNV sites.

Best, Weilong

At 2023-12-23 06:35:43, "nmfad" wrote:

Thank you Weilong.

On average there are roughly 5-10 million SNVs in the human genome.

Using the tool on a chromosome X BAM file, I have roughly 32.84 million SNVs called from the ATCGmap file using the bayes mode and the bayes-dynamicP option. I further filtered the VCF file for depth > 10 and still have 32.82 million SNVs. Do you suggest that I retain this many SNVs, as they are needed for sufficient ASM calls in a given gene region, and filter the ASR/ASS calls in the final output by depth or a lower p-value? Or do you think I will incur many false positive calls if I retain this many SNVs prior to the ASM calling step?
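A quick sanity check puts these numbers in perspective. The Python sketch below compares the SNV density implied by the chrX call set with the genome-wide density implied by 5-10 million SNVs. The chrX length (about 156 Mb, GRCh38) and the genome length (about 3.1 Gb) are approximate values assumed here, not figures from this thread.

```python
# Assumed approximate reference sizes (not from the thread).
CHRX_LENGTH_BP = 156_040_895       # GRCh38 chrX, approximate
GENOME_LENGTH_BP = 3_100_000_000   # human genome, approximate

# Figures quoted in the thread.
chrx_called_snvs = 32_840_000      # ~32.84 million SNVs called on chrX
genome_snvs_upper = 10_000_000     # upper end of the 5-10 million estimate

# Fraction of positions called as variant.
chrx_density = chrx_called_snvs / CHRX_LENGTH_BP
genome_density = genome_snvs_upper / GENOME_LENGTH_BP

# How many times denser the chrX calls are than the genome-wide estimate.
fold_excess = chrx_density / genome_density

print(f"chrX call density:      {chrx_density:.4f} SNVs per bp")
print(f"genome-wide (upper):    {genome_density:.4f} SNVs per bp")
print(f"fold excess on chrX:    {fold_excess:.0f}x")
```

Under these assumptions, roughly one chrX position in five is called as a variant, well over an order of magnitude above the genome-wide estimate, which suggests that most of the calls are unlikely to be true SNVs.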

I've been following your online tutorial as closely as possible for my ASM analysis.

Thank you again for your insight.

Best Numrah

From: Weilong Guo Sent: Friday, December 22, 2023 4:34 AM To: guoweilong/cgmaptools Cc: Fadra, Numrah; Author Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

The possibility of C-to-T conversion by bisulfite sequencing is taken into account. You can look up the rows of the ATCGmap file corresponding to the called SNP sites to see whether the calls meet your expectations.

Moreover, selecting the called SNP sites with a strict p-value threshold is also suggested.
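To inspect the ATCGmap evidence at a called site, as suggested above, the per-strand base counts can be pulled out of the matching row. The following is a minimal Python sketch. It assumes the 16-column ATCGmap layout (chromosome, reference base, position, context, dinucleotide context, Watson-strand counts for A, T, C, G, N, Crick-strand counts for A, T, C, G, N, methylation level); please verify the column order against the CGmapTools documentation before relying on it. The function name `parse_atcgmap_row` and the example row are hypothetical.

```python
BASES = ("A", "T", "C", "G", "N")


def parse_atcgmap_row(line):
    """Split one ATCGmap row into position info and per-strand base counts."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 16:
        raise ValueError(f"expected 16 columns, got {len(fields)}")
    return {
        "chrom": fields[0],
        "ref": fields[1],
        "pos": int(fields[2]),
        "context": fields[3],
        "dinuc": fields[4],
        # Assumed column order: A, T, C, G, N for each strand.
        "watson": dict(zip(BASES, map(int, fields[5:10]))),
        "crick": dict(zip(BASES, map(int, fields[10:15]))),
        "meth": fields[15],  # kept as text because it may be "na"
    }


# Hypothetical row for illustration only.
row = parse_atcgmap_row(
    "chrX\tT\t12345\t--\t--\t0\t10\t0\t0\t0\t0\t8\t0\t0\t0\tna"
)
print(row["pos"], row["watson"], row["crick"])
```

Comparing the two strands at a called site can help show whether the alternate allele is supported on both strands or only on the strand where bisulfite conversion could explain it.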

Weilong

At 2023-12-22 14:24:20, "nmfad" wrote:

Hi Weilong,

The WGBS alignment file is already subsetted to the X chromosome only. Currently I am running it gene by gene to make it more efficient.

I had another question. I am using paired-end data, and the VCF file contains many more G>A SNVs than expected. For example, almost 32 million SNVs are detected along the X chromosome even after applying quality thresholds. Could these simply be bisulfite-converted cytosines (C read as T) on the complementary strand? Does the ASM computation take into account that many of these may not be true SNVs, or is there a way to deal with this when computing ASM across a region or within a sample?

Thank you for any insight you can provide.

Numrah
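
One way to check whether the call set is dominated by bisulfite-compatible changes is to tally the REF>ALT pairs in the VCF. In bisulfite data an unconverted-versus-converted cytosine shows up as C>T on one strand and as G>A on the other, so a very high share of those two classes would be consistent with the concern above. The following is only a minimal sketch: it assumes a plain tab-separated VCF with REF in column 4 and ALT in column 5, and the function names are hypothetical, not part of cgmaptools.

```python
from collections import Counter


def tally_substitutions(vcf_lines):
    """Count REF>ALT pairs for single-base records in VCF-formatted lines.

    Assumes standard VCF column order (CHROM, POS, ID, REF, ALT, ...).
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        ref = fields[3].upper()
        # ALT may hold several comma-separated alleles.
        for alt in fields[4].upper().split(","):
            if len(ref) == 1 and len(alt) == 1 and ref != alt:
                counts[f"{ref}>{alt}"] += 1
    return counts


def bisulfite_like_fraction(counts):
    """Fraction of substitutions that are C>T or G>A."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["C>T"] + counts["G>A"]) / total


# Tiny made-up example, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrX\t100\t.\tG\tA\t50\tPASS\t.",
    "chrX\t200\t.\tC\tT\t50\tPASS\t.",
    "chrX\t300\t.\tA\tC\t50\tPASS\t.",
    "chrX\t400\t.\tG\tA,T\t50\tPASS\t.",
]
counts = tally_substitutions(example)
fraction = bisulfite_like_fraction(counts)
print(dict(counts))
print(fraction)
```

On a real file the lines could be streamed from `open(path)` instead of a list. The tally does not decide which calls are real SNVs; it only shows how skewed the substitution spectrum is.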


From: Weilong Guo
Sent: Thursday, December 21, 2023 11:36 PM
To: guoweilong/cgmaptools
Cc: Fadra, Numrah; Author
Subject: [EXTERNAL] Re: [guoweilong/cgmaptools] ASM step takes very long on a subsetted bam ~59 million reads (Issue #65)

What about splitting the huge VCF and BAM files into smaller ones (by chromosome or by region)?

Weilong
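
Splitting by region could be sketched roughly as below for the VCF side. This is an illustration under assumptions, not a cgmaptools feature: it assumes a plain tab-separated VCF with CHROM and POS in the first two columns, the function name and window size are hypothetical, and the matching BAM subset for each window would have to be extracted separately (for example with a region query on an indexed BAM).

```python
from collections import defaultdict


def split_vcf_by_window(vcf_lines, window_size=1_000_000):
    """Group VCF records into fixed-size, 1-based position windows.

    Returns (header_lines, {(chrom, start, end): [record_lines]}).
    """
    header = []
    chunks = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            # Keep header lines so each chunk can be written as a valid VCF.
            header.append(line)
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        chrom, pos = fields[0], int(fields[1])
        # First position of the window that contains pos (1-based).
        start = (pos - 1) // window_size * window_size + 1
        chunks[(chrom, start, start + window_size - 1)].append(line)
    return header, dict(chunks)


# Tiny made-up example with a 1 kb window, for illustration only.
records = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chrX\t100\t.\tG\tA",
    "chrX\t1000\t.\tC\tT",
    "chrX\t1001\t.\tA\tC",
]
header, chunks = split_vcf_by_window(records, window_size=1000)
for (chrom, start, end), lines in sorted(chunks.items()):
    print(f"{chrom}:{start}-{end}\t{len(lines)} records")
```

Each chunk can then be written out with the header lines in front and submitted as its own job, so that a single stalled region does not hold up the rest.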

- Reply to this email directly, view it on GitHub (https://github.com/guoweilong/cgmaptools/issues/65#issuecomment-1867263366), or unsubscribe. You are receiving this because you authored the thread.
