Error: "Need genome sequence file header string"

MagdalenaWinklhofer commented 1 year ago

Hi,

I am working with WGBS data and would like to use DMAP2 to investigate differences in DNA methylation. Since I am working with a non-model organism, our genome is not very polished and is still a work in progress (it is not published yet, so I can't download anything from Ensembl). I have a .fasta genome file with the whole genome (not separated into chromosomes) and a .gtf annotation file. I have performed the alignment in Bismark and got 12 .bam files (4 samples in each of 3 groups). I filled in "dmap_basic_params.conf" and "dmap_anova3_params.conf" with the directories where the corresponding files can be found, but I couldn't get the program started. I can see that DMAP2 finds the working directory, the two config files, and the three groups with all my samples, and then I get: "Executing: ./DMAP/src/diffmeth -G ./genome_top100.fasta -W 1000 -t 10 -I 6 -B 40,220 -z -R1 ./1-N1-D14-1_S1_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R1 ./2-N2-D15-1_S8_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R1 ./3-N3-D16-2_S4_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R1 ./4-N7-D36-1_S5_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R2 ./5-A1-D17-1_S9_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R2 ./6-A2-D18-1_S3_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R2 ./7-A4-D26-2_S7_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R2 ./8-A7-D37-1_S11_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R3 ./9-24R2-D21-2_S6_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R3 ./10-24R3-D22-2_S2_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R3 ./11-24R7-D21-2_S6_L001_R1_001_val_1_bismark_bt2_pe.bam -z -R3 ./12-24R8-D22-2_S2_L001_R1_001_val_1_bismark_bt2_pe.bam > ./diffmeth_output_anova3.txt Need genome sequence file header string"

I removed sensitive information and added spaces to make it more readable.

What does "Need genome sequence file header string" mean? In the "dmap_basic_params.conf" file, I listed the genome "as a single file" with the absolute path.

Thank you for your help!!!

peterstockwell commented 1 year ago

Hi

The genome sequence needs to be in FASTA format; I suspect that diffmeth is not finding an appropriate header line. Also, the diffmeth -G option should be given a list of files and the appropriate chromosome IDs. If the original FASTA sequence file is processed by dmap_index_build.sh, then the file given to the diffmeth -G option should be properly formatted.
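As a quick sanity check before going further, the header lines of a FASTA file can be inspected with a few lines of Python. This is only a minimal sketch of a generic FASTA check, not part of DMAP2: the function names `fasta_ids` and `check_fasta` are made up for illustration, and passing this check does not guarantee that the file matches what diffmeth expects for its -G option.

```python
import sys


def fasta_ids(lines):
    """Return the sequence IDs found in FASTA header lines ('>' lines)."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            # By convention the ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def check_fasta(lines):
    """Return a list of problems found; an empty list means none were detected."""
    non_empty = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if not non_empty:
        return ["file is empty"]
    problems = []
    if not non_empty[0].startswith(">"):
        problems.append("first non-empty line is not a '>' header")
    ids = fasta_ids(non_empty)
    if "" in ids:
        problems.append("a header line has no ID after '>'")
    duplicates = sorted({i for i in ids if i and ids.count(i) > 1})
    if duplicates:
        problems.append("duplicate IDs: " + ", ".join(duplicates))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_fasta.py ./genome_top100.fasta
    with open(sys.argv[1]) as handle:
        for problem in check_fasta(handle) or ["no problems detected"]:
            print(problem)
```

An empty result only says that the file looks like well-formed FASTA with unique IDs; the properly formatted input for diffmeth still comes from running dmap_index_build.sh.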

Could you please email me your basic parameter file, which would have been the argument to the dmap_index_build.sh script?

Could you also email the ./genome_top100.fasta file.

Regards Peter Stockwell

MagdalenaWinklhofer commented 1 year ago

Hi Peter,

Thank you for your quick response, and sorry for my late response. I was on holiday for the last two weeks.

The good news is that I got it to work!!! Since I had done the alignment with Bismark before starting with DMAP2, I initially didn't want to index the genome and perform the alignment in DMAP2 again. So my mistake was that I did not format my genome (FASTA format) at all for the diffmeth script. But since you wrote that I need to process the genome (FASTA format) with the dmap_index_build.sh script, I first indexed the genome, performed the alignment, and then did the diffmeth analysis.

Thank you very much for your help!

Kind regards, Magdalena

peterstockwell commented 1 year ago

Hi Magdalena

Thank you for getting back to me. I'm pleased that you have got DMAP2 to work properly.

You have effectively found the reason I set it up to work through the whole process: previously, I and others had experienced precisely the problem you had when the genome chromosome IDs were inconsistent between different parts of the process.
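For anyone who wants to check for this kind of ID mismatch by hand, the following is a rough sketch, not something DMAP2 provides. It compares the IDs in FASTA header lines with the reference names in the @SQ lines of a SAM/BAM header (the header text can be obtained with `samtools view -H`). The function names are hypothetical, and it assumes the FASTA ID is the first token after `>`.

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs (first token after '>') from FASTA header lines."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def sam_header_ids(header_lines):
    """Collect reference names from the SN: tag of @SQ lines in a SAM/BAM header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def compare_ids(fasta_lines, header_lines):
    """Report IDs present on only one side; two empty lists mean the names agree."""
    genome = fasta_ids(fasta_lines)
    aligned = sam_header_ids(header_lines)
    return {
        "only_in_fasta": sorted(genome - aligned),
        "only_in_bam": sorted(aligned - genome),
    }


# Small made-up example: the FASTA uses "chr1" where the alignment header uses "1".
example_fasta = [">chr1 assembled", "ACGT", ">chr2", "GGCC"]
example_header = ["@HD\tVN:1.0\tSO:unsorted", "@SQ\tSN:1\tLN:4", "@SQ\tSN:chr2\tLN:4"]
print(compare_ids(example_fasta, example_header))
```

Any name listed under `only_in_fasta` or `only_in_bam` points to an inconsistency between the genome file and the alignments.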

Regards Peter Stockwell

peterstockwell / DMAP2

Error: "Need genome sequence file header string" #1