About SNP in Intergenic Region

zhqingit / giremi

GIREMI is a method that can identify RNA editing sites using one RNA-seq data set without requiring genome sequence data.


About SNP in Intergenic Region #3

Open ghost opened 8 years ago

ghost commented 8 years ago

It seems that the software cannot perform the MI test (the first step) on SNPs in intergenic regions, although, as I understand it, the method used in the first step is gene-independent. Some species, such as pig and chicken, do not have gene annotation as good as human's, so many de novo SNPs are detected in regions without annotated genes; these appear as intergenic SNPs in my input files. Could you please fix this?

zhqingit commented 8 years ago

Hi,

The MI test doesn't depend on where the SNVs are located. If there are reads covering two or more SNVs, giremi can do the MI calculation.
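To illustrate why only reads spanning two or more SNVs are needed, here is the textbook mutual-information formula applied to the allele pairs observed on reads that cover both sites. This is a minimal sketch, not giremi's code: how giremi collects the allele pairs, which log base it uses, and how it derives the MI p-value (the mip column) are not shown in this thread and are not modeled here. Note that nothing in the calculation refers to gene annotation.

```python
import math
from collections import Counter
from typing import List, Tuple


def mutual_information(allele_pairs: List[Tuple[str, str]], base: float = 2.0) -> float:
    """Mutual information between two SNV sites.

    allele_pairs holds one (allele_at_site1, allele_at_site2) tuple per read
    that covers both sites. Uses I(X;Y) = sum p(x,y) * log(p(x,y) / (p(x) p(y))).
    """
    n = len(allele_pairs)
    if n == 0:
        raise ValueError("no reads cover both sites")
    joint = Counter(allele_pairs)
    first = Counter(a for a, _ in allele_pairs)
    second = Counter(b for _, b in allele_pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = first[a] / n
        p_b = second[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b), base)
    return mi
```

Alleles that always co-occur on the same reads (as expected for two heterozygous SNPs on one haplotype) give a high MI, while independently assorting alleles give an MI near zero.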

ghost commented 8 years ago

Hi, Dr. Zhang,

Thanks for your reply.

I know the method does not depend on where the SNVs are located. However, giremi does not seem to output the MI value or the MI-test p-value for SNVs marked as intergenic in the SNV list (Inte in column 4 and # in column 6). I tested this: using the same genome FASTA file and BAM file, I only replaced every "Inte" with "Unknown" in column 4 and every "#" with "+" in column 6 of the SNV list. With the modified list, MI values and p-values appeared in the first-step output (before the GLM step) for some of the SNVs that have high coverage (> 10 reads) and lie close to other SNVs, although error messages like "error:Can't find the site from gene in snvs" were still printed while the program ran. The allele counts also differed between the two outputs. For example:

Below are the first 16 (of 24) columns of the MI-step output using the SNV list formatted according to the online manual (Inte in column 4 and # in column 6):

chr coor strand ifsnp gene refB upB downB majorB majorN totN majorR ifmi mi mip ar
1 874192 # 1 Inte A T G T 20 29 0.689655 -1 -1 -1 0.5
1 874199 # 1 Inte T C C C 15 24 0.625 -1 -1 -1 0.5
1 874217 # 1 Inte C C C G 14 22 0.636364 -1 -1 -1 0.5
1 874741 # 1 Inte C T T C 13 23 0.565217 -1 -1 -1 0.5
1 874780 # 1 Inte A T C A 16 27 0.592593 -1 -1 -1 0.5
1 874851 # 1 Inte T C G T 15 27 0.555556 -1 -1 -1 0.5
1 874867 # 1 Inte A G T A 17 29 0.586207 -1 -1 -1 0.5
1 1304480 # 1 Inte T A G T 16 22 0.727273 -1 -1 -1 0.5
1 1304609 # 1 Inte A A C G 13 21 0.619048 -1 -1 -1 0.5
1 1304611 # 1 Inte T C A C 12 20 0.6 -1 -1 -1 0.5

Below are the same first 16 columns of the MI-step output using the modified SNV list (Unknown in column 4 and + in column 6):

chr coor strand ifsnp gene refB upB downB majorB majorN totN majorR ifmi mi mip ar
1 874192 + 1 Unknown A T G T 13 17 0.764706 2 0.614361 0.2857808 0.592411
1 874199 + 1 Unknown T C C C 10 14 0.714286 2 0.60911 0.2675517 0.592411
1 874217 + 1 Unknown C C C G 8 11 0.727273 2 0.618448 0.3003691 0.592411
1 874741 + 1 Unknown C T T C 9 16 0.5625 1 0.683582 0.5606082 0.592411
1 874780 + 1 Unknown A T C A 10 18 0.555556 1 0.683582 0.5606082 0.592411
1 874851 + 1 Unknown T C G T 10 17 0.588235 1 0.666533 0.4902684 0.592411
1 874867 + 1 Unknown A G T A 11 19 0.578947 1 0.666533 0.4902684 0.592411
1 1304480 + 1 Unknown T A G T 6 10 0.6 -1 -1 -1 0.592411
1 1304609 + 1 Unknown A A C G 8 11 0.727273 1 0.549394 0.1075112 0.592411
1 1304611 + 1 Unknown T C A C 8 11 0.727273 1 0.549394 0.1075112 0.592411

Could you please help me solve this issue? Thanks.

Weixuan
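The relabeling test described in this comment (replacing "Inte" with "Unknown" in column 4 and "#" with "+" in column 6 of the SNV list) can be scripted so it is easy to reproduce. This is a minimal Python sketch, assuming a tab-delimited SNV list without a header line and the 1-based column positions given in the comment; the function name relabel_intergenic is hypothetical and not part of giremi. Labeling every intergenic site "+" is an arbitrary choice made only for this test, not a real strand assignment.

```python
from typing import Iterable, List


def relabel_intergenic(lines: Iterable[str], gene_col: int = 4, strand_col: int = 6) -> List[str]:
    """Reproduce the relabeling test on a tab-delimited SNV list.

    Replaces 'Inte' with 'Unknown' in the gene column and '#' with '+' in the
    strand column (both 1-based). Rows too short to have both columns, and all
    other fields, are passed through unchanged (minus the trailing newline).
    """
    relabeled = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= max(gene_col, strand_col):
            if fields[gene_col - 1] == "Inte":
                fields[gene_col - 1] = "Unknown"
            if fields[strand_col - 1] == "#":
                fields[strand_col - 1] = "+"
        relabeled.append("\t".join(fields))
    return relabeled
```

To produce the modified list, pass an open file handle of the original list, join the returned rows with newlines, and save them under a new file name so the two giremi runs can be compared side by side.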

zhqingit commented 8 years ago

Hi Weixuan,

Could you send me your full list of SNVs and the output file? I can take a look.

Best, Qing

ghost commented 8 years ago

Hi, Dr. Zhang,

Thanks for your quick reply. The attached zip file contains the SNV list and output file.

Weixuan

ghost commented 8 years ago

Hi, Dr. Zhang,

Just wondering whether you were able to download the files from GitHub. If not, please let me know and I can share them with you through Dropbox links.

Thanks,

Weixuan

zhqingit commented 8 years ago

Hi Weixuan,

Thank you very much for the reminder. Yes, I have fixed this bug. Feel free to let me know if you find any other problems.

Best, Qing

ghost commented 8 years ago

Dear Dr. Zhang,

Thanks for your reply and update. With the same dataset, I tested the program you uploaded last night, but the same problem still occurred. I downloaded it from the branch called “master”. Is that the right one? Or could you please double-check the program?

Thanks,

Weixuan

zhqingit commented 8 years ago

Hi Weixuan,

Could you show me your command?

Best, Qing

ghost commented 8 years ago

Hi, Qing

Here is my command:

giremi \
  -f $REF_P/galGal4_vcf_chr.fa \
  -l 40707_SNP_SNV_list.txt \
  -o GIREMI_40707_new.txt \
  -m 10 -p 1 -s 1 \
  40707_sorted_RG_MD_SP_RA_RC.bam

Also, I just noticed that the log from running the program lists the positions of many of these SNPs. You can download it from the link below.

GIREMI_FT_m10_allSNP_log.txt

Regards,

Weixuan

zhqingit commented 8 years ago

Hi Weixuan,

I noticed you used -s 1 in your command, which specifies that the first read comes from the sense strand. For SNVs in intergenic regions with '#' as the strand, giremi can't decide which read should be used, so it outputs -1. There are two ways to solve this: use -s 0, or assign real strand information ('+' or '-') to the intergenic SNVs. Feel free to let me know if you have other questions.

Best, Qing
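Following the explanation in this reply, a quick pre-flight check can flag SNV-list rows whose strand is neither '+' nor '-' before giremi is run with a strand-specific -s value. This is a hypothetical helper, not part of giremi. It assumes a tab-delimited list without a header line and the strand in column 6 (1-based), as described in this thread. Treating every nonzero -s value as strand-specific (including the -s 2 that appears later in this thread) is an assumption; the reply above only describes -s 1.

```python
from typing import Iterable, List, Tuple


def undetermined_strand_rows(lines: Iterable[str], strand_col: int = 6) -> List[Tuple[int, str]]:
    """Return (line_number, row) for rows whose strand column is neither '+' nor '-'.

    strand_col is 1-based. Rows too short to have that column are skipped.
    """
    flagged = []
    for lineno, line in enumerate(lines, start=1):
        row = line.rstrip("\n")
        fields = row.split("\t")
        if len(fields) >= strand_col and fields[strand_col - 1] not in ("+", "-"):
            flagged.append((lineno, row))
    return flagged


def rows_needing_strand(lines: Iterable[str], s_option: int, strand_col: int = 6) -> List[Tuple[int, str]]:
    """Hypothetical check run before giremi.

    ASSUMPTION: any nonzero -s value is strand-specific, so every site needs a
    '+' or '-' strand; with -s 0 nothing is flagged.
    """
    if s_option == 0:
        return []
    return undetermined_strand_rows(lines, strand_col)
```

Passing an open SNV list and the planned -s value returns the rows likely to produce -1 in the ifmi, mi and mip columns. This only addresses the -1 MI output; it does not diagnose the separate "Can't find the site from gene in snvs" message.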

ghost commented 8 years ago

Thanks for your reply. I retested with -s 0, and the issue no longer occurred.

Weixuan

ma-diroma commented 6 years ago

Hi,

I am facing the same issue, but your solution

you can use -s 0 or assign a real strand information ('+' or '-') to the intergenic SNVs

does not work in my case.

[mpileup] 1 samples in 1 input files
Calculating the MI values...
meanMI:0.593674 sdMI:0.255750
Estimating the allelic ratios...
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
...

My command line was

giremi -f reference.fasta -l SNVs.list -o giremi.out -s 2 file.bam

I then relaunched with -s 0, but the error remains. I also replaced # with + for the intergenic regions, but it still did not work. Could you please help me? Attached are my lists (one with # and the other with +) and my output files (using -s 0 and -s 2, respectively): giremi.zip

Thanks, Maria Angela