Open ghost opened 8 years ago
Hi,
The MI test doesn't depend on the region in which the SNVs are located. If there are reads covering two or more SNVs, giremi can do the MI calculation.
2016-01-28 13:43 GMT-07:00 weixuanfu notifications@github.com:
It seems that the software cannot perform the MI test (first step) on SNPs in intergenic regions, although, as I understand it, the method in the first step is gene-independent. Some species, such as pig and chicken, may not have gene annotation as good as human, so many de novo SNPs can be detected in regions without annotated genes; these are shown as intergenic SNPs in my input files. Could you please fix it?
Hi, Dr. Zhang,
Thanks for your reply.
I understand that the method does not depend on the region of the SNVs. But it seems that the program (giremi) does not output the MI and p-value from the MI test for SNVs marked as intergenic in the SNV list (Inte in column 4 and # in column 6). I ran a test: with the same genome FASTA file and BAM file, I only replaced every "Inte" and "#" with "Unknown" and "+" in columns 4 and 6, respectively, of the SNV list file. The MI and p-value then showed up in the first-step output (before the GLM step) for some SNVs with high coverage (> 10 reads) that are close to other SNVs, although error messages like "error:Can't find the site from gene in snvs" also appeared while the program was running. The allele counts also differed between the two outputs. For example:

Below are the first 16 columns (out of 24) of the MI step output with the SNV list prepared according to the online manual (Inte and # in columns 4 and 6, respectively):

chr  coor     strand  ifsnp  gene  refB  upB  downB  majorB  majorN  totN  majorR    ifmi  mi  mip  ar
1    874192   #       1      Inte  A     T    G      T       20      29    0.689655  -1    -1  -1   0.5
1    874199   #       1      Inte  T     C    C      C       15      24    0.625     -1    -1  -1   0.5
1    874217   #       1      Inte  C     C    C      G       14      22    0.636364  -1    -1  -1   0.5
1    874741   #       1      Inte  C     T    T      C       13      23    0.565217  -1    -1  -1   0.5
1    874780   #       1      Inte  A     T    C      A       16      27    0.592593  -1    -1  -1   0.5
1    874851   #       1      Inte  T     C    G      T       15      27    0.555556  -1    -1  -1   0.5
1    874867   #       1      Inte  A     G    T      A       17      29    0.586207  -1    -1  -1   0.5
1    1304480  #       1      Inte  T     A    G      T       16      22    0.727273  -1    -1  -1   0.5
1    1304609  #       1      Inte  A     A    C      G       13      21    0.619048  -1    -1  -1   0.5
1    1304611  #       1      Inte  T     C    A      C       12      20    0.6       -1    -1  -1   0.5

Below are the same 16 columns of the MI step output with the modified SNV list (Unknown and + in columns 4 and 6, respectively):

chr  coor     strand  ifsnp  gene     refB  upB  downB  majorB  majorN  totN  majorR    ifmi  mi        mip        ar
1    874192   +       1      Unknown  A     T    G      T       13      17    0.764706  2     0.614361  0.2857808  0.592411
1    874199   +       1      Unknown  T     C    C      C       10      14    0.714286  2     0.60911   0.2675517  0.592411
1    874217   +       1      Unknown  C     C    C      G       8       11    0.727273  2     0.618448  0.3003691  0.592411
1    874741   +       1      Unknown  C     T    T      C       9       16    0.5625    1     0.683582  0.5606082  0.592411
1    874780   +       1      Unknown  A     T    C      A       10      18    0.555556  1     0.683582  0.5606082  0.592411
1    874851   +       1      Unknown  T     C    G      T       10      17    0.588235  1     0.666533  0.4902684  0.592411
1    874867   +       1      Unknown  A     G    T      A       11      19    0.578947  1     0.666533  0.4902684  0.592411
1    1304480  +       1      Unknown  T     A    G      T       6       10    0.6       -1    -1        -1         0.592411
1    1304609  +       1      Unknown  A     A    C      G       8       11    0.727273  1     0.549394  0.1075112  0.592411
1    1304611  +       1      Unknown  T     C    A      C       8       11    0.727273  1     0.549394  0.1075112  0.592411
Could you please help me solve this issue? Thanks.
Weixuan
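For anyone who wants to repeat the test described above, the relabeling can be sketched in a few lines of Python. This is only an illustration: it assumes the SNV list is tab-delimited, and the only thing it takes from this thread is that the gene label sits in column 4 and the strand in column 6. The function name `relabel_intergenic` and the sample rows are made up for the example.

```python
def relabel_intergenic(lines, gene_col=3, strand_col=5,
                       new_gene="Unknown", new_strand="+"):
    """Yield SNV-list rows with the intergenic placeholders replaced.

    Assumption: tab-delimited input, gene label in column 4 and strand in
    column 6 (0-based indices 3 and 5), as described in this thread.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Leave rows untouched if they are too short to hold both columns.
        if len(fields) > max(gene_col, strand_col):
            if fields[gene_col] == "Inte":
                fields[gene_col] = new_gene
            if fields[strand_col] == "#":
                fields[strand_col] = new_strand
        yield "\t".join(fields)


# Hypothetical rows; only columns 4 and 6 follow the layout from the thread.
example = [
    "1\t100\t101\tInte\t1\t#\n",
    "1\t200\t201\tGENE1\t0\t-\n",
]
relabeled = list(relabel_intergenic(example))
for row in relabeled:
    print(row)
```

Rows that already carry a gene name and a real strand pass through unchanged, so the script can be run over a mixed list.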
Hi Weixuan,
Could you send me your full list of SNVs and the output file? I can take a look.
Best, Qing
Hi, Dr. Zhang,
Thanks for your quick reply. The attached zip file contains the SNV list and output file.
Weixuan
Hi, Dr. Zhang,
Just wondering whether you could download the files from GitHub. If you didn’t get them, please let me know. I can share them with you through Dropbox links.
Thanks,
Weixuan
Hi Weixuan,
Thank you very much for the reminder. Yes, I have fixed this bug. Please feel free to let me know if you find any other problems.
Best, Qing
Dear Dr. Zhang,
Thanks for your reply and update. With the same dataset, I tested the program that you uploaded last night but the same problem still occurred. I downloaded it from the branch called “master”. Is it the right one? Or could you please double check the program?
Thanks,
Weixuan
Hi Weixuan,
Could you show me your command?
Best, Qing
Hi, Qing
Here is my command:
giremi \
  -f $REF_P/galGal4_vcf_chr.fa \
  -l 40707_SNP_SNV_list.txt \
  -o GIREMI_40707_new.txt \
  -m 10 -p 1 -s 1 \
  40707_sorted_RG_MD_SP_RA_RC.bam
Also, I just noticed that the log produced while running the program lists the physical positions of many of these SNPs. You may download it from the link below.
Regards,
Weixuan
Hi Weixuan,
I noticed you used -s 1 in your command, which specifies that the first read should be from the sense strand. For SNVs in intergenic regions with the '#' strand, giremi can't decide which read should be used, so it outputs -1. There are two ways to solve this: use -s 0, or assign real strand information ('+' or '-') to the intergenic SNVs. Feel free to let me know if you have other questions.
Best, Qing
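To check whether the -1 values line up with the '#' strand as described above, one option is to tally the output rows by strand. The sketch below is a rough illustration, not part of giremi: it assumes the output is whitespace-delimited and that the first 16 columns follow the header quoted earlier in this thread. The function name `count_missing_mi` is hypothetical, and the sample rows are copied from the tables posted above.

```python
from collections import Counter

# Column order taken from the output header quoted earlier in this thread.
COLUMNS = ["chr", "coor", "strand", "ifsnp", "gene", "refB", "upB", "downB",
           "majorB", "majorN", "totN", "majorR", "ifmi", "mi", "mip", "ar"]


def count_missing_mi(lines):
    """Return {strand: (rows with mi == -1, total rows)}.

    Assumption: whitespace-delimited rows whose first 16 fields match COLUMNS;
    any further columns are ignored.
    """
    missing = Counter()
    total = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header line and anything too short to be a data row.
        if len(fields) < len(COLUMNS) or fields[0] == "chr":
            continue
        row = dict(zip(COLUMNS, fields))
        total[row["strand"]] += 1
        if row["mi"] == "-1":
            missing[row["strand"]] += 1
    return {strand: (missing[strand], total[strand]) for strand in total}


sample = [
    "chr coor strand ifsnp gene refB upB downB majorB majorN totN majorR ifmi mi mip ar",
    "1 874192 # 1 Inte A T G T 20 29 0.689655 -1 -1 -1 0.5",
    "1 874192 + 1 Unknown A T G T 13 17 0.764706 2 0.614361 0.2857808 0.592411",
    "1 1304480 + 1 Unknown T A G T 6 10 0.6 -1 -1 -1 0.592411",
]
summary = count_missing_mi(sample)
print(summary)
```

If every '#' row reports -1 while '+' and '-' rows mostly do not, that is consistent with the strand setting being the cause rather than coverage.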
Thanks for your reply. I retested it with -s 0, and the issue did not occur.
Weixuan
Hi,
I am facing the same issue, but your solution, "you can use -s 0 or assign a real strand information ('+' or '-') to the intergenic SNVs", does not work in my case.
[mpileup] 1 samples in 1 input files
Calculating the MI values...
meanMI:0.593674 sdMI:0.255750
Estimating the allelic ratios...
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
error:Can't find the site from gene in snvs
...
My command line was
giremi -f reference.fasta -l SNVs.list -o giremi.out -s 2 file.bam
I then relaunched it with -s 0, but the error remains. I then replaced # with + for intergenic regions, but it still did not work.
Could you please help me?
Please find attached (giremi.zip) my lists (one with # and the other with +) and my output files (using -s 0 and -s 2, respectively).
Thanks, Maria Angela