Open malemay opened 6 months ago
Hi Davide,
My first question is: have you checked (for example using samtools view
) that the PV tags and associated values were correctly added by add_pvalues
before going any further?
To process the output of add_pvalues
in downstream analyses, I suggest that you have a look at the gwask
R package that I developed for that purpose (https://github.com/malemay/gwask). In particular, the function format_kmer_gwas
should be what you look for. It will do some filtering on the input k-mers, which I believe should work well for most purposes, but to be entirely sure that it fits your needs I recommend that you look at the function's code.
gwask
also includes functions for Manhattan plots, among other things. There is no vignette but the functions are well documented. Most graphical functions return grid graphical objects (so-called grobs
) which can be plotted with grid::grid.draw
.
If you have any issues with the package, please ask in that package's respository.
Hi Marc-André, thank you for opening a new issue. Below the first 2 lines of a test bam from my project:
samtools view wb001_assoc_reads_pval.bam | head
ERR9849535.4917933 163 chr1H 7598571 25 11S24M66S = 7598571 24 CATGATTACTTTTTTAATACAATTTGTACATTTTTTATGCACGTTGAATGATTTTCAAATACATTATGAACATTTTTCAGATACATGTTTGATTTTTAGAT BBBFFFFBFFFFFFFIFFFFFFFFIIIIFFIIFIFIFFFFFFFIIIIFFI<BFFIIBFFFFIFFIIIIFFFFFFFFFFFFBFFFFBBFFFFFBBBBFFB<B NM:i:0 MD:Z:24 MC:Z:15S24M62S AS:i:24 XS:i:19 RG:Z:prova KM:Z:GTACATTTTTTATGCACGTTGAATGATTTTC,TGTACATTTTTTATGCACGTTGAATGATTTT,TTGTACATTTTTTATGCACGTTGAATGATTT,TTTGTACATTTTTTATGCACGTTGAATGATT,ATTTGTACATTTTTTATGCACGTTGAATGAT,AATTTGTACATTTTTTATGCACGTTGAATGA,CAATTTGTACATTTTTTATGCACGTTGAATG,ACAATTTGTACATTTTTTATGCACGTTGAAT,TACAATTTGTACATTTTTTATGCACGTTGAA,ATACAATTTGTACATTTTTTATGCACGTTGA,AATACAATTTGTACATTTTTTATGCACGTTG PV:Z:3.955763e-08,3.955763e-08,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07
ERR9849535.3850613 177 chr1H 7598571 25 16S24M61S chr2H_p1 23770471 0 CAATGCATGATTACTTTTTTAATACAATTTGTACATTTTTTATGCACGTTGAATGATTTTCAAATACATTATGAACATTTTTCAGATACATGTTTGATTTT BFFBBFFFFFBBBFFFFFFBFFFFFIIIIIFFFIIIIIIIIIIIIFFFFFFIFIFIIIIIIIIIIIIIIFIIIIIIIIIIIIIFFFFFFFFFFFFFFFBBB NM:i:0 MD:Z:24 MC:Z:67S34M AS:i:24 XS:i:19 RG:Z:prova KM:Z:GTACATTTTTTATGCACGTTGAATGATTTTC,TGTACATTTTTTATGCACGTTGAATGATTTT,TTGTACATTTTTTATGCACGTTGAATGATTT,TTTGTACATTTTTTATGCACGTTGAATGATT,ATTTGTACATTTTTTATGCACGTTGAATGAT,AATTTGTACATTTTTTATGCACGTTGAATGA,CAATTTGTACATTTTTTATGCACGTTGAATG,ACAATTTGTACATTTTTTATGCACGTTGAAT,TACAATTTGTACATTTTTTATGCACGTTGAA,ATACAATTTGTACATTTTTTATGCACGTTGA,AATACAATTTGTACATTTTTTATGCACGTTG PV:Z:3.955763e-08,3.955763e-08,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07,2.794231e-07
It seems that the PV has been added correctly. I will try gwask as well as to compile katcher and will let you know asap. Davide
Hi Davide,
It looks like katcher
and app_pvalues
worked fine indeed. Keep me posted regarding downstream analyses!
Marc-André
Hi Marc-André, I tried katcher and add_pvalue and they work fine, thank you. However when I try to run
bedtools bamtobed
on the output of add_pvalue, it does not recongnize the value in the PV tag and rounds them all to 0. I know this is probably due to beddtools, however I wondered how I can generate a datframe to plot a Manhattan plot for the associated kmers. DavideOriginally posted by @DavideGue in https://github.com/malemay/katcher/issues/1#issuecomment-1991313447