rlorigro / GFAse

Tool for globally phasing diploid assembly graphs with orthogonal data
Mozilla Public License 2.0

How to generate kmer .fa file #20

Closed by scthree 1 year ago

scthree commented 1 year ago

Dear Developers,

Thank you for this wonderful tool and your contributions to the community!

Forgive me if I'm asking a stupid question (I searched online for an answer but did not find anything definitive). I would like to use GFAse to phase an assembly using parental kmers. I already have paternal/maternal kmer files generated with yak (.yak) and meryl (.meryl). How do I convert these files into fasta format for input into GFAse?

I know you're busy and appreciate your time--

Sincerely, Steve

rlorigro commented 1 year ago

Hi Steve,

No worries, @meredith705 should be able to help with that

meredith705 commented 1 year ago

Hello Steve,

I have used KMC3 to generate these kmer.fasta files. KMC3 has a straightforward set of tools to make kmers and isolate unique kmers from each parent.

The commands we use to get these kmers per parent are in kmc.wdl. We then use the kmc_tools dump method and an awk command to create the parental fasta files. The gfase_trio_phase.wdl takes in a gfa and parental fastq/fasta read files, then makes the kmer.fasta files and runs gfase. If you can run a WDL, this script automates the pipeline for you.
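For anyone who prefers not to use awk, the dump-to-fasta step described above can be sketched in Python. This is only an illustration, not the code from kmc.wdl: it assumes the text dump has one kmer per line followed by its count (whitespace-separated), and it assumes that a fasta with one record per kmer and an arbitrary numbered header is acceptable. The function names, the `kmer_` header prefix, and the file names are all hypothetical.

```python
from typing import Iterable, Iterator


def dump_to_fasta(lines: Iterable[str], prefix: str = "kmer") -> Iterator[str]:
    """Yield fasta lines for a text dump of 'KMER<whitespace>COUNT' rows.

    Assumption: the first column of every non-blank line is the kmer.
    The count column is ignored; headers are simply numbered.
    """
    index = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        index += 1
        yield f">{prefix}_{index}"  # hypothetical header format
        yield fields[0]             # the kmer sequence itself


def convert_file(dump_path: str, fasta_path: str) -> int:
    """Convert a dump file to fasta and return the number of kmers written."""
    written = 0
    with open(dump_path) as src, open(fasta_path, "w") as dst:
        for out_line in dump_to_fasta(src):
            dst.write(out_line + "\n")
            if not out_line.startswith(">"):
                written += 1
    return written
```

Calling `convert_file("paternal_dump.txt", "paternal.fasta")` (file names made up here) would write one two-line record per kmer. Because the input is streamed line by line, memory use stays flat even for large dumps.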

There might be a way to dump your meryl database kmers as text (with the print or display command) and then use some text wrangling to turn the output into a fasta format for gfase. However, I have not tried this.

Thanks for using our tool, Melissa M

scthree commented 1 year ago

Thank you @rlorigro!

Hi Melissa, thank you greatly for your wonderful, detailed, and easy to understand explanation. I followed your instructions and was able to generate the kmers with KMC3.

I am having some difficulty installing GFAse (missing Jansson package), so I re-opened an old thread... sorry for the additional trouble!

Sincerely, Steve