medvedevgroup / TwoPaCo

A fast constructor of the compressed de Bruijn graph from many genomes

Large k value #4

Closed Malfoy closed 7 years ago

Malfoy commented 7 years ago

Hello! I would be very interested in using TwoPaCo with large k-mers. It works with k = 281 but not with k = 291 on the E. coli reference genome. Running
./twopaco -f 30 -k 291 ../../../../data/ecoli.fa
gives a segfault.

Would it be possible for TwoPaCo to work with an arbitrary value of k?

iminkin commented 7 years ago

Yes, it is possible. I updated the doc: https://github.com/medvedevgroup/TwoPaCo#k-mer-size

iminkin commented 7 years ago

In the next release I will make it a CMake parameter, so that the user can specify the maximum k directly without editing the code.

Malfoy commented 7 years ago

That would be the perfect solution.

Also, do you think it would be possible for graphdump to have a regular FASTA output option? It would be really convenient.

Excellent piece of work, by the way!

iminkin commented 7 years ago

Thank you. You mean a FASTA file with the compressed paths? If so, it is not hard to do; I will add it in the next release.
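To make the request concrete, here is a minimal sketch of what "a FASTA file with compressed paths" could look like: one record per path, with the sequence wrapped at a fixed width. This is only an illustration and not graphdump's implementation; the function names `write_fasta` and `fasta_text`, the `path_<i>` header naming, and the `length=` field are all assumptions made for the example.

```python
import io
from typing import Iterable, TextIO


def write_fasta(paths: Iterable[str], out: TextIO, width: int = 60) -> None:
    """Write each path sequence as one FASTA record, wrapped at `width` columns.

    The header layout (">path_<index> length=<n>") is hypothetical and chosen
    only for this sketch.
    """
    for i, seq in enumerate(paths):
        out.write(f">path_{i} length={len(seq)}\n")
        # Emit the sequence in fixed-width chunks, as FASTA files usually do.
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")


def fasta_text(paths: Iterable[str], width: int = 60) -> str:
    """Convenience wrapper that returns the FASTA output as a string."""
    buf = io.StringIO()
    write_fasta(paths, buf, width)
    return buf.getvalue()


if __name__ == "__main__":
    # Two made-up path sequences, wrapped at 3 columns to show the line breaks.
    print(fasta_text(["ACGT", "GGCCA"], width=3), end="")
```

Any tool that reads plain FASTA should be able to consume output of this shape, which is what makes the format convenient downstream.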

Malfoy commented 7 years ago

Yes, that would be great. Thank you for your answers.

iminkin commented 7 years ago

Added in 0.9.2.

Malfoy commented 7 years ago

I just tested it:

graphdump -f fasta test.bin -s seq51.fa
PARSE ERROR: Argument: -f (--format) Value 'fasta' does not meet constraint: seq|group|dot|gfa1|gfa2

iminkin commented 7 years ago

That is interesting. Are you sure you pulled the latest revision? What is the output of --help? Does fasta appear in the list of formats?