thomasp85 / FindMyFriends

Fast alignment-free pangenome creation and exploration

Add export API #2

Open thomasp85 opened 9 years ago

thomasp85 commented 9 years ago

Possible formats:

Zbrel commented 8 years ago

About the .fasta files: I have some bacterial genomes in .seq, .fasta (sequence information only), and .gbk formats. How can I convert these files into .fasta files like those in extdata (e.g. >gi|71851486|gb|AE017243.1|_1 # 207 # 395 # 1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGA/GGAG/GAGG;rbs_spacer=11-12bp;gc_cont=0.275 MQTNKNNLKVRTQQIRQQIENLLNDRMLYNNFFSTIYVLNETETEIIIDFTDLIAKQEVISR* ) so that I can use them with FindMyFriends?

thomasp85 commented 8 years ago

I'm not quite sure if I understand your question. Do you have sequence information in multiple different formats? Generally you should try to have your sequences annotated by the same algorithm to avoid differences in gene detection bias...

Zbrel commented 8 years ago

I get it. I should first use Prodigal to predict the protein-coding genes of the prokaryotic genomes.

thomasp85 commented 8 years ago

Yep - or Glimmer, or something else... Currently, automatic location detection is only supported for files created by Prodigal, but there is a fork where I'm working on a GFF parser that should be more broadly applicable...
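
To illustrate why Prodigal output lends itself to automatic location detection: each FASTA header carries the gene name, start, end, and strand separated by " # ", as in the extdata example quoted above. The following is a minimal base-R sketch of reading those fields. parseProdigalHeader is a hypothetical helper written for this example, not a FindMyFriends function, and it makes no claim about how the package parses headers internally.

```r
# Parse a Prodigal-style FASTA header into its location fields.
# NOTE: parseProdigalHeader is a hypothetical helper for illustration only.
parseProdigalHeader <- function(header) {
  # Drop the leading ">" and split on the " # " field separator:
  # name, start, end, strand, attribute string
  fields <- strsplit(sub("^>", "", header), " # ", fixed = TRUE)[[1]]
  if (length(fields) < 4) stop("not a Prodigal-style header: ", header)
  name <- fields[1]
  list(
    contig = sub("_[0-9]+$", "", name),  # Prodigal appends _<gene index> to the contig ID
    gene   = name,
    start  = as.integer(fields[2]),
    end    = as.integer(fields[3]),
    strand = as.integer(fields[4])       # 1 = forward strand, -1 = reverse strand
  )
}

# Header taken from the extdata example earlier in this thread (attributes shortened)
header <- ">gi|71851486|gb|AE017243.1|_1 # 207 # 395 # 1 # ID=1_1;partial=00;start_type=ATG"
loc <- parseProdigalHeader(header)
```

Headers produced by other gene callers generally lack these fields, which is why a GFF-based route would be more broadly applicable.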

Zbrel commented 8 years ago

There are 260779 genes from 39 organisms. How long would it take to run "mycoSim <- kmerSimilarity(mycoPan, lowerLimit=0.8, rescale=FALSE)"?

thomasp85 commented 8 years ago

A lot of things factor in. First off, I don't know your computer hardware. Second, kmerSimilarity is the least advisable approach to calculating pangenomes in FindMyFriends, as it is the most computationally heavy. If you have installed the development version (which I'd advise, as it contains numerous improvements), use the cdhitGrouping function followed by neighborhoodSplit. This way I've successfully calculated pangenomes from thousands of genomes within a day...
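
As a rough sketch of the cdhitGrouping-then-neighborhoodSplit route, assuming FindMyFriends is installed and mycoPan is the pangenome object from the question: the wrapper name groupFast and the argument values (kmerSize, cdhitIter, nrep, lowerLimit) are illustrative assumptions, so check ?cdhitGrouping and ?neighborhoodSplit in your installed version before relying on them.

```r
# Sketch only: assumes the FindMyFriends package is installed and that `pan`
# is a pangenome object such as mycoPan. groupFast is a hypothetical wrapper,
# and the argument values below are assumptions, not recommended settings.
groupFast <- function(pan, lowerLimit = 0.8) {
  # Step 1: coarse gene grouping using the CD-HIT-based approach
  pan <- FindMyFriends::cdhitGrouping(pan, kmerSize = 5, cdhitIter = TRUE, nrep = 3)
  # Step 2: split the coarse groups using gene neighborhood information
  FindMyFriends::neighborhoodSplit(pan, lowerLimit = lowerLimit)
}

# Usage (uncomment once mycoPan exists in your session):
# mycoPan <- groupFast(mycoPan)
```

This replaces the all-against-all comparison done by kmerSimilarity with a two-step grouping, which is where the run-time saving comes from.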

Those are some big genomes you're working with, btw... ~6,500 genes each

thomasp85 commented 8 years ago

As none of this concerns the issue of adding an export API, I would prefer it if you opened a new issue for further questions (which you are welcome to do - I'm just trying to keep issues separated).