medvedevgroup / SibeliaZ

A fast whole-genome aligner based on de Bruijn graphs
http://medvedevgroup.com/

Input file format #3

Closed: tommathers closed this issue 5 years ago

tommathers commented 5 years ago

Hi,

Can you confirm how the input genomes should be formatted? Does each genome need to be in the same FASTA file? If all genomes are in the same file, how are genomes with multiple contigs dealt with? Will the program identify alignments within each input genome as well as between genomes?

Thanks,

Tom.

iminkin commented 5 years ago

Yes, all sequences should be in the same file and should have unique IDs: alignments are reported with respect to the IDs in the FASTA file. SibeliaZ is oblivious to the fact that sequences may come from different genomes and will try to find alignments within each sequence as well as between sequences. For now, each contig is treated like a chromosome.

Thanks for your questions; I will update the README to reflect these features, as this is quite important.
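A minimal sketch of preparing such an input, assuming one FASTA file per genome: the hypothetical helpers below concatenate the files into a single FASTA and rename each header to `species.originalID`. This keeps IDs unique across genomes and lets the species be recovered later from the MAF `src` field. The function names and the `species.contig` naming scheme are illustrative assumptions, not part of SibeliaZ; species names should not themselves contain a dot.

```python
from typing import Dict, Iterable, Iterator


def prefix_headers(species: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with every header renamed to species.original_id."""
    for line in lines:
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token as the original ID.
            tokens = line[1:].split()
            original_id = tokens[0] if tokens else ""
            yield f">{species}.{original_id}\n"
        else:
            # Guarantee a trailing newline so files concatenate cleanly.
            yield line if line.endswith("\n") else line + "\n"


def merge_fastas(paths_by_species: Dict[str, str], out_path: str) -> None:
    """Concatenate one FASTA per species into a single SibeliaZ input file."""
    with open(out_path, "w") as out:
        for species, path in paths_by_species.items():
            with open(path) as handle:
                for line in prefix_headers(species, handle):
                    out.write(line)
```

For example, `merge_fastas({"speciesA": "a.fa", "speciesB": "b.fa"}, "all.fa")` (hypothetical file names) produces one file whose headers look like `>speciesA.chr1`, which can then be passed to SibeliaZ.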

tommathers commented 5 years ago

Thanks for the explanation. To generate output similar to Cactus (after conversion to HAL), would I need to parse the MAF file to remove self-alignments, for example by adding a species ID to the FASTA headers?

Cheers,

Tom.

iminkin commented 5 years ago

Yes, you will have to do it manually.
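One way to do this manually, sketched under stated assumptions: sequence IDs follow the hypothetical `species.contig` scheme (species prefix added to the FASTA headers before running SibeliaZ), and a "self-alignment" is taken to mean a block whose rows all come from a single species. The script keeps only blocks spanning at least two species; paralogous rows inside such blocks are kept. Only `a` and `s` lines are handled; other MAF line types are dropped in this sketch.

```python
from typing import Iterable, Iterator, List


def read_maf_blocks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group MAF lines into blocks; each block starts with an 'a' line."""
    block: List[str] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line between blocks
        if fields[0] == "a":
            if block:
                yield block
            block = [line.strip()]
        elif fields[0] == "s" and block:
            block.append(line.strip())
        # '#' header lines and i/e/q lines are ignored in this sketch.
    if block:
        yield block


def species_of(src: str) -> str:
    """Assumed naming scheme: the species is the part before the first '.'."""
    return src.split(".", 1)[0]


def drop_single_species_blocks(blocks: Iterable[List[str]]) -> Iterator[List[str]]:
    """Keep only blocks whose 's' rows come from at least two species."""
    for block in blocks:
        species = {species_of(row.split()[1]) for row in block[1:]}
        if len(species) > 1:
            yield block


def write_maf(blocks: Iterable[List[str]]) -> str:
    """Serialise blocks back to MAF text, each block ending in a blank line."""
    parts = ["##maf version=1\n\n"]
    for block in blocks:
        parts.append("\n".join(block) + "\n\n")
    return "".join(parts)
```

Usage: `write_maf(drop_single_species_blocks(read_maf_blocks(open("input.maf"))))`, where `input.maf` stands in for the SibeliaZ output file.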

tommathers commented 5 years ago

Hi,

Sorry if this is a daft question, but I am struggling to make the SibeliaZ MAF file compatible with downstream tools. msa_view from PHAST and the HAL tools expect the MAF file to be organised relative to a reference species (set by the first ID in the alignment, with the format "species.chromosome"). I think this is a standard convention for MAF?

Is it possible to make SibeliaZ output alignments relative to a reference species or, failing that, do you know of any tools that can read in the SibeliaZ output and set a reference species?

I am testing on 3 species with ~400 Mb genomes: one species is assembled into chromosomes and the others are in thousands of contigs.

Thanks,

Tom.

iminkin commented 5 years ago

@tommathers,

Can you show an example of a MAF alignment organised relative to a reference so I can adjust output of SibeliaZ?
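For reference, a minimal sketch of one common reading of "organised relative to a reference", assuming `species.contig` IDs. Under this reading, every kept block contains a reference row, that row is listed first and on the `+` strand (the whole block is reverse-complemented when it is not), and blocks are sorted by reference coordinate. Blocks without a reference row are dropped, and if a block has several reference rows only the first is used as the anchor. These choices are assumptions; this sketch has not been checked against msa_view or the HAL tools.

```python
from typing import Iterable, List, Optional, Tuple

# IUPAC-aware complement table; gap characters ('-') are left unchanged.
COMPLEMENT = str.maketrans("ACGTRYKMBDHVNacgtrykmbdhvn",
                           "TGCAYRMKVHDBNtgcayrmkvhdbn")

Row = List[str]  # fields of an 's' line: s src start size strand srcSize text


def parse_blocks(lines: Iterable[str]) -> List[Tuple[str, List[Row]]]:
    """Return (a_line, s_rows) pairs; other MAF line types are ignored."""
    blocks: List[Tuple[str, List[Row]]] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            blocks.append((line.strip(), []))
        elif fields[0] == "s" and blocks:
            blocks[-1][1].append(fields)
    return blocks


def flip_row(row: Row) -> Row:
    """Reverse-complement one row. MAF '-' starts count from the end of the
    source, so the start on the opposite strand is srcSize - start - size."""
    _, src, start, size, strand, src_size, text = row
    new_start = int(src_size) - int(start) - int(size)
    new_strand = "-" if strand == "+" else "+"
    return ["s", src, str(new_start), size, new_strand, src_size,
            text.translate(COMPLEMENT)[::-1]]


def anchor_on_reference(rows: List[Row], ref: str) -> Optional[List[Row]]:
    """Move the first `ref` row to the top on '+'; None if `ref` is absent."""
    idx = next((i for i, r in enumerate(rows)
                if r[1].split(".", 1)[0] == ref), None)
    if idx is None:
        return None
    if rows[idx][4] == "-":
        rows = [flip_row(r) for r in rows]  # flip all rows to keep columns aligned
    else:
        rows = list(rows)
    rows.insert(0, rows.pop(idx))
    return rows


def reference_maf(lines: Iterable[str], ref: str) -> str:
    """Rewrite MAF text so every kept block is anchored on species `ref`."""
    kept = []
    for a_line, rows in parse_blocks(lines):
        anchored = anchor_on_reference(rows, ref)
        if anchored is not None:
            kept.append((a_line, anchored))
    # Sort by reference sequence name, then by reference start coordinate.
    kept.sort(key=lambda b: (b[1][0][1], int(b[1][0][2])))
    out = ["##maf version=1", ""]
    for a_line, rows in kept:
        out.append(a_line)
        out.extend(" ".join(r) for r in rows)
        out.append("")
    return "\n".join(out) + "\n"
```

Usage, with a hypothetical reference species prefix `hs`: `reference_maf(open("input.maf"), "hs")`.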