stschiff / sequenceTools


SeqFormatException #15

Open KovachLab opened 3 years ago

KovachLab commented 3 years ago

Hi,

I'm attempting to run sequenceTools to generate pseudo-haploid calls for a list of BAM files.

However, I'm getting the following error: pileupCaller: SeqFormatException "cannot parse chromosome "

Any suggestions? Thanks!

stschiff commented 3 years ago

Chromosomes are required to be numbered, or to be named "X", "Y" or "MT". They can also start with "chr", which gets ignored. This is required because I impose a natural order on chromosomes, which is 1-22, X, Y, MT.

What are your chromosome names?
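To make the naming rule concrete, here is a minimal Python sketch of how such names could be turned into a sort key. It is not pileupCaller's actual code: the function name `chrom_key` and the tuple-shaped key are assumptions for illustration, and it accepts any decimal number rather than only 1-22.

```python
import re

# Hypothetical illustration of the naming rule, not pileupCaller's real parser.
# Non-numeric names sort after all numbered chromosomes, in the order X, Y, MT.
_SPECIAL = {"X": 0, "Y": 1, "MT": 2}


def chrom_key(name: str) -> tuple:
    """Return a sortable key for a chromosome name, or raise ValueError."""
    # An optional "chr" prefix is ignored.
    core = name[3:] if name.startswith("chr") else name
    if core in _SPECIAL:
        return (1, _SPECIAL[core])
    if re.fullmatch(r"[0-9]+", core):
        return (0, int(core))
    raise ValueError(f"cannot parse chromosome {name!r}")
```

Under this sketch, names such as `LG01` or `MT_genome` fit neither pattern and would be rejected, which mirrors the kind of error reported above.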

KovachLab commented 3 years ago

The chromosome names in the FASTA reference are as follows (there are about 8000 scaffolds, so I only included a few; their names all follow a similar format):

LG01 LG02 LG03 LG04 LG05 LG06 LG07 LG08 LG09 LG10 LG11 LG12 LG13 LG14 LG15 LG16 LG17 LG18 LG19 LG20 LG21 LG22 LG23 MT_genome GmG20150304_scaffold_1408 GmG20150304_scaffold_1409 GmG20150304_scaffold_1410 . . .

stschiff commented 3 years ago

OK, this is a problem! I understand this would be great to have, but currently I don't support it. I strictly assume a simple set of chromosomes, as in humans. The reason is that pileupCaller needs to match up genomic positions (each consisting of a chromosome name and a position on that chromosome) between the incoming pileup data and the provided SNP file. For this matching to work reliably, I work with a strictly ordered list of chromosomes.
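Why a strict order matters can be shown with a small Python sketch of a single-pass match between two sorted streams. This is only an illustration under assumptions, not how pileupCaller is implemented: `match_sites`, the tuple layouts, and the choice to report unmatched SNPs as `None` are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Site = Tuple[str, int]          # (chromosome name, position)
PileupRow = Tuple[str, int, str]  # (chromosome name, position, observed bases)


def match_sites(
    snps: Iterable[Site],
    pileup: Iterable[PileupRow],
    rank: Callable[[str], int],
) -> Iterator[Tuple[Site, Optional[str]]]:
    """Yield each SNP site with its pileup bases, or None if not covered.

    Both inputs must already be sorted by (rank(chromosome), position);
    `rank` is whatever function defines the chromosome order.
    """
    rows = iter(pileup)
    current = next(rows, None)
    for chrom, pos in snps:
        snp_key = (rank(chrom), pos)
        # Skip pileup rows that lie before the current SNP.
        while current is not None and (rank(current[0]), current[1]) < snp_key:
            current = next(rows, None)
        if current is not None and (rank(current[0]), current[1]) == snp_key:
            yield (chrom, pos), current[2]
        else:
            yield (chrom, pos), None
```

The comparison `<` on `(rank, position)` keys is the step that needs a total order over chromosome names. If the two inputs disagreed on that order, the loop could skip past rows that should have matched.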

I need to think about how best to fix this. Perhaps I could allow the user to input a custom chromosome/scaffold order, or I could simply assume that the chromosomal order is the same in the pileup data and the SNP file.
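The first option, a user-supplied order, might look roughly like the Python sketch below. Nothing here exists in pileupCaller: `make_rank` is a hypothetical helper, and the list of names is just a few of those quoted earlier in this thread. Such a list could, for example, be taken from the first column of a samtools `.fai` index of the reference.

```python
from typing import Callable, Iterable


def make_rank(names: Iterable[str]) -> Callable[[str], int]:
    """Build a rank function from an explicit, user-supplied sequence order."""
    order = {}
    for index, name in enumerate(names):
        if name in order:
            raise ValueError(f"duplicate sequence name {name!r}")
        order[name] = index

    def rank(name: str) -> int:
        # Unknown names are an error rather than being silently placed somewhere.
        if name not in order:
            raise ValueError(f"sequence {name!r} is not in the custom order")
        return order[name]

    return rank


# A few of the names from this thread, in reference order (illustrative subset).
rank = make_rank(["LG01", "LG02", "LG03", "MT_genome", "GmG20150304_scaffold_1408"])
sites = [("MT_genome", 12), ("LG02", 500), ("LG01", 7), ("LG02", 3)]
sites_sorted = sorted(sites, key=lambda site: (rank(site[0]), site[1]))
```

With an explicit order like this, arbitrary scaffold names would need no parsing at all; they only need to appear in the list.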

Sorry, this will take a bit of time, and I'm sorry that pileupCaller currently doesn't support scaffolded genomes. It's clearly a shortcoming, and I'll work on it.