grenaud / glactools

command-line tools for the management of genotype likelihoods and allele counts
http://grenaud.github.io/glactools/
GNU General Public License v3.0

Sanity Check #2

Closed mike2vandy closed 6 years ago

mike2vandy commented 6 years ago

So, I just downloaded glactools and I'm working through the pipeline to convert bam to plink (bam2acf, union, acf2bplink).

At union I'm getting this error. Why would this error be thrown?

glactools union ERR484729.acf ERR490277.acf > test2.acf

sanityCheck()1 Chromosomes differ between FLZR01000063.1 740 A,N 0,0:0 0,0:0 1,0:0 and FLZR01000063.1 740 A,N 0,0:0 0,0:0 1,0:0

Also, I know it's a FAQ, but I still don't understand the difference between a defined site (intersect) and undefined site (union), because I can get intersect to work without issues.

Thanks, Mike

grenaud commented 6 years ago

There was a bug where only the coordinate was used to detect whether a line had been skipped. This was fine as long as the coordinate wasn't the same when switching from one chromosome to the next. I have since included the index of the chromosome in the check. Thank you for your report!
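To illustrate the kind of check described above, here is a minimal Python sketch. It is not the actual glactools code. The function names and the chromosome indices (62 and 63) are hypothetical; only the coordinate 740 comes from the error message. Each record is modelled as a `(chromosome_index, coordinate)` tuple.

```python
# Illustrative sketch only, not the glactools implementation.
# A record is (chromosome_index, coordinate); the indices are made up.

def same_site_coordinate_only(rec_a, rec_b):
    """Old-style check: compares the coordinate and ignores the chromosome."""
    return rec_a[1] == rec_b[1]

def same_site_with_chrom_index(rec_a, rec_b):
    """Check that also uses the chromosome index."""
    return rec_a == rec_b

# Two records that share coordinate 740 but sit on different chromosomes.
rec_file1 = (62, 740)  # hypothetical chromosome index 62
rec_file2 = (63, 740)  # hypothetical chromosome index 63

coordinate_only = same_site_coordinate_only(rec_file1, rec_file2)
with_index = same_site_with_chrom_index(rec_file1, rec_file2)

# Tuples compare element by element, so the chromosome index decides the
# order first and the coordinate only breaks ties within a chromosome.
file1_is_behind = rec_file1 < rec_file2

print(coordinate_only, with_index, file1_is_behind)
```

With the coordinate alone, the two records look like the same site even though they belong to different chromosomes. Including the chromosome index removes the ambiguity.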

Intersect requires all sites to be defined. Union does not. Say file 1 has data for the following coordinates:

103 104 105

and for the same chromosome, file 2 has:

103 104 106

intersection will contain:

103 104

union will contain:

103 104 105 106

However, for the site at position 105, since it is only defined in file#1, the allele count column for file#2 will just be a bunch of 0,0:0. The same applies in reverse for position 106, which is only defined in file#2. This is done to account for missing data.
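The difference can be sketched in a few lines of Python using the coordinates from the example above. This is only an illustration of the set logic, not how glactools is implemented. The allele count strings for the defined sites are made-up placeholders, and `intersect_sites` and `union_sites` are hypothetical helper names. Only the `0,0:0` filler comes from the actual output.

```python
# Illustrative sketch only; the allele counts below are invented.
MISSING = "0,0:0"  # filler used for a site with no data in one file

# coordinate -> allele count field, for a single chromosome
file1 = {103: "1,0:0", 104: "0,1:0", 105: "1,1:0"}
file2 = {103: "2,0:0", 104: "1,0:0", 106: "0,2:0"}

def intersect_sites(a, b):
    """Keep only the coordinates defined in both files."""
    return {pos: (a[pos], b[pos]) for pos in sorted(a.keys() & b.keys())}

def union_sites(a, b):
    """Keep every coordinate seen in either file, filling gaps with MISSING."""
    return {
        pos: (a.get(pos, MISSING), b.get(pos, MISSING))
        for pos in sorted(a.keys() | b.keys())
    }

inter = intersect_sites(file1, file2)
uni = union_sites(file1, file2)

print(list(inter))  # coordinates kept by the intersection
print(list(uni))    # coordinates kept by the union
print(uni[105])     # file 2 has no data here, so its field is MISSING
print(uni[106])     # file 1 has no data here, so its field is MISSING
```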

mike2vandy commented 6 years ago

I re-downloaded and reran union without a problem. Thanks for the help.

grenaud commented 6 years ago

No, thank you for reporting this.