kr-colab / diploSHIC

feature-based deep learning for the identification of selective sweeps
MIT License

Program halts when samples are assigned to multiple groups #4

Closed: oushujun closed this issue 6 years ago

oushujun commented 6 years ago

Hello,

I was able to get diploSHIC.py fvecVcf diploid running, but I noticed that the program halts when some samples are assigned to multiple groups in the file passed with the --sampleToPopFileName option. Assigning a sample to more than one group is sometimes necessary, depending on the research question. For example, a human population-genetics study may focus either on the English population or on Europeans as a whole, and samples collected in England belong to both groups.

The error message is:

Traceback (most recent call last):
  File "/opt/software/SHIC/diploSHIC/makeFeatureVecsForChrArmFromVcfDiploid.py", line 85, in
    genos = allel.GenotypeArray(rawgenos).subset(sel1=sampleIndicesToKeep)
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/lib/python3.6/site-packages/allel/model/ndarray.py", line 1517, in subset
    return subset_genotype_array(self, sel0, sel1, cls=type(self), subset=subset)
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/lib/python3.6/site-packages/allel/model/generic.py", line 230, in subset_genotype_array
    out = subset(g.values, sel0, sel1, **kwargs)
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/lib/python3.6/site-packages/allel/model/ndarray.py", line 66, in subset
    return data[sel0, sel1]
IndexError: arrays used as indices must be of integer (or boolean) type
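One way this exact IndexError can arise (an assumption; the thread does not diagnose the cause) is indexing with an empty Python list: NumPy converts [] to a float64 array, and float arrays are not valid indices. The sketch below reproduces the message with plain NumPy. The names genos and sample_indices are hypothetical stand-ins for the script's rawgenos and sampleIndicesToKeep.

```python
import numpy as np

# Hypothetical genotype array: 4 variants x 3 samples x 2 alleles (diploid).
genos = np.zeros((4, 3, 2), dtype=np.int8)

# An empty Python list becomes a float64 array when NumPy converts it.
sample_indices = []
print(np.asarray(sample_indices).dtype)  # float64

try:
    # Selecting sample columns with a float array fails, as in the traceback above.
    genos[:, np.asarray(sample_indices)]
except IndexError as err:
    print("IndexError:", err)

# With an explicit integer dtype the selection is valid; it just keeps no samples.
subset = genos[:, np.asarray(sample_indices, dtype=int)]
print(subset.shape)  # (4, 0, 2)
```

If this is the mechanism, an empty sample selection would be better reported with an explicit error than left to fail inside the indexing call.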

I can bypass this error by making multiple grouping files, so that each sample belongs to only one group within any given file. It would be helpful if the program could tolerate multiple grouping (or at least emit a warning when this happens).
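The workaround described above, together with the requested warning, can be sketched as a small preprocessing step. This is a hypothetical helper, not part of diploSHIC: it assumes the sample-to-population file has already been read into (sample, population) pairs, and the writer assumes a format of one tab-separated sample/population pair per line.

```python
import warnings
from collections import defaultdict


def split_sample_groupings(pairs):
    """Split (sample, population) pairs into groupings in which each sample
    appears at most once, warning about samples assigned to several groups.

    Returns a list of dicts mapping sample -> population.
    """
    groups_per_sample = defaultdict(list)
    for sample, pop in pairs:
        if pop not in groups_per_sample[sample]:
            groups_per_sample[sample].append(pop)

    for sample, pops in groups_per_sample.items():
        if len(pops) > 1:
            warnings.warn(f"sample {sample!r} is assigned to multiple groups: {pops}")

    # The k-th grouping receives each sample's k-th population,
    # so no sample is listed twice within one grouping.
    n_groupings = max((len(p) for p in groups_per_sample.values()), default=0)
    groupings = [dict() for _ in range(n_groupings)]
    for sample, pops in groups_per_sample.items():
        for k, pop in enumerate(pops):
            groupings[k][sample] = pop
    return groupings


def write_groupings(groupings, prefix):
    """Write one file per grouping (assumed format: 'sample<TAB>population')."""
    paths = []
    for k, grouping in enumerate(groupings, start=1):
        path = f"{prefix}.{k}.txt"
        with open(path, "w") as fh:
            for sample, pop in grouping.items():
                fh.write(f"{sample}\t{pop}\n")
        paths.append(path)
    return paths
```

For example, pairs [("s1", "English"), ("s1", "European"), ("s2", "European")] produce one warning for s1 and two groupings, {"s1": "English", "s2": "European"} and {"s1": "European"}, each of which can be written to its own file.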

Thank you!

Shujun

andrewkern commented 6 years ago

Samples are not expected to belong to multiple groups. diploSHIC is designed to analyze only one population at a time.
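Under the one-population-at-a-time design, the per-run sample selection reduces to picking one population's VCF columns. A minimal sketch, assuming a hypothetical helper indices_for_population (not a diploSHIC function), with the sample-to-population mapping held in a dict so that each sample can carry only one label:

```python
import numpy as np


def indices_for_population(vcf_samples, sample_to_pop, target_pop):
    """Return integer column indices of the samples belonging to target_pop.

    vcf_samples: sample names in VCF column order.
    sample_to_pop: dict of sample name -> single population label.
    """
    idx = np.array(
        [i for i, s in enumerate(vcf_samples) if sample_to_pop.get(s) == target_pop],
        dtype=int,  # stays an integer array even when no sample matches
    )
    if idx.size == 0:
        # Fail with a clear message instead of passing an empty selection on.
        raise ValueError(f"no VCF samples found for population {target_pop!r}")
    return idx
```

For instance, indices_for_population(["s1", "s2", "s3"], {"s1": "EUR", "s2": "AFR", "s3": "EUR"}, "EUR") returns array([0, 2]). Using a dict makes multiple group membership impossible by construction, which matches the constraint stated above.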