mehrdadbakhtiari / adVNTR

A tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data
http://advntr.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

IndexError: cannot do a non-empty take from an empty axes. #12

Closed (nh13 closed this issue 6 years ago)

nh13 commented 6 years ago

I get the following error when running adVNTR. I then modified the code to print out false_scores in vntr_finder.py at line 197, and it printed []. Any help would be greatly appreciated.

Traceback (most recent call last):
  File "adVNTR/advntr.py", line 58, in <module>
    run_advntr()
  File "adVNTR/advntr.py", line 49, in run_advntr
    genotype(args, genotype_parser)
  File "...adVNTR/src/commands.py", line 73, in genotype
    genome_analyzier.find_repeat_counts_from_alignment_file(input_file)
  File "...adVNTR/src/genome_analyzer.py", line 74, in find_repeat_counts_from_alignment_file
    copy_number = self.vntr_finder[vid].find_repeat_count_from_alignment_file(alignment_file, unmapped_reads)
  File "...adVNTR/src/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "...adVNTR/src/vntr_finder.py", line 643, in find_repeat_count_from_alignment_file
    selected_reads = self.select_illumina_reads(alignment_file, unmapped_filtered_reads)
  File "...adVNTR/src/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "...adVNTR/src/vntr_finder.py", line 575, in select_illumina_reads
    min_score_to_count_read = self.get_min_score_to_select_a_read(hmm, alignment_file, read_length)
  File "...adVNTR/src/vntr_finder.py", line 219, in get_min_score_to_select_a_read
    score = self.calculate_min_score_to_select_a_read(hmm, alignment_file)
  File "...adVNTR/src/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "...adVNTR/src/vntr_finder.py", line 199, in calculate_min_score_to_select_a_read
    score = numpy.percentile(false_scores, 100 - settings.SCORE_SELECTION_PERCENTILE)
  File ".../lib/python2.7/site-packages/numpy/lib/function_base.py", line 4116, in percentile
    interpolation=interpolation)
  File ".../lib/python2.7/site-packages/numpy/lib/function_base.py", line 3858, in _ureduce
    r = func(a, **kwargs)
  File ".../lib/python2.7/site-packages/numpy/lib/function_base.py", line 4233, in _percentile
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File ".../lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 134, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File ".../lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

mehrdadbakhtiari commented 6 years ago

Hi Nils, it seems that you are trying to run the tool on a new VNTR that is not in the list of trained VNTRs, so the tool tries to train the model. That training step is not well documented and makes some unnecessary assumptions about the input. Commit 026b32a95aec2c32e6e33083832732ab2bcb061f should temporarily solve it.
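The IndexError in the traceback comes from calling numpy.percentile on an empty false_scores list, meaning no null-distribution scores were collected for this VNTR. Below is a minimal sketch of a guard that fails with a readable message instead. `min_score_from_null` and `percentile_linear` are hypothetical helper names, not adVNTR functions. The percentile is computed in pure Python using linear interpolation between neighbouring order statistics, which is the same rule as NumPy's default method, so the sketch runs without NumPy.

```python
import math


def percentile_linear(values, q):
    """Percentile with linear interpolation between the closest ranks
    (the same rule numpy.percentile uses by default)."""
    if len(values) == 0:
        raise ValueError("cannot take a percentile of an empty sequence")
    if not 0 <= q <= 100:
        raise ValueError("q must be in [0, 100]")
    data = sorted(values)
    rank = (len(data) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    # Interpolate between the two neighbouring order statistics.
    return data[lo] + (data[hi] - data[lo]) * (rank - lo)


def min_score_from_null(false_scores, selection_percentile):
    """Hypothetical helper mirroring
    percentile(false_scores, 100 - selection_percentile),
    but checking for an empty null distribution first."""
    if len(false_scores) == 0:
        raise ValueError(
            "no null-distribution scores were collected "
            "(is the model for this VNTR trained?)"
        )
    return percentile_linear(false_scores, 100 - selection_percentile)
```

An explicit check like this names the real cause, an empty null distribution. The raw traceback instead ends inside NumPy internals.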

nh13 commented 6 years ago

@mehrdadbakhtiari I did specify a VNTR ID that is present in the VNTR database. I'm guessing very few of them have a trained model?

mehrdadbakhtiari commented 6 years ago

Yes, that's correct. As described on page 5 of the preprint (https://doi.org/10.1101/221754), training the model requires aligning a large number of reads (~10^7) to the HMM to find an empirical null distribution. This is generally a slow but one-time process for each locus. I periodically update the distributions and add more models to the project after testing each locus further. I'm also working on documenting this process so that you will be able to do it yourself. Meanwhile, I made it possible to do approximate read recruitment that doesn't require the distribution. You should now be able to recruit reads without fully training the model, though the results will improve after training.

Best, Mehrdad
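The recruitment idea described above can be sketched as follows. Each read is scored against the HMM, and only reads scoring above a cutoff drawn from the upper tail of the empirical null distribution are kept. This is an illustrative sketch, not adVNTR's code. `recruit_reads`, `tail_fraction` and `fallback_threshold` are hypothetical names, and read scores are assumed to be precomputed. The fixed fallback cutoff is only a stand-in for the approximate recruitment mentioned in the thread, whose details are not described there.

```python
def recruit_reads(read_scores, null_scores, tail_fraction, fallback_threshold):
    """Return (threshold, sorted ids of recruited reads).

    read_scores: dict mapping read id -> HMM score (assumed precomputed).
    null_scores: scores of reads known not to come from the locus;
        may be empty when the model has not been trained.
    tail_fraction: fraction of null reads allowed above the threshold.
    fallback_threshold: hypothetical fixed cutoff used when there is no null.
    """
    if len(null_scores) > 0:
        ranked = sorted(null_scores, reverse=True)
        # Empirical upper-tail cutoff: at most tail_fraction of the
        # null scores lie strictly above ranked[k].
        k = int(tail_fraction * len(ranked))
        threshold = ranked[min(k, len(ranked) - 1)]
    else:
        # No empirical null available: use the stand-in fixed cutoff.
        threshold = fallback_threshold
    selected = sorted(rid for rid, score in read_scores.items() if score > threshold)
    return threshold, selected
```

For example, with null scores 0 to 99 and tail_fraction=0.05, the cutoff is 94. A read scoring 95 is recruited and one scoring exactly 94 is not. With an empty null list, the sketch falls back to the fixed cutoff. That fallback is where a trained distribution would be expected to give a better-calibrated threshold.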