weberlab-hhu / Helixer

Using Deep Learning to predict gene annotations
GNU General Public License v3.0

boolean index did not match indexed array along dimension 0 #125

Open · Wenfei-Xian opened this issue 2 months ago

Wenfei-Xian commented 2 months ago

Hey, many thanks for this awesome tool!

I am trying to fine-tune the model for Arabidopsis thaliana using additional RNA-seq data. Below are the commands I used. Everything ran until filter-to-most-certain.py, which failed with the error shown at the end.

Scripts used:
https://raw.githubusercontent.com/weberlab-hhu/helixer_scratch/master/data_scripts/filter-to-most-certain.py
https://raw.githubusercontent.com/weberlab-hhu/helixer_scratch/master/data_scripts/n90_train_val_split.py

Commands:

singularity exec -B /tmp/global2/wxian/software/Helixer_fine_tuning:/tmp/global2/wxian/software/Helixer_fine_tuning --no-home ../helixer-docker_helixer_v0.3.2_cuda_11.8.0-cudnn8.sif fasta2h5.py --subsequence-length 213840 --species Arabidopsis_thaliana --h5-output-path Col-CC.v2.fa.h5 --fasta-path Col-CC.v2.fa

singularity exec --nv -B /tmp/global2/wxian/software/Helixer_fine_tuning:/tmp/global2/wxian/software/Helixer_fine_tuning --no-home ../helixer-docker_helixer_v0.3.2_cuda_11.8.0-cudnn8.sif HybridModel.py --load-model-path /tmp/global2/wxian/software/Helixer/.local/share/Helixer/models/land_plant/land_plant_v0.3_a_0080.h5 --test-data Col-CC.v2.fa.h5 --prediction-output-path Col-CC.v2.fa_predictions.h5 --overlap --overlap-offset 106920  --batch-size 9 --val-test-batch-size 9 -v

singularity exec --nv -B /tmp/global2/wxian/software/Helixer_fine_tuning:/tmp/global2/wxian/software/Helixer_fine_tuning --no-home ../helixer-docker_helixer_v0.3.2_cuda_11.8.0-cudnn8.sif helixer_post_bin Col-CC.v2.fa.h5 Col-CC.v2.fa_predictions.h5 100 0.1 0.8 60 Col-CC.v2.fa.helixer.gff3

singularity exec -B /tmp/global2/wxian/software/Helixer_fine_tuning:/tmp/global2/wxian/software/Helixer_fine_tuning --no-home ../helixer-docker_helixer_v0.3.2_cuda_11.8.0-cudnn8.sif import2geenuff.py --fasta Col-CC.v2.fa --gff3 Col-CC.v2.fa.helixer.gff3 --db-path Col-CC.v2.sqlite3 --log-file Col-CC.v2.log --species Arabidopsis_thaliana

singularity exec -B /tmp/global2/wxian/software/Helixer_fine_tuning:/tmp/global2/wxian/software/Helixer_fine_tuning --no-home ../helixer-docker_helixer_v0.3.2_cuda_11.8.0-cudnn8.sif geenuff2h5.py --h5-output-path Col-CC.v2.predictions_training.h5 --input-db-path Col-CC.v2.sqlite3 --subsequence-length 213840

cp Col-CC.v2.predictions_training.h5 Col-CC.v2.predictions_training.backup.h5

python3 ../Helixer/helixer/evaluation/add_ngs_coverage.py -s Arabidopsis_thaliana --unstranded --bam RNA_seq_stress/SRX1882551.sorted.bam --h5-data Col-CC.v2.predictions_training.h5 --dataset-prefix rnaseq --threads 128

python3 filter-to-most-certain.py --write-by 6415200 --h5-to-filter Col-CC.v2.predictions_training.h5 --predictions Col-CC.v2.fa_predictions.h5 --keep-fraction 0.2 --output-file Col-CC.v2.fa.filtered.h5
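The numbers in these commands are related: `--write-by 6415200` is exactly 30 × the `--subsequence-length` of 213840 used for fasta2h5.py and geenuff2h5.py, and `--overlap-offset 106920` is half a subsequence. Assuming filter-to-most-certain.py writes in blocks of `write_by // subsequence_length` subsequences (the `end_i=si + max_n_chunks` call in the traceback suggests this, but it is not confirmed here), each block holds 30 chunks, which matches the "boolean dimension is 30" in the error. A minimal sketch of that arithmetic; `chunks_per_write` is a hypothetical helper, not part of Helixer:

```python
def chunks_per_write(write_by: int, subsequence_length: int) -> int:
    """Number of whole subsequences in one --write-by block (hypothetical helper)."""
    if write_by % subsequence_length != 0:
        raise ValueError(
            f"write_by={write_by} is not a multiple of subsequence_length={subsequence_length}"
        )
    return write_by // subsequence_length


if __name__ == "__main__":
    subsequence_length = 213840  # --subsequence-length passed to fasta2h5.py / geenuff2h5.py
    print(chunks_per_write(6415200, subsequence_length))  # --write-by from the last command
    print(subsequence_length // 2 == 106920)  # --overlap-offset is half a subsequence
```

If the block size is indeed 30, the mask length itself looks consistent with the inputs, which points the search at an array that does not have one row per subsequence.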

Error message from filter-to-most-certain.py:

selecting 265 with average normalized distances below in each genic proportion ranking [0.032440142162364384, 0.034895248784137675, 0.03238402543958099, 0.03226711560044893, 0.015221076505798728]
INFO: the following arrays will be copied in their entirety and not be subset,
these are expected to relate to metadata:
 ['evaluation/rnaseq_meta/bam_files']
Traceback (most recent call last):
  File "/tmp/global2/wxian/software/Helixer_fine_tuning/filter-to-most-certain.py", line 116, in <module>
    main(args)
  File "/tmp/global2/wxian/software/Helixer_fine_tuning/filter-to-most-certain.py", line 101, in main
    copy_groups_recursively(h5_in, h5_out, skip_arrays=skip_groups, start_i=si, end_i=si + max_n_chunks,
  File "/tmp/global2/wxian/software/Helixer_fine_tuning/n90_train_val_split.py", line 121, in copy_groups_recursively
    h5_in.visititems(maybe_copy_some_data)
  File "/tmp/global2/wxian/conda/envs/htseq/lib/python3.10/site-packages/h5py/_hl/group.py", line 668, in visititems
    return h5o.visit(self.id, proxy)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 355, in h5py.h5o.visit
  File "h5py/h5o.pyx", line 302, in h5py.h5o.cb_obj_simple
  File "/tmp/global2/wxian/conda/envs/htseq/lib/python3.10/site-packages/h5py/_hl/group.py", line 667, in proxy
    return func(name, self[name])
  File "/tmp/global2/wxian/software/Helixer_fine_tuning/n90_train_val_split.py", line 119, in maybe_copy_some_data
    copy_some_data(h5_in, h5_out, name, mask, start_i, end_i)
  File "/tmp/global2/wxian/software/Helixer_fine_tuning/n90_train_val_split.py", line 105, in copy_some_data
    keep_idxs = keep_idxs[mask]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 1 but corresponding boolean dimension is 30
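The final line is NumPy's standard error for boolean indexing with a mask whose length differs from the array's first axis: here `keep_idxs` has 1 entry while `mask` has 30. A minimal reproduction, where both arrays are hypothetical stand-ins (how n90_train_val_split.py actually builds `keep_idxs` is not shown in this report):

```python
import numpy as np


def reproduce_mask_mismatch(n_rows: int, n_chunks: int) -> str:
    """Index an n_rows-long array with an n_chunks-long boolean mask.

    Returns the IndexError message, or "ok" if the lengths agree.
    """
    # Hypothetical stand-in for keep_idxs: one index per row of the array being copied.
    keep_idxs = np.arange(n_rows)
    # Hypothetical stand-in for mask: one keep/drop flag per chunk in the write block.
    mask = np.zeros(n_chunks, dtype=bool)
    mask[::5] = True
    try:
        keep_idxs = keep_idxs[mask]
    except IndexError as err:
        return str(err)
    return "ok"


if __name__ == "__main__":
    # A single-row array (e.g. per-file metadata) versus a 30-chunk mask fails.
    print(reproduce_mask_mismatch(n_rows=1, n_chunks=30))
    # Matching lengths index without error.
    print(reproduce_mask_mismatch(n_rows=30, n_chunks=30))
```

This suggests that some dataset in the h5 file being filtered has a first dimension of 1 instead of one row per subsequence.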
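To narrow down which array triggers this, it may help to list every dataset in Col-CC.v2.predictions_training.h5 whose first dimension differs from the number of subsequences. The INFO line shows the script already treats `evaluation/rnaseq_meta/bam_files` as metadata and copies it whole, so any other array that shows up is a candidate. A sketch, assuming that `data/X` holds one row per subsequence; `find_mismatched_arrays` and `collect_shapes` are hypothetical helpers, and only `h5py.File`, `Group.visititems` and `h5py.Dataset` come from h5py:

```python
import sys
from typing import Dict, Iterable, List, Tuple

Shape = Tuple[int, ...]


def find_mismatched_arrays(shapes: Dict[str, Shape], expected_rows: int,
                           skip: Iterable[str] = ()) -> List[str]:
    """Names of datasets whose first dimension is not expected_rows (scalars included)."""
    skip_set = set(skip)
    return sorted(
        name for name, shape in shapes.items()
        if name not in skip_set and (len(shape) == 0 or shape[0] != expected_rows)
    )


def collect_shapes(h5_path: str) -> Dict[str, Shape]:
    """Walk an HDF5 file and record the shape of every dataset."""
    import h5py  # imported here so find_mismatched_arrays works without h5py installed

    shapes: Dict[str, Shape] = {}

    def record(name: str, obj) -> None:
        # Returning None lets visititems continue to the next object.
        if isinstance(obj, h5py.Dataset):
            shapes[name] = tuple(obj.shape)

    with h5py.File(h5_path, "r") as h5:
        h5.visititems(record)
    return shapes


if __name__ == "__main__" and len(sys.argv) > 1:
    shapes = collect_shapes(sys.argv[1])
    # Assumption: data/X has one row per subsequence in this h5 file.
    n_rows = shapes["data/X"][0]
    for name in find_mismatched_arrays(shapes, n_rows):
        print(name, shapes[name])
```

Saved as a hypothetical `list_shapes.py`, it could be run as `python3 list_shapes.py Col-CC.v2.predictions_training.h5`. Any dataset it prints besides `evaluation/rnaseq_meta/bam_files` would be worth reporting here.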