shohei-kojima / MEGAnE

MEGAnE
MIT License

IndexError: list index out of range in `1_indiv_call_genotype.py` #4

Closed: YiweiNiu closed this issue 2 years ago

YiweiNiu commented 2 years ago

Hi,

Sorry to disturb you again.

I am using the latest version of MEGAnE from GitHub and applied it to a human WGS sample from the 1KGP. Below is the command I used.

# MEGAnE_step1
python 1_indiv_call_genotype.py \
  -i $bam \
  -sample_name $sample \
  -outdir $outdir \
  -fa $REFERENCE \
  -mk $megane_mk/Homo_sapiens_assembly38.fasta.mk \
  -fadb $megane_lib/hg38_blastdb \
  -rep $megane_lib/Dfam_custom.rep \
  -repout $megane_lib/Homo_sapiens_assembly38.fasta.out \
  -repremove $megane_lib/non_ME_rep.txt \
  -pA_ME $megane_lib/ME_with_pA.txt \
  -mainchr $megane_lib/main_chrs.txt \
  -p 2

I got the following error message:

2022-05-03 13:26:46,347:INFO:Initial check started.
2022-05-03 13:27:01,794:INFO:estimated read lenth = 150
2022-05-03 13:27:02,166:INFO:All 24 main chromosome(s) were found in /home2/niuyw/project/MEI2/gatkout/HG00315/HG00315.final.bam.
2022-05-03 13:27:02,166:INFO:"chrX" was found in /home2/niuyw/project/MEI2/gatkout/HG00315/HG00315.final.bam. "chrX" will be considered as a female sex chromosome.
2022-05-03 13:27:02,167:INFO:"chrY" was found in /home2/niuyw/project/MEI2/gatkout/HG00315/HG00315.final.bam. "chrY" will be considered as a male sex chromosome.

2022-05-03 13:27:02,286:INFO:Preprocessing started.
2022-05-03 13:27:43,468:INFO:N=1155 repeats found in /home2/niuyw/RefData/MEI/megane_repeat_lib/Dfam_custom.rep. N=1155 will be analyzed. N=0 will be excluded due to non-ME repeats.
2022-05-03 13:32:46,133:INFO:Discordant read search started.
2022-05-03 14:06:41,396:INFO:Screening results:nonXY_reads=671052315,X_reads=34701931,Y_reads=932743,chimeric_reads=14570132,hybrid_reads=7500124,pA_reads=52820,absent_reads=113834
2022-05-03 14:06:41,402:INFO:estimated autosome depth = 34
2022-05-03 14:06:41,402:INFO:estimated sex = female
2022-05-03 14:06:41,569:INFO:Clipped read processing started.
2022-05-03 15:20:28,868:INFO:Unmapped read processing started.
2022-05-03 16:12:07,084:INFO:Hybrid read processing started.
2022-05-03 16:24:31,889:INFO:Integration junction search (outside of TEs) started.
2022-05-03 16:25:57,097:INFO:Integration junction search (nested in TEs) started.
2022-05-03 16:44:39,723:INFO:Filtering started.
2022-05-03 16:47:42,243:INFO:[1518, 1479] ME insertion candidates found.
2022-05-03 16:47:42,244:INFO:ME insertion search finished!

2022-05-03 16:47:42,490:INFO:Absent ME search started.
2022-05-03 16:47:52,141:ERROR:
Traceback (most recent call last):
  File "/Parastor300s_G30S/niuyw/software/MEGAnE/scripts/find_absent.py", line 62, in find_abs
    d[id].append([int(ls[1]), int(ls[2]), ls[3], ls[8], ls[4], ls[9]])
IndexError: list index out of range
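
For readers following the traceback: the failing statement reads fields up to `ls[9]`, so the `IndexError` means that at least one input line had fewer than ten fields. The sketch below shows one way such a line could be reported with a clearer message. It is only an illustration, not MEGAnE's code: the helper name `parse_fields` is hypothetical, and it assumes that `ls` is produced by splitting a tab-separated line, which the traceback does not show.

```python
MIN_FIELDS = 10  # the failing statement indexes up to ls[9]


def parse_fields(line, min_fields=MIN_FIELDS):
    """Split one line and return the six values used in the traceback.

    Hypothetical helper (not part of MEGAnE). Assumes tab-separated input.
    """
    ls = line.rstrip("\n").split("\t")
    if len(ls) < min_fields:
        # Report the offending line instead of failing with a bare IndexError.
        raise ValueError(
            f"expected at least {min_fields} tab-separated fields, "
            f"got {len(ls)}: {line!r}"
        )
    # Same field order as the statement shown in the traceback.
    return [int(ls[1]), int(ls[2]), ls[3], ls[8], ls[4], ls[9]]
```

A guard like this does not fix the underlying cause, but it makes a column-count mismatch in an intermediate file visible at the point where it is read.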

Here is the detailed log: for_debug.log.

Thanks in advance.

shohei-kojima commented 2 years ago

Thank you for reporting this. I reproduced the same error when the following were used.

Judging from the error, pybedtools 0.9.0 is very likely the cause. If possible, please try again using Python 3.7.X and pybedtools 0.8.0. I will update MEGAnE so that it works with pybedtools 0.9.0, but this will take several days.

On an HPC system, Singularity 2 may be easier to install, so please also consider using it. In addition to the Singularity 3 .sif container, a Singularity 2 .simg container can be built from Docker Hub.

shohei-kojima commented 2 years ago

Sorry for the many messages. Simply downgrading bedtools from v2.30.0 to v2.28.0 solved this. This is probably the best solution for now. The bedtools binary I tried: https://github.com/arq5x/bedtools2/releases/download/v2.28.0/bedtools
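
A quick way to confirm which bedtools is on the PATH before rerunning is sketched below. This is a minimal example and not part of MEGAnE: the function names are hypothetical, and the only versions known from this thread are v2.28.0 (worked) and v2.30.0 (failed), so other versions are simply treated as untested here.

```python
import re
import subprocess

KNOWN_GOOD = (2, 28, 0)  # the version reported to work in this thread


def parse_bedtools_version(text):
    """Extract (major, minor, patch) from text such as 'bedtools v2.28.0'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"could not parse a bedtools version from {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_bedtools_version(executable="bedtools"):
    """Run `bedtools --version` and return the parsed version tuple."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_bedtools_version(result.stdout)


def is_known_good(version, known_good=KNOWN_GOOD):
    """True only for the single version confirmed to work in this thread."""
    return version == known_good
```

For example, `is_known_good(installed_bedtools_version())` returns `False` when v2.30.0 is installed, which matches the failing setup described above.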

YiweiNiu commented 2 years ago

Hi,

Thank you very much for your quick reply.

I downgraded bedtools from v2.30.0 to v2.28.0 and it works now.

Thank you for developing this useful tool.

kalon33 commented 1 week ago

@shohei-kojima Hi, has this been fixed? I seem to have just hit the same problem using the default dependency versions available in Debian Bookworm:

2024-09-18 08:02:58,107:INFO:Initial check started.
2024-09-18 08:02:58,516:INFO:estimated read lenth = 151
2024-09-18 08:02:58,577:INFO:All 24 main chromosome(s) were found in test/test_trio/trio_0/trio_0.cram.
2024-09-18 08:02:58,577:INFO:"X" was found in test/test_trio/trio_0/trio_0.cram. "X" will be considered as a female sex chromosome.
2024-09-18 08:02:58,577:INFO:"Y" was found in test/test_trio/trio_0/trio_0.cram. "Y" will be considered as a male sex chromosome.

2024-09-18 08:02:58,584:INFO:Preprocessing started.
2024-09-18 08:03:05,685:INFO:N=1170 repeats found in ./Dfam_3.7_custom.rep. N=1170 will be analyzed. N=0 will be excluded due to non-ME repeats.
2024-09-18 08:04:38,224:INFO:Discordant read search started.
2024-09-18 08:09:58,450:INFO:Screening results:nonXY_reads=757117852,X_reads=41390598,Y_reads=1449312,chimeric_reads=53589174,hybrid_reads=4385944,pA_reads=191907,absent_reads=274115
2024-09-18 08:09:58,451:INFO:estimated autosome depth = 40
2024-09-18 08:09:58,451:INFO:estimated sex = female
2024-09-18 08:09:58,612:INFO:Clipped read processing started.
2024-09-18 08:50:24,965:INFO:Unmapped read processing started.
2024-09-18 09:46:34,500:INFO:Hybrid read processing started.
2024-09-20 20:31:30,549:INFO:Integration junction search (outside of TEs) started.
2024-09-20 20:32:10,707:INFO:Integration junction search (nested in TEs) started.
2024-09-20 20:36:34,384:INFO:Filtering started.
2024-09-20 20:37:13,154:INFO:[1504, 1498] ME insertion candidates found.
2024-09-20 20:37:13,154:INFO:ME insertion search finished!

2024-09-20 20:37:13,156:INFO:Absent ME search started.
2024-09-20 20:37:15,836:ERROR:
Traceback (most recent call last):
  File "/usr/bin/scripts/find_absent.py", line 62, in find_abs
    d[id].append([int(ls[1]), int(ls[2]), ls[3], ls[8], ls[4], ls[9]])
                                                               ~~^^^
IndexError: list index out of range

In our academic facility we strongly prefer using shared dependencies and building Debian packages for the software we run, hence the constraints on the versions we can use.

Thanks for your help fixing this.