open2c / pairtools

Extract 3D contacts (.pairs) from sequencing alignments

IndexError: pop index out of range for pairtools split #159

KunFang93 closed this issue 2 years ago

KunFang93 commented 2 years ago

Hi,

Thanks for providing this useful tool! I encountered an error when trying to use pairtools split.

The error message:

Traceback (most recent call last):
  File "/data/kun/miniconda3/envs/dt-microc/bin/pairtools", line 11, in <module>
    sys.exit(cli())
  File "/data/kun/miniconda3/envs/dt-microc/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/kun/miniconda3/envs/dt-microc/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/kun/miniconda3/envs/dt-microc/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/kun/miniconda3/envs/dt-microc/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/kun/miniconda3/envs/dt-microc/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/data/kun/miniconda3/envs/dt-microc/lib/python3.7/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
    return func(*args, **kwargs)
  File "/data/kun/miniconda3/envs/dt-microc/lib/python3.7/site-packages/pairtools/cli/split.py", line 44, in split
    split_py(pairsam_path, output_pairs, output_sam, **kwargs)
  File "/data/kun/miniconda3/envs/dt-microc/lib/python3.7/site-packages/pairtools/cli/split.py", line 119, in split_py
    sam2 = cols.pop(sam2col)
IndexError: pop index out of range

The commands I used:

bwa mem -5SP -T0 -t 40 /data/kun/Align_Index/grch38_no_alt/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta micro_3b_S25_R1_001.fastq.gz micro_3b_S25_R2_001.fastq.gz -o test.sam
pairtools parse2 --min-mapq 30 --report-position read --report-orientation read --add-pair-index --add-columns pos5,pos3 --max-inter-align-gap 30 --nproc-in 16 --nproc-out 16 --chroms-path /data/kun/Align_Index/grch38_no_alt/hg38.genome test.sam > test.pairsam
pairtools sort --tmpdir=./tmp --nproc 30 test.pairsam > test.srt.pairsam
pairtools dedup --nproc-in 16 --nproc-out 16 --mark-dups --output-stats test.txt --output test.dedup.pairsam test.srt.pairsam
pairtools split --nproc-in 8 --nproc-out 8 --output-pairs test.mapped.pairs --output-sam test.unsorted.bam test.dedup.pairsam

These commands work well up to pairtools split. I suspect the error might result from the additional columns I added in pairtools parse2. How should I change my commands in that case? Thanks in advance!

Best, Kun

agalitsyna commented 2 years ago

Hi @KunFang93, thanks for posting such a detailed report.

You might be using pairtools v1.0.0 or v1.0.1, where the deduplication procedure truncates the sam columns of reads that have "#" characters in their read qualities. It was a known bug that we fixed in a recent release: https://github.com/open2c/pairtools/releases/tag/v1.0.2
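To confirm that a deduplicated file is affected, one option is to compare each row's field count with the `#columns:` header line. The following is a minimal sketch, not part of pairtools: the function name `find_short_rows` is hypothetical, and it assumes an uncompressed, tab-separated .pairsam file whose header contains a `#columns:` line.

```python
from typing import Iterable, Iterator, List, Tuple


def find_short_rows(lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, n_fields, n_expected) for rows shorter than the header.

    Hypothetical helper for illustration; it only counts tab-separated fields
    and does not validate their contents.
    """
    expected: List[str] = []
    in_header = True
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if in_header and line.startswith("#"):
            # The "#columns:" header line lists the column names of the body.
            if line.startswith("#columns:"):
                expected = line[len("#columns:"):].split()
            continue
        in_header = False
        if not line or not expected:
            continue
        n_fields = len(line.split("\t"))
        if n_fields < len(expected):
            # A row like this is the kind that would make cols.pop(sam2col) fail.
            yield lineno, n_fields, len(expected)


# Small in-memory example: the second body row has lost its sam2 field.
sample = [
    "## pairs format v1.0\n",
    "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type sam1 sam2\n",
    "r1\tchr1\t100\tchr1\t500\t+\t-\tUU\tSAM1\tSAM2\n",
    "r2\tchr1\t100\tchr1\t500\t+\t-\tDD\tSAM1\n",
]
short_rows = list(find_short_rows(sample))
```

For a real file, the same function can be fed an open file handle, for example `find_short_rows(open("test.dedup.pairsam"))`. If it reports rows in the dedup output but none in the sorted input, the truncation happened during deduplication.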

Can you install the latest pairtools (already available via pip or GitHub), make sure it is v1.0.2, and let us know if the problem persists?
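A quick way to check whether an environment still has one of the versions named above is sketched below. The helper names are hypothetical, the set of affected versions is taken only from this thread, and the lookup of the installed version relies on the standard-library `importlib.metadata`, which needs Python 3.8 or newer (on Python 3.7, `pip show pairtools` gives the same information).

```python
import re
from typing import Optional, Tuple

# Versions reported as affected in this thread (an assumption, not an official list).
AFFECTED_VERSIONS = {(1, 0, 0), (1, 0, 1)}


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract the first X.Y.Z triple from a string such as 'v1.0.2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_affected(version_text: str) -> bool:
    """Return True if the version is one of those named as buggy above."""
    return parse_version(version_text) in AFFECTED_VERSIONS


def installed_pairtools_version() -> Optional[str]:
    """Return the installed pairtools version, or None if it cannot be determined."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python < 3.8 has no importlib.metadata
        return None
    try:
        return version("pairtools")
    except PackageNotFoundError:
        return None


current = installed_pairtools_version()
if current is not None and is_affected(current):
    print(f"pairtools {current} is affected; upgrade to 1.0.2 or later")
```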

I also ran your example code on the mouse Micro-C sample with pairtools 1.0.2, and it worked perfectly.

KunFang93 commented 2 years ago

Thanks for your prompt reply! I will try v1.0.2 and report back with the result later.

KunFang93 commented 2 years ago

v1.0.2 solved this issue.