Closed sckinta closed 4 years ago
Dear Chun, many thanks for this detailed bug report!! I suspect that the issue lies in the fact that you are missing a -h flag in your cmd-in, i.e. it should be "samtools view -h --no-PG". If that's indeed the problem, pairtools is partially to blame: we should tell the user when the header is missing. I'll introduce a short error message. Again, many thanks for reaching out! Anton.
Hi Anton,
Thank you for the quick reply! I tried your suggestion:
pairtools parse -c chrom_hg19.sizes --drop-sam --cmd-in "samtools view -h --no-PG" dedup.bam
It reports a new error this time:
Traceback (most recent call last):
File "/home/suc1/.conda/envs/py36/bin/pairtools", line 11, in <module>
sys.exit(cli())
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/pairtools/__init__.py", line 113, in wrapper
return func(*args, **kwargs)
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/pairtools/pairtools_parse.py", line 156, in parse
output_parsed_alignments, output_stats, **kwargs)
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/pairtools/pairtools_parse.py", line 213, in parse_py
header = _headerops.append_new_pg(header, ID=UTIL_NAME, PN=UTIL_NAME)
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/pairtools/_headerops.py", line 184, in append_new_pg
new_samheader = _add_pg_to_samheader(samheader, ID, PN, VN, CL, force)
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/pairtools/_headerops.py", line 246, in _add_pg_to_samheader
pg_chains = _parse_pg_chains(samheader, force=force)
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/pairtools/_headerops.py", line 281, in _parse_pg_chains
for l in header
File "/home/suc1/.conda/envs/py36/lib/python3.6/site-packages/pairtools/_headerops.py", line 282, in <listcomp>
if l.startswith('@PG')
ValueError: dictionary update sequence element #0 has length 1; 2 is required
It looks to me like the problem lies in the @PG handling. However, if I do not use --no-PG, I have an issue with samtools view instead.
$ samtools view merged.dedup.bam
[E::sam_hrecs_error] Malformed key:value pair at line 32: "@PG HiCUP Deduplicator VN:0.5.9"
samtools view: failed to add PG line to the header
This has to do with the samtools version. When I installed pairtools in a conda environment, samtools (htslib 1.11) was installed as a dependency (I think). I tested that samtools (htslib 1.7) works fine with my bam file without --no-PG. The error comes from samtools (htslib 1.11) only.
I am wondering whether it is necessary to keep the @PG records in pairtools parse. Does pairtools have to be compatible with samtools version 1.11? Is it possible to work around it?
Thank you!!!
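One possible workaround, sketched here under assumptions rather than taken from pairtools or samtools, is to drop malformed @PG lines from the SAM text stream before it reaches the parser. The function name drop_malformed_pg is hypothetical, and the field pattern follows the TAG:VALUE rule of the SAM specification as I understand it (a two-character tag, a colon, then printable ASCII).

```python
import re

# Assumed pattern for one header field: two-character TAG, colon, printable VALUE.
_FIELD = re.compile(r"[A-Za-z][A-Za-z0-9]:[ -~]+")


def drop_malformed_pg(lines):
    """Yield SAM lines unchanged, skipping @PG header lines that are not TAG:VALUE.

    Hypothetical helper for illustration; alignment records and all other
    header lines pass through untouched.
    """
    for line in lines:
        if line.startswith("@PG"):
            fields = line.rstrip("\r\n").split("\t")[1:]
            if not fields or not all(_FIELD.fullmatch(f) for f in fields):
                continue  # skip the malformed @PG line
        yield line
```

Used as a stdin-to-stdout filter, e.g. sys.stdout.writelines(drop_malformed_pg(sys.stdin)), such a script could in principle sit between "samtools view -h --no-PG" and a pairtools parse call that reads from stdin. That pipeline is an assumption and has not been verified against these files. Dropping a @PG line also loses that program's provenance record.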
ah, okay, so the issue is that the HiCUP Deduplicator produces .sam files that do not adhere to the SAM format.
According to the official documentation, "In the header, each line is TAB-delimited and, apart from @CO lines, each data field follows a format ‘TAG:VALUE’ where TAG is a two-character string that defines the format and content of VALUE". The @PG line that you showed doesn't follow this recommendation. :(
I'll commit a fix in a few minutes!
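To illustrate the TAG:VALUE rule quoted above, here is a minimal sketch of how a header line can be checked and why a field without a colon leads to the ValueError in the traceback. It is not the pairtools code: fields_to_dict and malformed_fields are hypothetical names, the example lines assume TAB-separated fields (the whitespace in the samtools error message is ambiguous), and the well-formed line is made up for comparison.

```python
import re

# Assumed pattern for one header field: two-character TAG, colon, printable VALUE.
FIELD_RE = re.compile(r"[A-Za-z][A-Za-z0-9]:[ -~]+")


def malformed_fields(line):
    """Return the fields of a SAM header line that are not TAG:VALUE."""
    record_type, *fields = line.rstrip("\r\n").split("\t")
    if record_type == "@CO":  # comment lines are free text
        return []
    return [f for f in fields if not FIELD_RE.fullmatch(f)]


def fields_to_dict(line):
    """Naively build {TAG: VALUE}; raises ValueError if a field has no colon."""
    fields = line.rstrip("\r\n").split("\t")[1:]
    return dict(f.split(":", 1) for f in fields)


# Assumed TAB-separated form of the line reported by samtools.
bad = "@PG\tHiCUP Deduplicator\tVN:0.5.9"
# Hypothetical well-formed counterpart, for comparison only.
good = "@PG\tID:HiCUP Deduplicator\tVN:0.5.9"

print(malformed_fields(bad))   # the field lacking 'TAG:'
print(malformed_fields(good))  # empty list
try:
    fields_to_dict(bad)
except ValueError as err:
    print("ValueError:", err)
```

The first field of the malformed line splits into a single item instead of a (TAG, VALUE) pair, so dict() rejects it. A check like malformed_fields lets a tool report the offending field before attempting the conversion.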
Aha, thank you! I just found some bam files I generated using a newer version of HiCUP. They fixed the "TAG:VALUE" format for @PG. I just tested pairtools on those new bam files, and it works well. Thank you!
oh, nice!! i did introduce the fix too, just in case!
I am trying to use pairtools parse to convert a bam file generated by HiCUP to pairs. It kept reporting an error.
I double-checked the @SQ header in the bam file against the chromosomes in the chrom.size file. They match and are in the same order. chrom_hg19.sizes is attached here.
pairtools parse worked properly with the MATalpha test data and its corresponding chrom.size file. Could you help solve this problem? Thank you!!