lauringlab / variant_pipeline

Work on the variant_pipeline and initial r analysis used in calling variants from NGS data
Apache License 2.0

reciprocal_variants.py throws error #17

Open grendon opened 5 years ago

grendon commented 5 years ago

Hi, I had a batch of 15 samples to analyze and I was able to obtain results for the majority of them. However, there were four samples that caused the reciprocal_variants.py script to fail with this error. See below.

Traceback (most recent call last):
  File "src/variant_pipeline/scripts/reciprocal_variants.py", line 130, in <module>
    var = get_reference(row,bam,"var")
  File "src/variant_pipeline/scripts/reciprocal_variants.py", line 94, in get_reference
    for pileupcolumn in bam.pileup(df.chr,py_pos,py_pos+1,truncate=True,stepper="all",max_depth=1E6): # look at the range py_pos to py_pos+1 and all the bases that align to those positions. Think of a column on each position like the samtools output of mpileup
  File "pysam/libcalignmentfile.pyx", line 1051, in pysam.libcalignmentfile.AlignmentFile.pileup (pysam/libcalignmentfile.c:12611)
  File "pysam/libcalignmentfile.pyx", line 774, in pysam.libcalignmentfile.AlignmentFile.parse_region (pysam/libcalignmentfile.c:10572)
  File "pysam/libcalignmentfile.pyx", line 1631, in pysam.libcalignmentfile.AlignmentFile.gettid (pysam/libcalignmentfile.c:18839)
  File "pysam/libcalignmentfile.pyx", line 689, in pysam.libcalignmentfile.AlignmentFile.get_tid (pysam/libcalignmentfile.c:9515)
  File "pysam/libcutils.pyx", line 122, in pysam.libcutils.force_bytes (pysam/libcutils.c:3081)
TypeError: Argument must be string, bytes or unicode.

Have you seen this error before? What is the cause, and how can it be corrected?

alauring commented 5 years ago

Don’t know. Have forwarded on to Andrew Valesano and JT McCrone who might have suggestions.

alauring commented 5 years ago

From JT:

My best bet would be that there is something off with their segment name. It looks like the error is coming from the pysam function bam.pileup(df.chr,py_pos,py_pos+1,truncate=True,stepper="all",max_depth=1E6), and it's complaining that one of the arguments "must be string, bytes or unicode." By the looks of it, df.chr is the only one of those arguments that should be a string. Maybe it is a number here. That's where I would start.
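A minimal sketch of that check, assuming the variant table is loaded with pandas and has a `chr` column as the traceback suggests. The helper name `find_bad_chr_rows` and the example data are hypothetical and not part of the pipeline.

```python
import pandas as pd


def find_bad_chr_rows(variants):
    """Return the rows whose 'chr' value is not a text string (e.g. NaN or a number)."""
    # Note: on Python 2 a text value may also be of type `unicode`;
    # checking against `basestring` instead of `str` would cover both.
    is_text = variants["chr"].map(lambda value: isinstance(value, str)).astype(bool)
    return variants[~is_text]


# Hypothetical example data: the second row has a missing segment name,
# the third has a numeric one.
example = pd.DataFrame({
    "chr": ["PB2", float("nan"), 3],
    "pos": [100, 200, 300],
})
bad_rows = find_bad_chr_rows(example)
print(bad_rows)
```

Any row this returns would hand a non-string value to `bam.pileup`, which is consistent with the `TypeError` in the report. Running it on the input files of the four failing samples would be one way to confirm or rule out this explanation.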

grendon commented 5 years ago

ok. thanks. I'll check it out!

grendon commented 5 years ago

None of my input files have a column called df.chr. The script worked with some samples but not with others.

BTW, these are the columns of my input files:

chr pos ref var p.val freq.var sigma2.freq.var n.tst.fw cov.tst.fw n.tst.bw cov.tst.bw n.ctrl.fw cov.ctrl.fw n.ctrl.bw cov.ctrl.bw raw.p.val Id mutation
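For context, `df.chr` in the traceback is most likely not a column name of its own. Assuming `df` inside `get_reference` is a pandas Series holding one row of the table (the thread does not confirm this), `df.chr` is attribute-style access to the `chr` field listed above. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical two-row table with the same 'chr' and 'pos' column names.
table = pd.DataFrame({"chr": ["PB2", "HA"], "pos": [100, 200]})

# Take one row as a Series, the way a row-wise loop would.
row = table.iloc[0]

# Attribute access and key access read the same field.
same_value = row.chr == row["chr"]
print(row.chr, same_value)
```

So an input file with a `chr` column is what the script expects; the open question is what value that column holds in the rows that fail.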

alauring commented 5 years ago

Hi Gloria,

Perhaps it is chr and it has been renamed df.chr or there is something that is altered in one script and not another? I am sorry that I cannot be helpful here. We haven’t had this particular problem and you’ve made a lot of changes on your end.

Are reciprocal variants something that would be expected in your analysis? These are when the consensus sequence differs from the plasmid control (a fixed mutation).


grendon commented 5 years ago

Could it be something simple like different versions of Python and/or Biopython running under the hood?

I am using Python version 2.7.13 and Biopython version 1.68.

Do you know which versions of Python and Biopython were used on your end?
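Since the traceback in the original report runs through pysam rather than Biopython, the pysam version is probably the more relevant one to compare. Below is a small sketch for reporting the versions in use. The helper name `report_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute; packages that cannot be imported are reported as not installed.

```python
import importlib
import sys


def report_versions(package_names):
    """Map each package name to its __version__ string, or a note if unavailable."""
    versions = {"python": sys.version.split()[0]}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# "Bio" is the import name of Biopython.
versions = report_versions(["pysam", "pandas", "Bio"])
for name, version in sorted(versions.items()):
    print(name, version)
```

Comparing this output between a machine where the script works and one where it fails would show whether a version difference is even present before looking further.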