Closed gevro closed 3 years ago
Hi, I'm getting this error from this command. How do I fix this? Note: this seems to be the same as #10, but my command was correct, so that cannot be the explanation. Thanks!
Docker: yanmei/mosaicforecast:0.0.1
python ReadLevel_Features_extraction.py input.bed sample.features bam_dir input/Homo_sapiens_assembly38.fasta input/k24.umap.sorted.bw 2 bam
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'querypos_p'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1071, in set
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'querypos_p'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ReadLevel_Features_extraction.py", line 984, in <module>
    df['querypos_p']=df.apply(lambda row: my_wilcox_pvalue(row['querypos_major'], row['querypos_minor']), axis=1)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2938, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 3001, in _set_item
    NDFrame._set_item(self, key, value)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3624, in _set_item
    self._data.set(key, value)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1074, in set
    self.insert(len(self.items), item, value)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1181, in insert
    block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 3047, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 2595, in __init__
    super().__init__(values, ndim=ndim, placement=placement)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 125, in __init__
    f"Wrong number of items passed {len(self.values)}, "
ValueError: Wrong number of items passed 38, placement implies 1
First few lines of tmp file output:
id querypos_major querypos_minor leftpos_major leftpos_minor seqpos_major seqpos_minor mapq_major mapq_minor baseq_major baseq_minor baseq_major_near1b baseq_minor_near1b major_plus major_minus minor_plus minor_minus context1 context2 context1_count context2_count mismatches_major mismatches_minor major_read1 major_read2 minor_read1 minor_read2 dp_near dp_far conflict_num mappability type length GCcontent ref_softclip alt_softclip indel_proportion_SNPonly alt2_proportion_SNPonly
sample~chr1~1107734~A~C 143,142,138,114,110,109,99,87,83,72,65,61,40,38,37,17,7, 149,146,131,126,109,78,18,16,12,11,10, 1107590,1107591,1107595,1107619,1107623,1107624,1107634,1107646,1107650,1107661,1107668,1107672,1107693,1107695,1107696,1107716,1107726, 1107584,1107587,1107602,1107607,1107624,1107655,1107715,1107717,1107721,1107722,1107723, , , 60,60,56,60,60,60,60,60,60,60,60,60,60,60,60,60,60, 60,60,60,60,60,60,60,60,60,60,60, 30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30, 30,30,30,30,30,30,30,30,30,30,30, , , 0 0 0 0 AAC GTT 0 0 , , 0 0 0 0 30.714285714285715 36.0 0 0.25 SNP 0 0.6666666666666666 0.0 0.0 0.0 0.0
sample~chr1~1894606~G~C 143,132,130,118,117,116,113,109,109,103,102,102,100,69,68,52,47,20,5,0, 148,138,122,112,109,105,102,34,21,14,7,4, 1894462,1894473,1894475,1894487,1894488,1894489,1894492,1894496,1894496,1894502,1894503,1894503,1894505,1894536,1894537,1894553,1894558,1894585,1894600,1894605, 1894457,1894467,1894483,1894493,1894496,1894500,1894503,1894571,1894584,1894591,1894598,1894601, , , 60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,60, 60,60,60,60,60,60,60,60,60,60,60,60, 30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30, 60,30,30,30,30,30,30,30,30,30,30,30, , , 0 0 0 0 AGC GCT 0 0 , , 0 0 0 0 36.142857142857146 39.375 0 1.0 SNP 0 0.7142857142857143 0.0 0.0 0.0 0.0
sample~chr1~1968440~A~C 143,142,134,132,128,117,110,86,82,79,78,72,57,50,45,40,24,19, 148,147,140,91,69, 1968296,1968297,1968305,1968307,1968311,1968322,1968329,1968353,1968357,1968360,1968361,1968367,1968382,1968389,1968394,1968399,1968415,1968420, 1968291,1968292,1968299,1968348,1968370, , , 60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,60, 60,60,60,60,60, 30,16,30,30,30,30,30,30,30,30,30,20,30,30,30,30,30,30, 20,20,20,20,20, , , 0 0 0 0 CAT ATG 0 0 , , 0 0 0 0 30.428571428571427 35.125 0 1.0 SNP 0 0.6190476190476191 0.0 0.0 0.0 0.0
I'm trying to figure out where the bug is, and I found that each of the following lines, on its own, leaves the 'df' dataframe empty. That is the source of the error, but I'm not sure why this is happening.
df = df[df.seqpos_minor != ',']
df = df[df.seqpos_major != ',']
df = df[df.baseq_minor_near1b != ',']
df = df[df.baseq_major_near1b != ',']
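A minimal sketch of how these filters can empty the frame, assuming (as in the sample rows above) that every site has only the placeholder `,` in `seqpos_major` and `seqpos_minor`. The toy frame and the helpers `count_placeholder_rows` and `add_row_feature` are hypothetical illustrations, not part of MosaicForecast.

```python
import pandas as pd

# Toy stand-in for the tmp feature table: every row holds the empty
# placeholder "," in seqpos_major / seqpos_minor, as in the rows above.
df = pd.DataFrame({
    "id": ["sample~chr1~1107734~A~C", "sample~chr1~1894606~G~C"],
    "querypos_major": ["143,142,138,", "143,132,130,"],
    "querypos_minor": ["149,146,131,", "148,138,122,"],
    "seqpos_major": [",", ","],
    "seqpos_minor": [",", ","],
})

# Same kind of filter as in the script: drops every row here.
filtered = df[df.seqpos_minor != ","]
print(len(df), "rows before filtering,", len(filtered), "rows after")


def count_placeholder_rows(frame, columns, placeholder=","):
    """Return, per column, how many rows hold only the empty placeholder."""
    return {col: int((frame[col] == placeholder).sum()) for col in columns}


def add_row_feature(frame, new_col, func):
    """Hypothetical guard: fail early with a readable message on an empty frame."""
    if frame.empty:
        raise ValueError(
            f"No rows left to compute {new_col!r}; "
            "check which feature columns contain only placeholders"
        )
    out = frame.copy()
    out[new_col] = out.apply(func, axis=1)
    return out


placeholder_counts = count_placeholder_rows(df, ["seqpos_major", "seqpos_minor"])
print(placeholder_counts)
```

On an empty frame, `df.apply(..., axis=1)` may hand back an empty DataFrame that keeps all of the original columns rather than a single Series. That would be consistent with the "38 items" in the error message, since the tmp file header has 38 columns, but this reading is an inference from the traceback rather than something confirmed in the thread.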
Hi,
Thanks for your interest in MosaicForecast! Have you checked the format of your input.bed? Do the chromosome names start with "chr", as in hg38? Also, could "input/k24.umap.sorted.bw" be formatted for hg19?
Best,
Yanmei
Hi! Yes input.bed and k24.umap.sorted.bw are both from hg38 with "chr#" notation for chromosomes. So that cannot be the problem.
What else could be the issue?
$ head input.bed
chr1 2384860 2384861 C T sample
chr1 5960549 5960550 A G sample
chr1 8068981 8068982 A C sample
chr1 20021374 20021375 A G sample
chr1 34866510 34866511 G A sample
chr1 39823543 39823544 T A sample
chr1 40907253 40907254 C T sample
$ wiggletools write_bg - k24.umap.sorted.bw | head
chr1 10157 10158 0.000000
chr1 10158 10159 0.042000
chr1 10159 10160 0.042000
chr1 10160 10161 0.042000
chr1 10161 10162 0.042000
chr1 10162 10163 0.042000
chr1 10163 10164 0.042000
chr1 10164 10165 0.042000
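The naming check done by eye above can also be sketched in a few lines. This is only an illustration: the helper names `chrom_style` and `first_column` are made up here, and the input lines are copied from the `head` output above.

```python
def chrom_style(names):
    """Classify chromosome names as 'chr', 'no_chr', or 'mixed'."""
    names = list(names)
    if not names:
        raise ValueError("no chromosome names given")
    with_prefix = sum(1 for n in names if n.startswith("chr"))
    if with_prefix == len(names):
        return "chr"
    if with_prefix == 0:
        return "no_chr"
    return "mixed"


def first_column(lines):
    """Extract the first whitespace-separated field from non-empty lines."""
    return [line.split()[0] for line in lines if line.strip()]


bed_lines = [
    "chr1 2384860 2384861 C T sample",
    "chr1 5960549 5960550 A G sample",
]
bedgraph_lines = [
    "chr1 10157 10158 0.000000",
    "chr1 10158 10159 0.042000",
]

bed_style = chrom_style(first_column(bed_lines))
bw_style = chrom_style(first_column(bedgraph_lines))
print(bed_style, bw_style, "consistent" if bed_style == bw_style else "MISMATCH")
```

Note that this only compares the naming convention. It cannot tell whether the coordinates themselves come from hg19 or hg38.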
Hi @gevro ,
Could you send me a slice of your bam file so I could test it?
Thanks!
Yanmei
Hi @gevro,
One possible reason is that MosaicForecast currently expects paired-end reads. Is it possible that your reads are single-end? If so, I could modify a version for you to use.
Best,
Yanmei
Hi, Sorry for the late reply. I found that my BAM files did not have the 'NM' tag. Could that be the issue? I am now making BAM files with the 'NM' tag to see if that will work.
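One way to check for the tag without extra libraries is to inspect SAM text records (for example, lines produced by `samtools view`). The sketch below assumes tab-separated SAM lines; the helper names and the two example records are made up for illustration.

```python
def has_tag(sam_line, tag):
    """True if a SAM alignment line carries the given optional tag (e.g. 'NM')."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 fields are mandatory; optional fields follow as TAG:TYPE:VALUE.
    return any(f.split(":", 1)[0] == tag for f in fields[11:])


def fraction_with_tag(sam_lines, tag="NM"):
    """Fraction of alignment lines (header lines skipped) that carry the tag."""
    records = [l for l in sam_lines if l.strip() and not l.startswith("@")]
    if not records:
        return 0.0
    return sum(has_tag(l, tag) for l in records) / len(records)


# Made-up records: the first carries NM, the second does not.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII\tNM:i:0\tMD:Z:5",
    "r2\t147\tchr1\t200\t60\t5M\t=\t100\t-105\tACGTA\tIIIII\tRG:Z:grp1",
]
nm_fraction = fraction_with_tag(example)
print(nm_fraction)
```

If the fraction is zero, the tag has to be added before running the feature extraction. `samtools calmd` is one tool that can compute NM and MD tags from the reference; the thread does not say which method was used here.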
Hi @gevro ,
If seqpos_minor and seqpos_major come back blank, it is most probably because none of the reads are properly paired: seqpos is only calculated for reads where "pileupread.alignment.is_proper_pair" is true.
NM is also a tag that I use. Sorry for the inconvenience caused. Maybe I should reinvent the wheel instead of assuming all BAM files are in the same format...
Best wishes,
Yanmei
I checked and most alignments are properly paired, so that cannot be the issue.
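A check like this can be sketched from the SAM FLAG field, where bit 0x1 marks a paired read and bit 0x2 marks a properly paired one according to the SAM specification. The function name and the example records below are hypothetical, and the thread does not say how the check was actually done.

```python
PAIRED = 0x1        # template has multiple segments in sequencing
PROPER_PAIR = 0x2   # each segment properly aligned according to the aligner


def proper_pair_fraction(sam_lines):
    """Fraction of alignment records whose FLAG marks them as properly paired."""
    flags = [
        int(line.split("\t")[1])
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    ]
    if not flags:
        return 0.0
    proper = sum(1 for f in flags if (f & PAIRED) and (f & PROPER_PAIR))
    return proper / len(flags)


# Made-up records with flags 99 and 147 (properly paired),
# 65 (paired but not proper) and 0 (single-end).
records = [
    "r1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII",
    "r1\t147\tchr1\t200\t60\t5M\t=\t100\t-105\tACGTA\tIIIII",
    "r2\t65\tchr1\t300\t60\t5M\tchr2\t500\t0\tACGTA\tIIIII",
    "r3\t0\tchr1\t400\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]
pp_fraction = proper_pair_fraction(records)
print(pp_fraction)
```

A value close to 1 would suggest that pairing is not what empties the seqpos columns, which matches the conclusion reached in the thread.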
Confirmed that a BAM file with the "NM" tag solves the problem.
Great! Thanks for notifying! :)