parklab / MosaicForecast

A mosaic detecting software based on phasing and random forest
MIT License

Docker ReadLevel_Features_extraction.py fails #10

Closed by cccnrc 4 years ago

cccnrc commented 4 years ago

I installed the Docker image following your instructions and everything worked perfectly. Continuing with the instructions, I ran the demo Phase.py, which worked properly. I then tried to run the ReadLevel_Features_extraction.py example, but it fails with the following error:

python ReadLevel_Features_extraction.py /MF/demo/test.input /MF/demo/test.features /MF/demo hs37d5.fa /root/downloads/hg19/k24.umap.wg.bw 150 2
not enough alt reads: 11 40316579 40316580
not enough alt reads: 15 75918043 75918044
not enough alt reads: 1 1004864 1004865
not enough alt reads: 12 52644507 52644508
not enough alt reads: 1 40130162 40130163
not enough alt reads: 1 33801575 33801576
not enough alt reads: 1 32160980 32160981
not enough alt reads: 1 2591768 2591769
not enough alt reads: 1 36036837 36036838
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'querypos_p'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1071, in set
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'querypos_p'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ReadLevel_Features_extraction.py", line 984, in <module>
    df['querypos_p']=df.apply(lambda row: my_wilcox_pvalue(row['querypos_major'], row['querypos_minor']), axis=1)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2938, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 3001, in _set_item
    NDFrame._set_item(self, key, value)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3624, in _set_item
    self._data.set(key, value)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1074, in set
    self.insert(len(self.items), item, value)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1181, in insert
    block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 3047, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 2595, in __init__
    super().__init__(values, ndim=ndim, placement=placement)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 125, in __init__
    f"Wrong number of items passed {len(self.values)}, "
ValueError: Wrong number of items passed 38, placement implies 1

I copied demo/test.bam and demo/test.bam.bai to demo/sample.bam and demo/sample.bam.bai because I saw that was the expected name, and I also verified that I can run bigWigAverageOverBed.

The script creates a test.features.tmp file with only column names and then fails.

What should I do to make it run properly?

Thank you in advance for any help

douym commented 4 years ago

Hi @cccnrc,

Thanks for using MF! May I ask whether you use GRCh37 or GRCh38? I recently updated MF so that it now determines the read length automatically, and the command changed to the form below. I think the last two parameters in your command are not correct:

python(v3) ReadLevel_Features_extraction.py input_bed(file_format: chr pos-1 pos ref alt sample, sep="\t") output_features bam_dir(cram is also supported) reference_fasta Umap_mappability(bigWig file,k=24) num_threads_parallel sequencing_file_format(bam/cram)

Could you try running the command as shown below? Thanks!

python ReadLevel_Features_extraction.py /MF/demo/test.input /MF/demo/test.features /MF/demo hs37d5.fa /root/downloads/hg19/k24.umap.wg.bw 2 bam

(The second-to-last parameter is the number of threads, and the last parameter indicates the file format of the sequencing file.)
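As a rough illustration of the argument order described above, the following sketch checks a list of positional arguments before the script is launched. The helper `check_args` and the `EXPECTED_ARGS` list are hypothetical and not part of MosaicForecast. The argument names are taken from the usage line, and the rule that the thread count must be a positive integer is an assumption.

```python
# Positional arguments, in the order given by the usage line in this thread.
EXPECTED_ARGS = [
    "input_bed",
    "output_features",
    "bam_dir",
    "reference_fasta",
    "Umap_mappability",
    "num_threads_parallel",
    "sequencing_file_format",
]

# Arguments from the failing command (old style: read length, then threads).
OLD_COMMAND_ARGS = [
    "/MF/demo/test.input", "/MF/demo/test.features", "/MF/demo", "hs37d5.fa",
    "/root/downloads/hg19/k24.umap.wg.bw", "150", "2",
]

# Arguments from the corrected command (threads, then file format).
NEW_COMMAND_ARGS = [
    "/MF/demo/test.input", "/MF/demo/test.features", "/MF/demo", "hs37d5.fa",
    "/root/downloads/hg19/k24.umap.wg.bw", "2", "bam",
]


def check_args(args):
    """Return a list of problems found in the arguments (hypothetical helper)."""
    if len(args) != len(EXPECTED_ARGS):
        return [f"expected {len(EXPECTED_ARGS)} arguments, got {len(args)}"]

    named = dict(zip(EXPECTED_ARGS, args))
    problems = []

    # Assumption: the thread count has to be a positive integer.
    try:
        threads_ok = int(named["num_threads_parallel"]) >= 1
    except ValueError:
        threads_ok = False
    if not threads_ok:
        problems.append(
            f"num_threads_parallel is not a positive integer: "
            f"{named['num_threads_parallel']!r}"
        )

    # The usage line lists bam and cram as the two accepted formats.
    if named["sequencing_file_format"] not in ("bam", "cram"):
        problems.append(
            f"sequencing_file_format must be 'bam' or 'cram': "
            f"{named['sequencing_file_format']!r}"
        )
    return problems


if __name__ == "__main__":
    for label, args in (("old", OLD_COMMAND_ARGS), ("new", NEW_COMMAND_ARGS)):
        print(label, check_args(args) or "looks fine")
```

With the original command, this check would flag the last argument: 150 is read as the thread count, so "2" lands in the position of the file format. The corrected command passes both checks.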

If you use GRCh38: I haven't pushed the new Docker image yet. I'll update it now, and it should be available in about an hour.

Thanks, and best wishes,

Yanmei

cccnrc commented 4 years ago

Oh my god, I feel so stupid, sorry to bother you. I am on GRCh37, so that was not the problem; I had copy-pasted part of the command from an earlier one. So sorry to have bothered you @douym, and thanks for your kind reply! With the right options it seems to work. I think the warning is normal given the limited size of the example data, right?

python ReadLevel_Features_extraction.py /MF/demo/test.input /MF/demo/test.features /MF/demo hs37d5.fa /root/downloads/hg19/k24.umap.wg.bw 2 bam

test~1~1004865~G~C -1 1 1004850
/usr/local/lib/python3.6/site-packages/scipy/stats/morestats.py:2778: UserWarning: Warning: sample size too small for normal approximation.
  warnings.warn("Warning: sample size too small for normal approximation.")

douym commented 4 years ago

Hi @cccnrc,

No bother at all! Thanks again for spotting the problem in README.md; I've updated the old command for the demo run. And yes, please do not worry about the warning.

Best wishes,

Yanmei