parklab / MosaicForecast

Mosaic variant detection software based on phasing and random forest
MIT License

Failure to extract read features in MF #22

Closed bwzee closed 3 years ago

bwzee commented 3 years ago

Hi there,

I installed MF and I am using CRAM input files. I made the input file for my variants of interest, but during the first stage the run gets terminated:


 python ReadLevel_Features_extraction.py inputs/P0003_T.input outputs/P0003_T.features  cram/  hg38.fa  ./hg38/k24.umap.wg.bw 2 cram

test_mf_local.sh: line 44:  2855 Terminated              python ReadLevel_Features_extraction.py inputs/$SAMPLENAME.input outputs/$SAMPLENAME.features $cramdir ${genomeref} ${k24bw} 2 cram
not enough alt reads:  chr1 2304920 2304921
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/pool.py", line 125, in worker
    put((job, i, result))
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/queues.py", line 347, in put
    self._writer.send_bytes(obj)
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/pool.py", line 130, in worker
    put((job, i, (False, wrapped)))
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/queues.py", line 347, in put
    self._writer.send_bytes(obj)
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "miniconda3/envs/MF/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
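The BrokenPipeError in both tracebacks is raised on the worker side of a multiprocessing result pipe once the reading end has gone away, which is typically what happens when the parent process is terminated (the shell line above reports the main python process as "Terminated", e.g. by the kernel's OOM killer) while pool workers are still sending results back. As an illustration only (not MF code), the same errno-32 failure can be reproduced with the standard library alone:

```python
import multiprocessing as mp

# Illustration only (not MF code): a worker's send fails with
# BrokenPipeError once the reading end of its result pipe is gone,
# mirroring what happens when the parent process is killed while
# pool workers are still returning results.
reader, writer = mp.Pipe(duplex=False)
reader.close()                    # simulate the parent disappearing
try:
    writer.send_bytes(b"result")  # the worker-side send from the traceback
except BrokenPipeError as e:
    print("BrokenPipeError, errno", e.errno)  # errno 32, as in the log
```

If the parent was killed for memory, lowering the thread count or running fewer sites per batch is a common workaround.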

The 'not enough alt reads' message comes up quite frequently. What could I be missing with my dataset?

Note that I am not running Mutect prior to running MF; I am starting from a list of variant positions that I want to work with.

Many thanks

douym commented 3 years ago

Hi @bwzee ,

Have you tried running the demo? I have no problems running it, and others have succeeded with it as well:

python ReadLevel_Features_extraction.py demo/test.input demo/test.cram.features demo/ {fasta} {k24.umap.wg.bw} 1 cram

I ask because, judging from your error, it seems your conda environment has some problems.

As for the "not enough alt reads" message, you can safely ignore it: MF reports it whenever a site has too few reads to calculate the p-values (for example, only 1 read supporting the alt allele).
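In other words, it is just a per-site guard. A minimal sketch of that kind of check (an illustration with an assumed threshold and function name, not MF's actual code) would be:

```python
# Illustration only (assumed threshold, not MF's actual logic): sites
# with too few alt-supporting reads are reported and skipped instead of
# being scored, which is what produces the "not enough alt reads" line.
MIN_ALT_READS = 2  # hypothetical minimum for this sketch

def has_enough_alt_reads(alt_count, chrom, start, end):
    if alt_count < MIN_ALT_READS:
        print("not enough alt reads: ", chrom, start, end)
        return False
    return True

has_enough_alt_reads(1, "chr1", 2304920, 2304921)  # prints the message
```

Such sites simply drop out of the feature table; they do not cause the termination seen above.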

Best wishes,

Yanmei

bwzee commented 3 years ago

Thanks for the reply. Sorry, I didn't mention this, but the example data worked fine for me. Since I'm using hg38, I had to fix some of the reference downloads and the bigWig creation; once that was all done, I ran the test example and it worked fine.

However when I ran it on my own set of data it failed.

Exactly what sort of problem could there be with my conda environment? For privacy reasons, I sanitized the file paths above.

Thanks

douym commented 3 years ago

Hi @bwzee ,

Thanks for your reply! Have you tried running the demo in multi-thread mode, and did it run well? It seems conda failed to use multiprocessing in your case.

Best wishes,

Yanmei

freezecoder commented 3 years ago

Hi

I actually ran this in multithreaded mode and it still failed. There doesn't seem to be an issue with the Python multiprocessing library itself, because other programs use this package and run without any issues.
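For what it's worth, a minimal pool check like the following (an illustration, not something from MF) is a quick way to confirm that the environment's multiprocessing works at all, using the same Pool API that MF's feature extraction relies on:

```python
# Sanity check for the conda env: map a trivial function across a
# 2-worker pool, exercising the same result pipes that MF's
# ReadLevel_Features_extraction.py uses internally.
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(2) as pool:
        print(pool.map(square, range(5)))  # prints [0, 1, 4, 9, 16]
```

If this runs cleanly, the crash is more likely resource-related (e.g. memory pressure killing the parent process) than a broken multiprocessing install.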