iprada / Circle-Map

A method for circular DNA detection based on probabilistic mapping of ultrashort reads
MIT License
62 stars 19 forks source link

Warnings in Circle-Map Realign tutorial #50

Open johnstonmj opened 4 years ago

johnstonmj commented 4 years ago

Hi Iñigo,

I was performing the Circle-Map Realign tutorial, and encountered a few warnings.

My command:

Circle-Map Realign -t 4 -i sort_circular_read_candidates.bam -qbam qname_unknown_circle.bam -sbam sorted_unknown_circle.bam -fasta $ref_genome -o my_unknown_circle.bed

Note that I am using the hg38-related reference GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna which differs from your tutorial slightly.

Output:

2020-09-17 11:16:14: Realigning reads using Circle-Map

2020-09-17 11:16:14: Clustering structural variant reads

2020-09-17 11:16:37: Splitting clusters to to processors

[E::idx_find_and_load] Could not retrieve index file for 'qname_unknown_circle.bam'
400it [00:14, 27.54it/s]███████████████████████████████████████████████████████████████████████████ | 396/400 [00:14<00:00, 33.09it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 400/400 [00:14<00:00, 27.54it/s]
2020-09-17 11:16:52: Writting final output to disk
2020-09-17 11:16:52: Finished!
2020-09-17 11:16:52: Circle-Map Realign finished indentifying circles in 0.6356727162996928

2020-09-17 11:16:52: Circle-Map has identified 1 circles

Computing the coverage of the identified eccDNA
Merging intervals for coverage computation
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
Computing coverage on interval chr7:143911106-143917553

The warning [E::idx_find_and_load] Could not retrieve index file for 'qname_unknown_circle.bam' Seems to make sense, as I don't think you can index a qname-sorted bam file?

The warnings:

/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)

seem to catch and correct themselves, but I am unsure what is causing them.

Is it safe to ignore these warnings? If so, can they be suppressed or explained in the stdout to avoid panicking me as the user? :-)

Thanks, Michael

johnstonmj commented 4 years ago

To expand upon this, when I run a larger human WGS sample I see additional warnings. Do these appear to be related to the cleanup of temporary directories? The program still completes its run and generates the *.circle.bed output file

Below is the end of my stderr

100%|█████████▉| 398/400 [13:11:04<03:22, 101.43s/it]
398it [13:11:03, 101.43s/it]
100%|█████████▉| 399/400 [13:11:30<01:18, 78.97s/it] 
399it [13:11:30, 78.97s/it] 
100%|██████████| 400/400 [13:18:04<00:00, 173.47s/it]
400it [13:18:04, 173.47s/it]
400it [13:18:04, 119.71s/it]
100%|██████████| 400/400 [13:18:04<00:00, 119.71s/it]Traceback (most recent call last):
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/managers.py", line 616, in _run_server
    server.serve_forever()
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/managers.py", line 182, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/shutil.py", line 715, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/shutil.py", line 672, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/shutil.py", line 670, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs808d9db90000293e0000032c'

/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
/home/michael/.snakemake/conda/68e54810/lib/python3.8/site-packages/circlemap/Coverage.py:151: RuntimeWarning: invalid value encountered in ulong_scalars
  end_coverage_ratio = np.sum(region_array[-self.ilen:]) / np.sum(ext_array[-(self.ilen + self.ext):])
/home/michael/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/michael/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/michael/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:233: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/home/michael/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:194: RuntimeWarning: invalid value encountered in true_divide
  arrmean = um.true_divide(
/home/michael/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:226: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/michael/.snakemake/conda/68e54810/lib/python3.8/site-packages/circlemap/Coverage.py:149: RuntimeWarning: invalid value encountered in ulong_scalars
  start_coverage_ratio = np.sum(region_array[0:self.ilen]) / np.sum(
Traceback (most recent call last):
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/managers.py", line 616, in _run_server
    server.serve_forever()
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/managers.py", line 182, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/multiprocessing/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/shutil.py", line 715, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/shutil.py", line 672, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/michael/.snakemake/conda/68e54810/lib/python3.8/shutil.py", line 670, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs808d9dd800002bfe0000032d'
iprada commented 4 years ago

Hi Michael,

I actually have never seen the majority of those warnings. Is your WGS sample public? so that I could reproduce the warnings?

I am ridicoulously busy right now, but I will have the time to look at this in October.

Best,

Iñigo

johnstonmj commented 4 years ago

Best of luck submitting your thesis, Iñigo!

Just documenting now while it is fresh in my mind, but feel free to wait until October. So far, the results seem to be unaffected, but I just wanted to ask about the warnings.

1 - Is the first warning expected? [E::idx_find_and_load] Could not retrieve index file for 'qname_unknown_circle.bam' In the tutorial, I don't believe we generate an index for this file.

2 - The second warning: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used self.stderr = io.open(errread, 'rb', bufsize) Could simply be the result of a new warning being introduced in Python 3.8 as mentioned here. Your Readme mentions Python >= 3.6, but I am running 3.8.5, so perhaps this explains the difference.

3 - For the warnings of: OSError: [Errno 16] Device or resource busy: '.nfs808d9dd800002bfe0000032d' These seem to be linux directory locks. I suspect that your rmtree(tempdir) is trying to clear temporary directories and that I've got some file system lag that is causing this. Possibly not a Circle-Map issue. I'll let you know if it persists as I run more samples.

johnstonmj commented 4 years ago

Hi Iñigo,

Hope all is well for you!

Regarding point #3 in my previous message, the error OSError: [Errno 16] Device or resource busy appears to be caused by NFS file system latency interacting with python multiprocessing temp file cleanup.

My local server admins (and other online forums) have advised me that TMPDIR should NOT be on an NFS file system. Re-running my jobs using TMPDIR=/tmp (storage located on the compute node, not on the shared file system) solved this issue for me. For a 40X-coverage human genome, this required /tmp of at least 10 GB for me.

The warnings of: [E::idx_find_and_load] and RuntimeWarning: line buffering (buffering=1) isn't supported still occur, but the *.circle.bed file is generated successfully.

deanpettinga commented 3 years ago

Hi there, I've also experienced this issue when using the tutorial data for Circle-Map Realign

[E::idx_find_and_load] Could not retrieve index file for 'qname_unknown_circle.bam'
Seems to make sense, as I don't think you can index a qname-sorted bam file?

I was unable to resolve the issue since I couldn't make an index for the name-sorted bam despite Circle-Map requesting it.