zanglab / SICER2

MIT License

Error due to score numpy dtype #24

Closed: skchronicles closed this issue 8 months ago

skchronicles commented 1 year ago

Hello there,

I hope you are having a great day and that all is going well on your side! Thank you for creating and maintaining this tool; SICER2 is an excellent broad peak caller.

While running SICER2, I ran into the following error:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/envs/app/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/envs/app/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/opt/conda/envs/app/lib/python3.7/site-packages/sicer/src/remove_redundant_reads.py", line 133, in find_and_filter_reads
    chrom_reads = match_by_chrom(path_to_file, chrom)  # Separates all reads by chromosome
  File "/opt/conda/envs/app/lib/python3.7/site-packages/sicer/src/remove_redundant_reads.py", line 120, in match_by_chrom
    processed_reads[i] = tuple(reads)
ValueError: invalid literal for int() with base 10: '.'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/app/bin/sicer", line 235, in <module>
    main()
  File "/opt/conda/envs/app/bin/sicer", line 229, in main
    run_SICER.main(args)
  File "/opt/conda/envs/app/lib/python3.7/site-packages/sicer/main/run_SICER.py", line 52, in main
    total_treatment_read_count = remove_redundant_reads.main(args, args.treatment_file, pool)
  File "/opt/conda/envs/app/lib/python3.7/site-packages/sicer/src/remove_redundant_reads.py", line 147, in main
    filtered_result = pool.map(find_and_filter_reads_partial, chroms)
  File "/opt/conda/envs/app/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/conda/envs/app/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: invalid literal for int() with base 10: '.'

Following the traceback, I traced the error to this line: https://github.com/zanglab/SICER2/blob/15fdbf03d3477c5069efb2556956feee97fb990a/sicer/src/remove_redundant_reads.py#L109

My input BED files contain '.' characters in the score column (the fifth column), and these raise a ValueError because the score dtype is set to np.int32.
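Here is a minimal sketch of how this failure arises. It assumes the reads are stored in a NumPy structured array whose score field is np.int32, which is the only detail taken from this issue. The field names, string widths, and the load_reads helper are hypothetical and are not copied from SICER2. When a tuple of strings is assigned to a record, NumPy converts each string to its field's type, and converting '.' to an integer raises the same ValueError shown in the traceback.

```python
import numpy as np

# Hypothetical BED6 record layout (NOT SICER2's actual dtype).
# Only the np.int32 score type comes from the issue description.
INT_SCORE_DTYPE = np.dtype([
    ("chrom", "U20"),
    ("start", np.int64),
    ("end", np.int64),
    ("name", "U20"),
    ("score", np.int32),
    ("strand", "U1"),
])


def load_reads(lines, dtype):
    """Parse whitespace-delimited BED lines into a structured array (hypothetical helper)."""
    reads = np.zeros(len(lines), dtype=dtype)
    for i, line in enumerate(lines):
        # NumPy converts each string to its field type on assignment;
        # the integer conversion of '.' in the score field fails here.
        reads[i] = tuple(line.split())
    return reads


# Two records in the same shape as the chip.bed preview in this issue.
bed_lines = [
    "chr17\t63978\t64078\t.\t.\t+",
    "chr17\t64013\t64113\t.\t.\t+",
]

try:
    load_reads(bed_lines, INT_SCORE_DTYPE)
except ValueError as err:
    print(f"ValueError: {err}")
```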

Here is a preview of my input BED files:

$ head chip.bed input.bed 
==> chip.bed <==
chr17   63978   64078   .   .   +
chr17   63978   64078   .   .   +
chr17   64013   64113   .   .   +
chr17   64013   64113   .   .   +
chr17   64902   65002   .   .   +
chr17   64956   65056   .   .   +
chr17   65069   65169   .   .   +
chr17   65069   65169   .   .   +
chr17   77695   77795   .   .   +
chr17   113049  113149  .   .   +

==> input.bed <==
chr7    14095   14144   .   .   +
chr7    18473   18522   .   .   +
chr7    19652   19701   .   .   +
chr7    22065   22114   .   .   +
chr7    30581   30630   .   .   +
chr7    30601   30650   .   .   +
chr7    30635   30684   .   .   +
chr7    30655   30704   .   .   +
chr7    30655   30704   .   .   +
chr7    30663   30712   .   .   +

I was able to test and debug this on my side. I can reproduce the error, and I can fix it by changing the score dtype to a Unicode string of length 6 (U6).
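The dtype change described above can be sketched with a hypothetical BED6 record layout that stores the score as a NumPy Unicode string of length 6 ('U6'), so '.' is kept verbatim instead of being parsed as an integer. The field names, widths other than the score's, and the load_reads helper are assumptions for illustration, not SICER2's code.

```python
import numpy as np

# Hypothetical BED6 record layout; the score is stored as "U6"
# (Unicode string, length 6) instead of np.int32, as described in the issue.
STR_SCORE_DTYPE = np.dtype([
    ("chrom", "U20"),
    ("start", np.int64),
    ("end", np.int64),
    ("name", "U20"),
    ("score", "U6"),
    ("strand", "U1"),
])


def load_reads(lines, dtype):
    """Parse whitespace-delimited BED lines into a structured array (hypothetical helper)."""
    reads = np.zeros(len(lines), dtype=dtype)
    for i, line in enumerate(lines):
        # The score string is copied as-is, so '.' no longer triggers int().
        reads[i] = tuple(line.split())
    return reads


reads = load_reads(
    ["chr17\t63978\t64078\t.\t.\t+", "chr7\t14095\t14144\t.\t0\t+"],
    STR_SCORE_DTYPE,
)
print(reads["score"])
print(reads["start"])
```

One consequence to keep in mind: NumPy silently truncates strings longer than the field width, so a score such as '12.3456' would be stored as '12.345'. That is harmless only if the score is never parsed as a number later on.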

For the time being, I have edited my files to replace any '.' characters in the score column with 0. That said, I wanted to confirm that this is okay. Do you use the information in the score column anywhere internally? I want to make sure that setting it to 0 will not cause any unwanted side effects.
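The workaround can be scripted rather than done by hand. This sketch assumes tab-delimited BED with the score in the fifth column; the function names zero_dot_scores and rewrite_bed and any output file names are hypothetical. It rewrites only a score that is exactly '.', leaving the name column and non-record lines (for example track headers) untouched. Whether 0 is a safe substitute still depends on how SICER2 uses the score, which is the open question here.

```python
def zero_dot_scores(lines):
    """Yield BED lines with a '.' score (column 5) replaced by '0'.

    Assumes tab-delimited input; lines with fewer than five fields
    pass through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and fields[4] == ".":
            fields[4] = "0"
        yield "\t".join(fields) + "\n"


def rewrite_bed(src_path, dst_path):
    """Write a copy of src_path with '.' scores replaced by '0' (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(zero_dot_scores(src))
```

For example, rewrite_bed("chip.bed", "chip.zero_score.bed") would produce a copy of chip.bed that can be passed to sicer with -t instead of the original.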

Please let me know what you think.

Best regards, @skchronicles

skchronicles commented 1 year ago

Also, here is the command I ran, in case you want to reproduce the error on your side:

$ sicer -t chip.bed -c input.bed -s hg38 -rt 100 -w 300 -f 168 -egf 0.75 -g 600 -fdr 1E-2 -cpu 8 -o .

skchronicles commented 1 year ago

If you want, I can also submit a PR with the fix. Please let me know what you think, and have a great evening!

skchronicles commented 1 year ago

@zanglab @jinyongyoo I submitted a PR with a fix.

Please see: https://github.com/zanglab/SICER2/pull/25