timoast / sinto

Tools for single-cell data processing
https://timoast.github.io/sinto/
MIT License

python #34

Closed leonvgurp closed 2 years ago

leonvgurp commented 2 years ago

Hi all,

I'm trying to run sinto on our local cluster. We have sinto v0.7.3.1 available with Python 3.8.6. GCC v10.2.0 and OpenMPI v4.0.5 are loaded in the background. I use the following command to generate my fragments file:

sinto fragments -p 8 \
      -b /dir/file.bam \
      -f /dir/file.bed \
      --barcode_regex "[^:]*" \
      --use_chrom "*"

This generates the following output (with errors):

Function run_fragments called with the following arguments:

bam   /dir/file.bam
fragments   /dir/file.bed
min_mapq    30
nproc 8
barcodetag  CB
cells None
barcode_regex     [^:]*
use_chrom   *
max_distance      5000
min_distance      10
chunksize   500000
func  <function run_fragments at 0x2b3f2e8843a0>
Traceback (most recent call last):
  File "/opt/ebsofts/sinto/0.7.3.1-foss-2020b-Python-3.8.6/bin/sinto", line 8, in <module>
    sys.exit(main())
  File "/opt/ebsofts/sinto/0.7.3.1-foss-2020b-Python-3.8.6/lib/python3.8/site-packages/sinto/arguments.py", line 346, in main
    options.func(options)
  File "/opt/ebsofts/sinto/0.7.3.1-foss-2020b-Python-3.8.6/lib/python3.8/site-packages/sinto/utils.py", line 21, in wrapper
    func(args)
  File "/opt/ebsofts/sinto/0.7.3.1-foss-2020b-Python-3.8.6/lib/python3.8/site-packages/sinto/cli.py", line 45, in run_fragments
    fragments.fragments(
  File "/opt/ebsofts/sinto/0.7.3.1-foss-2020b-Python-3.8.6/lib/python3.8/site-packages/sinto/fragments.py", line 470, in fragments
    chrom = utils.get_chromosomes(bam, keep_contigs=chromosomes)
  File "/opt/ebsofts/sinto/0.7.3.1-foss-2020b-Python-3.8.6/lib/python3.8/site-packages/sinto/utils.py", line 134, in get_chromosomes
    pattern = re.compile(keep_contigs)
  File "/opt/ebsofts/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/opt/ebsofts/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/opt/ebsofts/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/opt/ebsofts/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/opt/ebsofts/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/opt/ebsofts/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/sre_parse.py", line 668, in _parse
    raise source.error("nothing to repeat",
re.error: nothing to repeat at position 0
srun: error: node245: task 0: Exited with exit code 1

We suspect this may be caused by the Python version, but we are not sure. Could there be another reason why these errors are produced?

timoast commented 2 years ago

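I think you just need to set --use_chrom "." rather than "*". The value is compiled as a regular expression (see the re.compile(keep_contigs) call in the traceback), and a bare "*" is a quantifier with nothing before it to repeat. If you run this: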

import re
re.compile("*")

you should see a similar error.
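
To make the difference concrete, here is a minimal sketch that checks whether a pattern compiles before it is passed to --use_chrom. The `describe` helper and the contig names are hypothetical and only for illustration; they are not part of sinto.

```python
import re


def describe(pattern):
    """Return 'ok' if the pattern compiles, otherwise the re.error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return f"invalid: {exc}"
    return "ok"


# Hypothetical contig names, standing in for whatever the BAM header lists.
contigs = ["chr1", "chrX", "GL000194.1"]

# A bare "*" has nothing to repeat, so compilation fails.
print(describe("*"))

# "." matches any single character, so every non-empty contig name matches.
print(describe("."))
kept = [c for c in contigs if re.compile(".").match(c)]
print(kept)
```

A shell glob such as "*" and a regular expression are different syntaxes, which is the likely source of the confusion here.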

leonvgurp commented 2 years ago

This makes me feel very silly, but it indeed solves my problem. Thanks!

A follow-up question, if I may: I'm not completely sure about the syntax for the barcode regex. If, for example, I wanted to use the second colon-separated field of the read name instead of the first, what should I use instead of "[^:]*"?

Is this purely sed-based?

timoast commented 2 years ago

The string provided will be a regular expression used to detect the barcode. It's useful to test out a regular expression using https://regex101.com/ to find the expression that works for your format.
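
As a rough illustration of the second-field case asked about above, the sketch below tries two patterns against a made-up read name. It assumes the barcode is taken as the full text of the first match found by searching the read name; how sinto applies the pattern internally (search vs. anchored match) is an assumption here, so test against your own read names first. The read name itself is hypothetical.

```python
import re

# Hypothetical read name with colon-separated fields.
read_name = "A00123:ACGTACGT:1101:2345"

# First field: everything up to the first colon.
first_field = re.search(r"[^:]*", read_name).group()

# Second field: a lookbehind requires a colon immediately before the match,
# so the first hit is the text between the first and second colons.
second_field = re.search(r"(?<=:)[^:]*", read_name).group()

print(first_field)
print(second_field)
```

Note that the lookbehind pattern only works if the regex is searched along the read name; if the tool anchors the match at the start of the name, this pattern would not match and a different approach would be needed.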