iqbal-lab-org / make_prg

Code to create a PRG from a Multiple Sequence Alignment file
Other
21 stars 7 forks source link

Ignore `N`s that might have slipped through when updating PRGs #58

Open leoisl opened 1 year ago

leoisl commented 1 year ago

When running the 4-way pipeline, updating the E coli PRG with illumina data, I got this error:

Traceback (most recent call last):
  File "/usr/local/bin/make_prg", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/make_prg/__main__.py", line 94, in main
    args.func(args)
  File "/usr/local/lib/python3.9/site-packages/make_prg/subcommands/update.py", line 182, in run
    denovo_variants_db = DenovoVariantsDB(
  File "/usr/local/lib/python3.9/site-packages/make_prg/update/denovo_variants.py", line 548, in __init__
    locus_name_to_denovo_loci = self._get_locus_name_to_denovo_loci()
  File "/usr/local/lib/python3.9/site-packages/make_prg/update/denovo_variants.py", line 532, in _get_locus_name_to_denovo_loci
    return self._get_locus_name_to_denovo_loci_core(filehandler)
  File "/usr/local/lib/python3.9/site-packages/make_prg/update/denovo_variants.py", line 522, in _get_locus_name_to_denovo_loci_core
    variants = self._read_variants(
  File "/usr/local/lib/python3.9/site-packages/make_prg/update/denovo_variants.py", line 495, in _read_variants
    denovo_variant = cls._read_DenovoVariant(
  File "/usr/local/lib/python3.9/site-packages/make_prg/update/denovo_variants.py", line 477, in _read_DenovoVariant
    denovo_variant = DenovoVariant(
  File "/usr/local/lib/python3.9/site-packages/make_prg/update/denovo_variants.py", line 41, in __init__
    DenovoVariant._param_checking(
  File "/usr/local/lib/python3.9/site-packages/make_prg/update/denovo_variants.py", line 59, in _param_checking
    DenovoVariant._check_sequence_is_composed_of_ACGT_only(alt)
  File "/usr/local/lib/python3.9/site-packages/make_prg/update/denovo_variants.py", line 85, in _check_sequence_is_composed_of_ACGT_only
    raise DenovoError(f"Found a non-ACGT seq ({seq}) in a denovo variant")
make_prg.update.denovo_variants.DenovoError: Found a non-ACGT seq (N) in a denovo variant

There are 36416 new variants found, and only a single one has N in it. I'd very much prefer to simply issue a warning here: https://github.com/iqbal-lab-org/make_prg/blob/c2c7ff0f40dabc5034e2bfad7d6c2229e5649b09/make_prg/update/denovo_variants.py#L85 than erroring out and not being able to update

leoisl commented 1 year ago

Done in https://github.com/iqbal-lab-org/make_prg/commit/46534bccf4c6a347ef17707747da3e6021134686, testing in 4way

mbhall88 commented 1 year ago

How do we get an N in a novel variant?

leoisl commented 1 year ago

For some reason racon output a N in a consensus sequence...

mbhall88 commented 1 year ago

Weird...