sanger-tol / genomenote

Nextflow DSL2 pipeline to generate a Genome Note, including assembly statistics, quality metrics, and Hi-C contact maps. This workflow is part of the Tree of Life production suite.
https://pipelines.tol.sanger.ac.uk/genomenote
MIT License

ERROR pipeline failure when fields missing from NCBI genome report [v1.2.0] #126

Closed: tkchafin closed this issue 3 months ago

tkchafin commented 4 months ago

Description of the bug

create_table.py fails with a KeyError while writing the NCBI assembly statistics:

Command error:
  Traceback (most recent call last):
    File "/lustre/scratch123/tol/teams/tolit/users/tolpipe/registered_workflows/prod/sanger-tol/genomenote-1.2.0/workflow/bin/create_table.py", line 182, in <module>
      sys.exit(main())
    File "/lustre/scratch123/tol/teams/tolit/users/tolpipe/registered_workflows/prod/sanger-tol/genomenote-1.2.0/workflow/bin/create_table.py", line 171, in main
      ncbi_stats(args.genome, args.sequence, writer)
    File "/lustre/scratch123/tol/teams/tolit/users/tolpipe/registered_workflows/prod/sanger-tol/genomenote-1.2.0/workflow/bin/create_table.py", line 81, in ncbi_stats
      writer.writerow(["Scaffolds", stats["number_of_scaffolds"]])
  KeyError: 'number_of_scaffolds'

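The failure occurs on assembly GCA_963966575.1, whose NCBI genome report lacks several expected fields in assemblyStats, including the scaffold count that create_table.py looks up as number_of_scaffolds: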

  "assemblyStats": {
    "contigL50": 32,
    "contigN50": 24519,
    "gcCount": "788905",
    "gcPercent": 38.0,
    "genomeCoverage": "46.0x",
    "numberOfComponentSequences": 87,
    "numberOfContigs": 87,
    "totalSequenceLength": "2065980",
    "totalUngappedLength": "2065980"
  },
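One possible way to tolerate reports like the one above is to write only the fields that are present and to record the ones that are missing. The following is a minimal sketch, not the actual create_table.py code. The function name write_available_stats, the FIELDS list, and every key other than number_of_scaffolds are hypothetical, and it assumes the stats dictionary uses snake_case keys as seen in the traceback.

```python
import csv
import io

# Hypothetical label/key pairs. Only "Scaffolds" / "number_of_scaffolds"
# is taken from the traceback; the other entries are assumptions.
FIELDS = [
    ("Contigs", "number_of_contigs"),
    ("Scaffolds", "number_of_scaffolds"),
    ("Scaffold N50", "scaffold_n50"),
]


def write_available_stats(stats, writer, fields=FIELDS):
    """Write one row per field found in stats; return the labels skipped."""
    missing = []
    for label, key in fields:
        value = stats.get(key)  # .get avoids KeyError on absent fields
        if value is None:
            missing.append(label)
            continue
        writer.writerow([label, value])
    return missing


# A contig-level assembly with no scaffold statistics (values from the report above).
stats = {"number_of_contigs": 87, "contig_n50": 24519}
buffer = io.StringIO()
missing = write_available_stats(stats, csv.writer(buffer))
print(buffer.getvalue(), end="")
print("Skipped:", missing)
```

Whether missing fields should be skipped, written as empty cells, or reported as warnings is a design choice for the pipeline maintainers; the sketch only shows that the lookup itself need not abort the run.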

Command used and terminal output

No response

Relevant files

No response

System information

No response