ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

datasets summary crashes when called on a soon to be renamed genus Candida/Metschnikowiaceae #358

Closed: Jtrachsel closed this issue 4 months ago

Jtrachsel commented 4 months ago


Describe the bug

Running datasets summary on a soon-to-be-renamed genus, Candida/Metschnikowiaceae (taxid 2964429), crashes with the following error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x8 pc=0x1006c6c7c]

goroutine 1 [running]:
datasets_cli/v2/datasets.(*taxonAutosuggestApi).GetMatchingOrganisms(0x14000217c20, {0x1400042c020, 0x19}, 0x1, 0x1, {0x1006fce4e, 0x1c})
        apps/public/Datasets/v2/datasets/datasets/ResolveTaxons.go:227 +0x1ac
datasets_cli/v2/datasets.(*taxonAutosuggestApi).handleExactMatch(0x16fd6b526?, {0x1400041c110, 0x1400041c120, 0x0, 0x1400041c130, 0x1400041c140, 0x0}, {0x1006eb440, 0x6}, {0x16fd6b526?, ...}, ...)
        apps/public/Datasets/v2/datasets/datasets/ResolveTaxons.go:312 +0x1b4
datasets_cli/v2/datasets.(*taxonAutosuggestApi).GetOrganisms(0x0?, {0x16fd6b526?, 0x0?}, 0x0?, {0x1006fce4e, 0x1c}, {0x1006eb440, 0x6}, 0x0?, {0x0, ...})
        apps/public/Datasets/v2/datasets/datasets/ResolveTaxons.go:370 +0x18c
datasets_cli/v2/datasets.RetrieveTaxIdForTaxon({0x16fd6b526, 0x7}, 0x0?, {0x1006fce4e, 0x1c}, {0x1006eb440, 0x6}, {0x0, 0x0, 0x0})
        apps/public/Datasets/v2/datasets/datasets/ResolveTaxons.go:78 +0xa8
datasets_cli/v2/datasets.createSummaryGenomeTaxonCmd.func1(0x140002b4900?, {0x140002294f0, 0x1, 0x1?})
        apps/public/Datasets/v2/datasets/datasets/SummaryGenomeTaxon.go:33 +0x8c
github.com/spf13/cobra.(*Command).execute(0x140002b4900, {0x140002294d0, 0x1, 0x1})
        external/com_github_spf13_cobra/command.go:940 +0x60c
github.com/spf13/cobra.(*Command).ExecuteC(0x100f191a0)
        external/com_github_spf13_cobra/command.go:1068 +0x368
github.com/spf13/cobra.(*Command).Execute(...)
        external/com_github_spf13_cobra/command.go:992
datasets_cli/v2/datasets.Execute()
        apps/public/Datasets/v2/datasets/datasets/root.go:482 +0x2c
main.main()
        apps/public/Datasets/v2/cmd/datasets/main.go:10 +0x20

To Reproduce

datasets summary genome taxon 2964429

Expected behavior

The command prints summary data for the genomes belonging to this genus.

ericcox1 commented 4 months ago

Hi @Jtrachsel,

Thanks for opening this issue. I was able to reproduce this bug and we are going to investigate.

In the meantime, you can obtain the metadata for this taxid by using curl against the API:

curl -s https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/2964429/dataset_report
{"reports":[{"taxonomy":{"tax_id":2964429,"rank":"GENUS","current_scientific_name":{"name":"Candida/Metschnikowiaceae"},"group_name":"budding yeasts","classification":{"superkingdom":{"name":"Eukaryota","id":2759},"kingdom":{"name":"Fungi","id":4751},"phylum":{"name":"Ascomycota","id":4890},"class":{"name":"Saccharomycetes","id":4891},"order":{"name":"Saccharomycetales","id":4892},"family":{"name":"Metschnikowiaceae","id":27319},"genus":{"name":"Candida/Metschnikowiaceae","id":2964429}},"parents":[1,131567,2759,33154,4751,451864,4890,716545,147537,4891,4892,2916678,27319,2937349],"children":[2093215,1212667,655875,221909,46168,150221,85573,1323753,1415810,644823,746467,1276169,746466,46585,418784,1041604,1212665,46584,2599655,46252,657155,2599654,2997878,1142140,2546336,1170579,744973,1415807,1415806,45354,487108,1686162,45357,2594758,417300,564621,1041603,1253868,1231522,391827,933257,644821,1519030,220925,391824,432108,498019,2233643,255214,2546335,78172,535747],"counts":[{"type":"COUNT_TYPE_ASSEMBLY","count":251},{"type":"COUNT_TYPE_GENE","count":21562},{"type":"COUNT_TYPE_tRNA","count":627},{"type":"COUNT_TYPE_rRNA","count":32},{"type":"COUNT_TYPE_PROTEIN_CODING","count":20903}],"genomic_moltype":"dsDNA"},"query":["2964429"]}],"total_count":1}
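
If you would rather script this workaround than read the raw JSON, the following is a minimal Python sketch. The endpoint is copied from the curl command above; the helper names fetch_report and summarize_report are hypothetical, and the embedded sample is a trimmed copy of the response above (classification, parents, and children omitted). It assumes the response keeps the shape shown here, which a v2alpha endpoint does not guarantee.

```python
import json
import urllib.request

# Endpoint copied from the curl command in this thread; v2alpha may change.
REPORT_URL = (
    "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/"
    "{tax_id}/dataset_report"
)


def fetch_report(tax_id: int, timeout: float = 30.0) -> dict:
    """Download the taxonomy dataset report for one taxid (network call)."""
    url = REPORT_URL.format(tax_id=tax_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_report(payload: dict) -> dict:
    """Pull the name, rank, and per-type counts out of the first report."""
    reports = payload.get("reports") or []
    if not reports:
        raise ValueError("no reports in payload")
    taxonomy = reports[0]["taxonomy"]
    counts = {c["type"]: c["count"] for c in taxonomy.get("counts", [])}
    return {
        "tax_id": taxonomy["tax_id"],
        "name": taxonomy["current_scientific_name"]["name"],
        "rank": taxonomy["rank"],
        "counts": counts,
    }


# Trimmed copy of the response shown above, so the example runs offline.
SAMPLE = json.loads("""
{"reports":[{"taxonomy":{"tax_id":2964429,"rank":"GENUS",
"current_scientific_name":{"name":"Candida/Metschnikowiaceae"},
"group_name":"budding yeasts",
"counts":[{"type":"COUNT_TYPE_ASSEMBLY","count":251},
{"type":"COUNT_TYPE_GENE","count":21562},
{"type":"COUNT_TYPE_tRNA","count":627},
{"type":"COUNT_TYPE_rRNA","count":32},
{"type":"COUNT_TYPE_PROTEIN_CODING","count":20903}],
"genomic_moltype":"dsDNA"},"query":["2964429"]}],"total_count":1}
""")

summary = summarize_report(SAMPLE)
print(summary["name"], summary["rank"], summary["counts"]["COUNT_TYPE_ASSEMBLY"])
# prints: Candida/Metschnikowiaceae GENUS 251
```

To run it against the live API instead of the sample, call summarize_report(fetch_report(2964429)).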

Best, Eric

Eric Cox, PhD [Contractor] (he/him/his) NCBI Datasets NIH/NLM/NCBI eric.cox@nih.gov

ericcox1 commented 4 months ago

Hi @Jtrachsel,

This bug has been fixed in the latest version of the CLI, 16.17.1.

For example:

datasets --version; datasets summary genome taxon 2964429 --as-json-lines | dataformat tsv genome --fields accession | head
datasets version: 16.17.1
Assembly Accession
GCA_030583085.1
GCA_030573135.1
GCF_003013715.1
GCF_001189475.1
GCF_002775015.1
GCA_001049995.1
GCA_001189475.1
GCA_002759435.3
GCA_002775015.1
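
For scripts that depend on this fix, here is a small Python sketch that checks the version string printed by datasets --version before running the summary. It assumes the output format shown above ("datasets version: X.Y.Z") and treats 16.17.1 as the first release with the fix, which this thread does not state explicitly. The names parse_cli_version and has_fix are hypothetical.

```python
import re

# Assumption: 16.17.1 is the first release with the fix (the thread only
# says the bug is fixed in this version).
FIXED_IN = (16, 17, 1)


def parse_cli_version(text: str) -> tuple:
    """Extract (major, minor, patch) from 'datasets version: X.Y.Z'."""
    match = re.search(r"datasets version:\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version output: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(text: str) -> bool:
    """Compare numerically so that 16.2.0 sorts before 16.17.1."""
    return parse_cli_version(text) >= FIXED_IN


print(has_fix("datasets version: 16.17.1"))  # True
print(has_fix("datasets version: 16.2.0"))   # False
```

Comparing integer tuples rather than strings matters here: as plain strings, "16.2.0" would sort after "16.17.1". In practice the input text would come from capturing the output of datasets --version, for example with subprocess.run.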

Best, Eric