ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

Additional space in column name when asking for the field "Assembly BioSample Strain" #388

Open greenmna opened 1 month ago

greenmna commented 1 month ago

Describe the bug
Downloading genome information with the Assembly BioSample Strain field yields a column name with an extra trailing space.

To Reproduce
datasets summary genome accession GCF_000196515.1 --as-json-lines --assembly-level complete --assembly-source RefSeq | dataformat tsv genome --fields assminfo-biosample-strain > my_report.tsv

Steps to reproduce the behavior:

  1. Use datasets to download a summary of any number of given genomes
  2. Use dataformat to output a tsv file for genome information, specifically declaring the field assminfo-biosample-strain
  3. Direct the output to a tsv file
  4. Open the tsv file, go to the column named "Assembly BioSample Strain", double-click and use Ctrl + A to see the extra white space following the word "Strain"
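
Step 4 can also be checked from the command line instead of a spreadsheet program. The following is a minimal sketch, assuming a POSIX shell. The file sample_report.tsv is a hypothetical stand-in for my_report.tsv, written with printf so the snippet runs without the datasets CLI, and the data row is a placeholder rather than real output.

```shell
# Hypothetical stand-in for my_report.tsv: a one-column report whose header
# carries the trailing space described in this issue. The data row is a
# placeholder, not a real strain name.
printf 'Assembly BioSample Strain \nPLACEHOLDER\n' > sample_report.tsv

# Print only the header line in sed's unambiguous form. The end of the line
# is marked with "$", so a trailing space shows up as " $".
header_visible=$(sed -n '1l' sample_report.tsv)
printf '%s\n' "$header_visible"
```

With the stand-in file this prints `Assembly BioSample Strain $`, where the space before `$` is the extra character. In a report with several columns, a tab appears as `\t`, so a padded name would show as a space directly before `\t`.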

Expected behavior
The column name was expected to be a string with no extra white space at the end: "Assembly BioSample Strain". What occurs instead is the following: "Assembly BioSample Strain ".

I'm not sure if the way I'm providing arguments is the issue, but I just updated to 16.24.0 and the extra space is still present in this column name. As far as I'm aware, though I've not exhaustively checked, I've had no issues with other column names.
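
For a more exhaustive check of the other column names, something like the sketch below could be run against any report. It assumes a POSIX shell and awk. The file sample_header_check.tsv and the "Other Column" name are hypothetical, made up only so the snippet is self-contained, and the data row is a placeholder.

```shell
# Hypothetical two-column report: "Other Column" is an invented name, and
# the second header reproduces the trailing space from this issue.
printf 'Other Column\tAssembly BioSample Strain \nPLACEHOLDER_1\tPLACEHOLDER_2\n' > sample_header_check.tsv

# Split the header line on tabs and report every column name that begins
# or ends with a space. Brackets make the padding visible.
padded_columns=$(awk -F '\t' 'NR == 1 {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^ | $/) printf "column %d: [%s]\n", i, $i
    exit
}' sample_header_check.tsv)
printf '%s\n' "$padded_columns"
```

With the stand-in file this reports `column 2: [Assembly BioSample Strain ]` and nothing for the first column. Pointing the awk command at my_report.tsv, or at a report generated with more fields, would list any other padded names.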

Thank you!

ericcox1 commented 1 month ago

Hi @greenmna,

Thanks for opening this issue. We are going to investigate and I will comment on this thread with any updates.

Best, Eric

Eric Cox, PhD [Contractor] (he/him/his) NCBI Datasets NIH/NLM/NCBI eric.cox@nih.gov