Closed turbomam closed 1 month ago
head -n 50 ncbi_metadata.samples_100.json
[{
"_id": {
"$oid": "66c60d3e7329fb22ad1c3d39"
},
"BioSample": {
"Status": {
"when": "2013-08-05T10:18:49",
"status": "live"
},
"Owner": {
"Contacts": {
"Contact": ""
},
"Name": {
"abbreviation": "WUGSC",
"content": "Washington University, Genome Sequencing Center"
}
},
"access": "public",
"Description": {
"Comment": {
"Paragraph": [
"Alistipes putredinis (GenBank Accession Number for 16S rDNA gene: L16497) is a member of the Bacteroidetes division of the domain bacteria and has been isolated from human feces. It has been found in 16S rDNA sequence-based enumerations of the colonic microbiota of adult humans (Eckburg et. al. (2005), Ley et. al. (2006)).",
"Keywords: GSC:MIxS;MIGS:6.0"
]
},
"Organism": {
"taxonomy_id": 445970,
"taxonomy_name": "Alistipes putredinis DSM 17216"
},
"Title": "Alistipes putredinis DSM 17216"
},
"Attributes": {
"Attribute": [
{
"attribute_name": "finishing strategy (depth of coverage)",
"content": "Level 3: Improved-High-Quality Draft11.6x;20"
},
It's not using duckdb; this is just a bug caused by assuming that --path is set.
Specifying --path db (or your preferred path) should fix this.
@iQuxLE - can we make this a bit more graceful? Also, there should be no need for individual CLI commands to do any of this kind of configuration: both the duckdb and chromadb implementations conform to the same interface, so no switches are needed (except perhaps at initial instantiation).
This check was originally meant to prevent users from passing a directory path for a duckdb instance. Chroma is fine with a directory path, but duckdb needs a file path. However, we can handle this within the implementations.
@cmungall
By default, when using chromadb without a path, it puts chroma.sqlite3 into /curate-gpt/db.
The behaviour of the duckdb_adapter is currently not the same.
I will make a change so that when no --path is set, it redirects to /curate-gpt/db/foo.duckdb.
If a user gives a directory bar, it will append foo.duckdb, so there will be no DuckDB error like
duckdb.duckdb.IOException: IO Error: Could not read from file "~/code/curate-gpt/bar": Is a directory
and it is automatically transformed to ~/code/curate-gpt/bar/foo.duckdb.
This way there is no need for a -p when indexing, and the switch in the CLI can be deleted.
see snippet in next post
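For illustration only (this is not the snippet from the next post, and not the actual curate-gpt code), the path handling described above could be sketched roughly like this. The function name resolve_duckdb_path and the constants are hypothetical, foo.duckdb is just the placeholder file name used in this comment, and treating a path without a file extension as a directory is an assumption on my part.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical defaults, named here only for the sketch.
DEFAULT_DB_DIR = Path("db")
DEFAULT_DB_FILE = "foo.duckdb"  # placeholder name taken from the comment above


def resolve_duckdb_path(path: Optional[Union[str, Path]] = None) -> Path:
    """Return a file path DuckDB can open, given an optional user-supplied path."""
    if path is None:
        # No --path given: fall back to the default directory and file name.
        return DEFAULT_DB_DIR / DEFAULT_DB_FILE

    p = Path(path).expanduser()

    # Assumption: an existing directory, or a path with no file extension,
    # is meant as a directory, so the default file name is appended.
    if p.is_dir() or p.suffix == "":
        return p / DEFAULT_DB_FILE

    # Otherwise the user already pointed at a database file; keep it as-is.
    return p
```

With something like this inside the duckdb adapter, the CLI could pass whatever the user supplied (or nothing) straight through, and a directory such as bar would resolve to bar/foo.duckdb instead of raising the IOException above.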