sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

`sourmash tax prepare` sometimes fails to create db? #2211

Open bluegenes opened 2 years ago

bluegenes commented 2 years ago

I'm seeing some odd behavior, but I'm not sure whether it is server-related or sourmash-related.

On a compute node, `sourmash tax prepare` refuses to prepare a new lineages database, saying the taxonomy table already exists. After the run, a new file of that name exists but is empty.

sourmash tax prepare -t gtdb-rs207.taxonomy.csv -o gtdb-rs207.taxonomy.db

== This is sourmash version 4.4.4.dev2+g2f38f6c2. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

loading taxonomies...
...loaded 317542 entries.
saving to 'gtdb-rs207.taxonomy.db', format sql...
ERROR while saving!
taxonomy table already exists in 'gtdb-rs207.taxonomy.db'

file: 512 -rw-r--r-- 1 ntpierce ctbrowngrp 0 Aug 15 15:14 gtdb-rs207.taxonomy.db

On the login node, running the exact same command seems to work, even without removing the empty file.

sourmash tax prepare -t gtdb-rs207.taxonomy.csv -o gtdb-rs207.taxonomy.db

== This is sourmash version 4.4.4.dev2+g2f38f6c2. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

loading taxonomies...
...loaded 317542 entries.
saving to 'gtdb-rs207.taxonomy.db', format sql...
done!

file: 15M -rw-r--r-- 1 ntpierce ctbrowngrp 54M Aug 15 15:17 gtdb-rs207.taxonomy.db

Running the command again once the file is non-empty produces the error seen above.
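To tell the two cases apart (an empty file left behind by a failed run versus a database that really does contain tables), it may help to look inside the output file directly. Below is a minimal sketch using only Python's standard `sqlite3` module. The helper name `describe_sqlite_file` is hypothetical and not part of sourmash, and the sketch assumes nothing about which table names sourmash creates; it only lists whatever tables are present.

```python
import os
import pathlib
import sqlite3


def describe_sqlite_file(path):
    """Return (size_in_bytes, table_names) for the SQLite file at `path`.

    Hypothetical diagnostic helper, not part of sourmash.
    """
    size = os.path.getsize(path)
    # Open read-only via a file: URI so the check cannot create or modify the file.
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return size, [name for (name,) in rows]
```

Calling `describe_sqlite_file("gtdb-rs207.taxonomy.db")` on the compute node and on the login node should show whether the zero-byte file really contains no tables when the "already exists" error is raised.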

ctb commented 2 years ago

I can't replicate this on farm nodes c6-58 or bm10 - but I believe it happens ;). I've seen weird things happen with sqlite on the nodes on our HPC. I think it has to do with locking, but that's just a guess.

Next time you run across this problem, could you record the specific node and filesystem you're on? Thanks!
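One way to capture that information in a repeatable form is a small script run from the working directory on the affected node. This is only a sketch under some assumptions: the function names are hypothetical, the filesystem lookup reads `/proc/mounts` and so only works on Linux, and the write probe just checks that SQLite can create and commit a table in that directory. It does not confirm or rule out the locking guess above.

```python
import os
import socket
import sqlite3


def find_mount_point(path):
    """Walk up from `path` until a mount point is reached."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def filesystem_type(mount_point, mounts_file="/proc/mounts"):
    """Return the filesystem type for `mount_point`, or None if unknown.

    Linux only. Mount paths containing spaces are escaped in /proc/mounts
    and will not match here.
    """
    fstype = None
    try:
        with open(mounts_file) as fh:
            for line in fh:
                fields = line.split()
                if len(fields) >= 3 and fields[1] == mount_point:
                    fstype = fields[2]  # a later entry shadows an earlier one
    except OSError:
        return None
    return fstype


def probe_sqlite_write(directory):
    """Try to create and commit a table in a scratch database in `directory`."""
    probe = os.path.join(directory, f"sqlite-write-probe-{os.getpid()}.db")
    try:
        conn = sqlite3.connect(probe, timeout=5)
        try:
            conn.execute("CREATE TABLE probe (x INTEGER)")
            conn.commit()
        finally:
            conn.close()
        return "ok"
    except sqlite3.Error as exc:
        return f"failed: {exc}"
    finally:
        # Remove the scratch file whether or not the probe succeeded.
        if os.path.exists(probe):
            os.remove(probe)


def environment_report(directory="."):
    """Collect node, filesystem, and SQLite details for a bug report."""
    mount = find_mount_point(directory)
    return {
        "node": socket.gethostname(),
        "directory": os.path.realpath(directory),
        "mount_point": mount,
        "filesystem": filesystem_type(mount),
        "sqlite_library": sqlite3.sqlite_version,
        "sqlite_write_probe": probe_sqlite_write(directory),
    }
```

Printing `environment_report()` on both the compute node and the login node and pasting the two results into the issue would show whether they differ in filesystem type or in the SQLite library version.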