sirius-ms / sirius

SIRIUS is software for discovering a landscape of de-novo identification of metabolites using tandem mass spectrometry. This repository contains the code of the SIRIUS software (GUI and CLI).
GNU Affero General Public License v3.0

Could not load Custom Database ... DB seems to be corrupted and should be deleted and re-imported #44

Status: Closed (gjgetzinger closed this issue 3 years ago)

gjgetzinger commented 3 years ago

I get a database error when I try to run a structure search against a custom database.

The test script below fails on its third case with the following error:

SEVERE  11:40:46 - Could not load Custom Database 'db3'. DB seems to be corrupted and should be deleted and re-imported
java.io.IOException: Custom database 'db3' not found.
    at de.unijena.bioinf.chemdb.RestWithCustomDatabase.getCustomDb(RestWithCustomDatabase.java:209)
    at de.unijena.bioinf.chemdb.RestWithCustomDatabase.loadCompoundsByFormula(RestWithCustomDatabase.java:189)
    at de.unijena.bioinf.fingerid.FormulaJob.lambda$compute$0(FormulaJob.java:57)
    at de.unijena.bioinf.utils.NetUtils.tryAndWait(NetUtils.java:72)
    at de.unijena.bioinf.utils.NetUtils.tryAndWait(NetUtils.java:64)
    at de.unijena.bioinf.fingerid.FormulaJob.compute(FormulaJob.java:56)
    at de.unijena.bioinf.fingerid.FormulaJob.compute(FormulaJob.java:36)
    at de.unijena.bioinf.jjobs.BasicJJob.call(BasicJJob.java:120)
    at de.unijena.bioinf.jjobs.BatchJJob.compute(BatchJJob.java:28)
    at de.unijena.bioinf.jjobs.BatchJJob.compute(BatchJJob.java:9)
    at de.unijena.bioinf.jjobs.BasicJJob.call(BasicJJob.java:120)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

#!/usr/bin/env bash

# This test works 
mkdir test1
cd test1
echo "CN1CCCC1C2=C[N+](=CC=C2)[O-]  id-01   Nicotin
CN1C=NC2=C1C(=O)N(C(=O)N2C)C    id-03   Caffein
CN1CCC2=CC3=C(C=C2C1C4C5=C(C6=C(C=C5)OCO6)C(=O)O4)OCO3 id-05 Bicculine" > candidates1.tsv

echo ">compound test1
>formula C8H10N4O2
>ionization [M+H]+
>parentmass 195.0877
>ms2
138.0661 3.693034
195.0877 100.000000" > test1.ms

sirius -i candidates1.tsv custom-db --output . --name db1
sirius -i test1.ms -o sirius_rst1 formula structure --database ./db1

cd ..

# ...this test also works 
mkdir test2
cp test1/test1.ms test2/test2.ms
cd test2

echo "NC(=O)c1ccc([N+](=O)[O-])cc1  ZESWUEBPRPGMTP  name1
NC(=O)c1cccc([N+](=O)[O-])c1    KWAYEPXDGHYGRW  name2
NC(=O)c1ccccc1[N+](=O)[O-]  KLGQWSOYKYFBTR  name3
COc1ccc2no[n+]([O-])c2c1    XCWFKHHSXPIDHN  name4
O=[N+]([O-])c1ccc(C=NO)cc1  WTLPAVBACRIHHC  name5
O=[N+]([O-])c1cccc(C=NO)c1  GQMMRLBWXCGBEV  name6
O=CNc1ccc([N+](=O)[O-])cc1  ZTCQFVRINYOPOH  name7" > candidates2.tsv

sirius -i candidates2.tsv custom-db --output . --name db2
sirius -i test2.ms -o sirius_rst2 formula structure --database ./db2

cd ..

# ...but this one fails 
mkdir test3
cp ./test2/candidates2.tsv ./test3/candidates3.tsv
cd test3
echo ">compound test3
>formula C7H6N2O3
>ionization [M+H]+
>parentmass 167.0451
>ms2 
  50.0153   332681.1
  51.0232  1177739.8
  52.0184    26185.3
   76.031    22813.9
  78.0338  1824839.5
  81.0335    18824.8
  92.0257    74763.5
  96.0444    41429.5
 120.0205    23742.8
 150.0186   14099669" > test3.ms

sirius -i candidates3.tsv custom-db --output . --name db3
sirius -i test3.ms -o sirius_rst3 formula structure --database ./db3

mfleisch commented 3 years ago

Thanks for reporting such a detailed and easy to test issue!

I have not reproduced the problem with older releases, but we fixed a bug in the directory path resolution of custom DBs in v4.9.2, and your script now completes successfully.