Michael A. Stravs (1), Kai Dührkop (2), Sebastian Böcker (2), Nicola Zamboni (1)
1 Institute of Molecular Systems Biology, ETH Zürich, CH-8092 Zürich, Switzerland
2 Institut für Informatik, Friedrich-Schiller-Universität Jena, D-07743 Jena, Germany
stravs@imsb.biol.ethz.ch
submitted, bioRxiv: https://www.biorxiv.org/content/10.1101/2021.07.06.450875v1
https://bio.informatik.uni-jena.de/software/sirius/
The published version of MSNovelist relied on an old version of SIRIUS whose backend is no longer running. For a long time, this left users with no way to try out MSNovelist. With the release of SIRIUS 6, MSNovelist has been integrated into SIRIUS. You can now use MSNovelist de novo structure suggestions directly in the SIRIUS GUI (and also through the new API provided in service mode).
The repository here is mostly what's running on that backend, plus some API code in front that I didn't write, and the model was retrained on the new data from SIRIUS 6. Unfortunately, this means you cannot use this repo directly unless you want to dig into retraining it with fingerprint data from a different fingerprint prediction system. The mist branch (also merged here) contains some work on getting MSNovelist to run with predicted Morgan 4096-bit fingerprints, but we haven't gotten very far with it yet.
MSNovelist is provided as a Docker container for end users. This requires a working Docker installation on Windows or Linux. No other dependencies are required: the Docker container packages all required software and data.
To install Docker on Windows, Linux, or Mac, follow the instructions on https://docs.docker.com/get-docker/.
Notes:
After verifying that you have a running Docker installation, pull the latest MSNovelist container:
docker pull stravsm/msnovelist
Alternatively, you can build the container yourself. For this, check out the Git repository or download the zipped repository. From the repository (the directory containing Dockerfile), run
docker build -t msnovelist .
No dependencies except for Docker itself are required. If you build the container on Windows, make sure that the Git repository was checked out with core.autocrlf=false (or use the zip file).
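The core.autocrlf requirement presumably exists because shell scripts with Windows (CRLF) line endings fail inside a Linux container. As a rough sketch, the following check lists any *.sh files in a checkout that contain carriage returns. The function name crlf_scripts is hypothetical and not part of MSNovelist, and the assumption that only *.sh files matter is mine.

```shell
#!/bin/sh
# Hypothetical helper (not part of MSNovelist): list *.sh files under a
# directory that contain carriage returns, i.e. likely CRLF line endings.
crlf_scripts() {
  # Command substitution strips trailing newlines only, so cr holds "\r".
  cr=$(printf '\r')
  # grep -l prints the names of files that contain at least one match.
  find "$1" -type f -name '*.sh' -exec grep -l "$cr" {} +
}
```

Running crlf_scripts . in the repository directory before docker build should print nothing for a clean checkout. The current Git setting can be inspected with git config core.autocrlf.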
MSNovelist can be run as a command-line tool or with a simple Web interface (see below).
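To start the Web interface from the pulled image:
docker run -it --init -p 8050:8050 stravsm/msnovelist webui.sh
If you built the container yourself, use the local image name instead:
docker run -it --init -p 8050:8050 msnovelist webui.sh
To run the Web interface in the background:
docker run -d -p 8050:8050 stravsm/msnovelist webui.sh
A background container can be stopped with docker kill, using the container ID found with docker ps.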
General:
docker run -v $DATAFOLDER:/msnovelist-data msnovelist predict.sh SPECTRA SIRIUS_SETTINGS
DATAFOLDER is a folder that contains at least the spectra to be processed.
If SPECTRA is a file within DATAFOLDER, it is first processed with SIRIUS. This works with *.mgf and *.ms (SIRIUS format) files.
SIRIUS_SETTINGS is optional; by default, the settings are formula -p qtof structure -d ALL_BUT_INSILICO.
A RUNID (based on the timestamp when running the script) identifies the processing results.
The SIRIUS results are stored in DATAFOLDER/sirius-RUNID and used as input for MSNovelist.
If SPECTRA is a folder, it is assumed to be a pre-processed SIRIUS 4.4.29 workspace and used directly as input for MSNovelist.
If DATAFOLDER/fingerprint_cache.db exists, it is used; otherwise a new cache is created at this path.
The run configuration is saved as DATAFOLDER/msnovelist-config-RUNID.yaml.
Results are written to $DATAFOLDER/results-RUNID/decode-RUNID.csv and .pkl.
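The file-versus-folder rule for SPECTRA can be checked before starting a long run. The following is a minimal sketch of such a pre-flight check. It is not taken from predict.sh, and the function name check_spectra and its output labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of MSNovelist): report how SPECTRA
# would be interpreted according to the rules described above.
check_spectra() {
  datafolder=$1
  spectra=$2
  path="$datafolder/$spectra"
  if [ -d "$path" ]; then
    # A folder is assumed to be a pre-processed SIRIUS workspace.
    echo "workspace"
  elif [ -f "$path" ]; then
    # A file is run through SIRIUS first; only *.mgf and *.ms are listed.
    case "$spectra" in
      *.mgf|*.ms) echo "spectra-file" ;;
      *) echo "unsupported-extension"; return 1 ;;
    esac
  else
    echo "not-found"
    return 1
  fi
}
```

For example, check_spectra "$DATAFOLDER" 377.mgf prints spectra-file and returns 0 when the file exists, so it can be chained with && in front of the docker run command.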
Example:
Use 377.mgf from the directory sample-data of this repository.
In the directory containing 377.mgf, run
docker run --init -v "$(pwd)":/msnovelist-data msnovelist predict.sh 377.mgf
(On Windows, use ${pwd} instead. Alternatively, on either Windows or Linux, use the full path.)
Alternatively, use bryophytes.mgf, the complete bryophyte dataset (576 total spectra). For this, at least 16 GB of RAM are suggested. Runtime is approx. 2 h on a laptop with 4 cores.
Sort the results by score_mod_platt, descending, to get the top candidate (or filter by rank_score_lim_mod_platt == 1).
The query column in the result file indicates the spectrum associated with the result.

See above: A Docker system able to run Linux Docker containers is required. The Docker container contains all dependencies required to run the software.
The container was built and tested on Docker 19.03.6, Ubuntu 18.04.4 LTS, with 16 GB RAM; Docker 19.03.8 on Ubuntu 20.04.2 LTS, with 32 GB RAM; Docker Desktop 2.3.0.4 (46911; engine 19.03.12) on Windows 10.0.10942 with 16 GB RAM; and Docker Desktop 4.1.1 (engine v20.10.8) on Windows 10 20H2 (19042.2037).
The Docker image requires approx. 6.5 GB of disk space. Build time for the Docker container is up to 20 min.
Runtime with a single spectrum is <5 min; for 50 spectra, approx. 30 min on a laptop with 4 cores / 32 GB RAM; for the complete bryophyte dataset, approx. 2:30 h on a machine with 4 cores / 32 GB RAM.
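
The ranking rule for the result file (sort by score_mod_platt, descending, and take the top candidate per spectrum) can be sketched with awk. This is an illustration only. It assumes the CSV has a header row with columns named query and score_mod_platt and no quoted fields containing commas, and the function name top_candidates is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of MSNovelist): print the header plus the
# highest-scoring row for each query.
# Assumptions: comma-separated file with a header row, columns named "query"
# and "score_mod_platt", and no quoted fields that contain commas.
top_candidates() {
  awk -F',' '
    NR == 1 {
      # Locate the two columns by name rather than by fixed position.
      for (i = 1; i <= NF; i++) {
        if ($i == "query") q = i
        if ($i == "score_mod_platt") s = i
      }
      if (!q || !s) {
        print "required column missing" > "/dev/stderr"
        bad = 1
        exit 1
      }
      print
      next
    }
    # Keep a row if its query is new or its score beats the stored best.
    !($q in best) || ($s + 0) > best[$q] {
      best[$q] = $s + 0
      row[$q] = $0
    }
    END {
      if (bad) exit 1
      # Rows are printed in no particular order.
      for (k in row) print row[k]
    }
  ' "$1"
}
```

Called as top_candidates "$DATAFOLDER/results-RUNID/decode-RUNID.csv", this prints one row per spectrum. If the file contains quoted fields with embedded commas, a proper CSV parser is the safer choice.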