sirius-ms / sirius

SIRIUS is software for discovering a landscape of de novo identification of metabolites using tandem mass spectrometry. This repository contains the code of the SIRIUS software (GUI and CLI).
GNU Affero General Public License v3.0
78 stars, 17 forks

Definition of the candidate fingerprint binary format #28

bachi55 closed this issue 2 years ago

bachi55 commented 3 years ago

Hi,

Is there any documentation on how to load the fingerprint binary files (e.g. "C8H18O5_[M+Na]+.fps") for the molecular candidates in the "fingerid" sub-directory of the SIRIUS workspace? What is their binary encoding?

Best regards,

Eric

jackhu-bme commented 1 year ago

Same question as you.

kaibioinfo commented 1 year ago

It's a sparse array: every 2 bytes encode the index of a set position, in big-endian byte order. A value of -1 marks the end of a candidate's fingerprint.
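A minimal Python sketch of that description follows. It assumes each entry is a signed 16-bit big-endian integer (so the -1 terminator is the byte pair 0xFF 0xFF) and that candidate fingerprints are stored back to back in one .fps file. Under that assumption, indices above 32767 could not be represented. The function name `decode_candidate_fingerprints` is hypothetical and is not part of SIRIUS.

```python
import struct
from typing import List


def decode_candidate_fingerprints(data: bytes) -> List[List[int]]:
    """Split a sparse .fps byte stream into one list of set indices per candidate.

    Assumption: every entry is a 2-byte big-endian signed integer ('>h'),
    and the value -1 terminates the current candidate's fingerprint.
    """
    if len(data) % 2:
        raise ValueError("expected an even number of bytes (2 bytes per entry)")
    candidates: List[List[int]] = []
    current: List[int] = []
    for (value,) in struct.iter_unpack(">h", data):
        if value == -1:
            # end-of-candidate marker: close the current fingerprint
            candidates.append(current)
            current = []
        else:
            current.append(value)
    if current:
        # indices after the last terminator: keep them instead of silently dropping
        candidates.append(current)
    return candidates


if __name__ == "__main__":
    # synthetic example: candidate 1 has bits 3 and 17 set, candidate 2 has bit 0 set
    sample = struct.pack(">5h", 3, 17, -1, 0, -1)
    print(decode_candidate_fingerprints(sample))  # [[3, 17], [0]]
```

To decode a real workspace file, read it in binary mode, e.g. `open("C8H18O5_[M+Na]+.fps", "rb").read()`, and pass the bytes in. Each returned list holds only the set positions; turning it into a dense 0/1 vector requires the fingerprint length, which this thread does not state.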

jackhu-bme commented 1 year ago

Thanks for your reply, @kaibioinfo! This worked on the output workspaces of some of my past results from SIRIUS 4.4 on Linux, and I sincerely thank you for it. However, my previous setup, which ran SIRIUS in the Docker image provided by https://github.com/meowcat/MSNovelist, ran into https://github.com/meowcat/MSNovelist/issues/9, so I switched to a GUI install of SIRIUS 5.5.1 on Windows. From that issue I learned that the server is down, but the GUI install still works and can produce probabilistic fingerprints from MGF files, which are essential to my project.

However, the GUI cannot export all of the fingerprints (only a small part of them can be exported using the summary option). When I click "Save as" and save the SIRIUS project, there is another binary file, "fingerprints": for an input file named A.ms, its path is "${directory_i_clicked}/A/fingerprints". In my example this file takes 38 KB of disk space and likely contains all the fingerprint information I need for A.ms, but I could not parse it in Python because I cannot work out its format or encoding. Could you give me some suggestions when you have time, for example what the bytes encode? These fingerprints are central to my bachelor's graduation project (the deadline is in May), and I can hardly wait until the SIRIUS server is back online, since that date is still unknown to me.

jackhu-bme commented 1 year ago

Well, it turns out to be easy. Just rename the binary file to "fingerprint.zip" and decompress it; the .fps files inside are plain-text files containing the fingerprints I needed.
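Renaming is not strictly necessary in Python: the standard `zipfile` module recognizes an archive by its contents, not its file extension. The sketch below assumes the project file is an ordinary ZIP archive (as the rename trick implies) and that the .fps entries are UTF-8 text. `read_fps_entries` is a hypothetical helper name, and the sketch only extracts the text; it does not interpret the layout of the .fps files, which this thread does not describe.

```python
import zipfile
from typing import Dict


def read_fps_entries(path: str) -> Dict[str, str]:
    """Return {member name: text} for every .fps entry in a zipped project file.

    Assumptions: the file is a plain ZIP archive (no rename needed, since
    zipfile inspects the content) and the .fps members are UTF-8 text.
    """
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP archive")
    entries: Dict[str, str] = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith(".fps"):
                entries[name] = zf.read(name).decode("utf-8")
    return entries
```

For the example above, `read_fps_entries("A/fingerprints")` would return the text of each .fps member keyed by its name inside the archive, leaving the original file untouched.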

Thanks for your reply! @kaibioinfo This one worked for my SIRIUS4.4 Linux version output workspace for some of my past results, I sincerely thank you for this! However, as my previous install using SIRIUS in the docker image provided by repo https://github.com/meowcat/MSNovelist encountered the issue of meowcat/MSNovelist#9, I switched to GUI install of SIRIUS 5.5.1 on windows. From the issue, I heard that the server is down, but the GUI install is still fine and could produce probability fingerprints of mgf files, which are essential to my projects. However, the GUI install could not provide the output of all the fingerprints(I see only a small part of them could be exported using the summary option), and there is another binary file "fingerprint" when I click "save as" and save the "SIRIUS projects". When my input file is named A.ms, the path of the binary file is "${directory_i_clicked}/A/fingerprints". This binary file takes disk space of 38KB for my example and is likely to have all the info I needed for fingerprints of A.ms. However, I could not parse it using python, as I can not guess the format and how it is encoded. Could you please give me some suggestions when you are available? Like what do the bytes encode? Thanks a lot for your reply. As these fingerprints are closely related to my bachelor graduation project(ddl is in May), this is essential and I can hardly wait till the back online of the SIRIUS server as the date is still unknown to me.