mwang87 / MetabolomicsSpectrumResolver

Metabolomics Spectrum Resolver/Displayer
https://metabolomics-usi.ucsd.edu/
MIT License

Cosine in Dash interface and JSON don't always match #199

Open helenamrusso opened 2 months ago

helenamrusso commented 2 months ago

I would like to report what I believe to be a bug in the metabolomics spectrum resolver. I’m using it to retrieve the cosine similarity of a list of USIs, which overall has been working well and is providing me what I need. However, today I noticed cases in which the JSON file does not show the correct cosine similarity.

Example: Dash interface - cos 0.7962, and is indeed a good match: https://metabolomics-usi.gnps2.org/dashinterface/?usi1=mzspec:MSV000085142:vehicle_LI_C_Se[…]90&cosine=standard&fragment_mz_tolerance=0.1&grid=False

JSON - cos 0.01299: https://metabolomics-usi.gnps2.org/json/mirror/?usi1=mzspec:MSV000085142:vehicle_LI_C_Sept[…]nnotate_peaks=%5B%5B95.08549499511719%5D%2C%20%5B%5D%5D

I manually checked many, and overall these values match exactly. But with big lists, I’m wondering how many will be an example like this one.
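For anyone who wants to reproduce the comparison over a larger list, here is a minimal sketch that builds the Dash and JSON links for one pair of USIs from the same settings, so that any difference in the reported cosine cannot come from differing query parameters. The function name `mirror_urls` is hypothetical, and it is an assumption that the endpoints accept this reduced parameter set (the links above carry more parameters, such as `width` and `grid`).

```python
from urllib.parse import urlencode

# Host and endpoint paths are taken from the links in this issue.
BASE_URL = "https://metabolomics-usi.gnps2.org"


def mirror_urls(usi1, usi2, cosine="standard", fragment_mz_tolerance=0.1):
    """Return (dash_url, json_url) built from one shared query string.

    Assumption: both endpoints accept this subset of the parameters
    that appear in the links reported above.
    """
    query = urlencode(
        {
            "usi1": usi1,
            "usi2": usi2,
            "cosine": cosine,
            "fragment_mz_tolerance": fragment_mz_tolerance,
        },
        safe=":/",  # keep the USIs readable instead of percent-encoding them
    )
    dash_url = f"{BASE_URL}/dashinterface/?{query}"
    json_url = f"{BASE_URL}/json/mirror/?{query}"
    return dash_url, json_url


dash_url, json_url = mirror_urls(
    "mzspec:MSV000085142:vehicle_LI_C_Sept_m2:scan:137",
    "mzspec:GNPS:TASK-2f93c302650d4d928740b85da2aca965-spectra/specs_ms.mgf:scan:106",
)
print(dash_url)
print(json_url)
```

Fetching the JSON link and reading the cosine out of the response is left out on purpose, since the layout of that response is not shown in this thread.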

mwang87 commented 2 months ago

Thanks! Will take a look and let you know

On Sat, May 11, 2024 at 10:30 AM helenamrusso wrote:

Example: Dash interface - cos 0.7962, and is indeed a good match: https://metabolomics-usi.gnps2.org/dashinterface/?usi1=mzspec:MSV000085142:vehicle_LI_C_Sept_m2:scan:137&usi2=mzspec:GNPS:TASK-2f93c302650d4d928740b85da2aca965-spectra/specs_ms.mgf:scan:106&width=10.0&height=6.0&mz_min=None&mz_max=None&max_intensity=125&annotate_precision=4&annotation_rotation=90&cosine=standard&fragment_mz_tolerance=0.1&grid=False

JSON - cos 0.01299: https://metabolomics-usi.gnps2.org/json/mirror/?usi1=mzspec:MSV000085142:vehicle_LI_C_Sept_m2:scan:137&usi2=mzspec:GNPS:TASK-2f93c302650d4d928740b85da2aca965-spectra/specs_ms.mgf:scan:106&width=10.0&height=6.0&mz_min=None&mz_max=None&max_intensity=125&annotate_precision=4&annotation_rotation=90&cosine=standard&fragment_mz_tolerance=0.1&grid=True&annotate_peaks=%5B%5B95.08549499511719%5D%2C%20%5B%5D%5D

helenamrusso commented 2 months ago

I did more investigation into this issue and I have some more information.

Please consider this USI as an example: mzspec:MSV000085142:vehicle_LI_C_Sept_m2:scan:137

In the web interface, the precmz is 188.1761; in the JSON file, the precmz is 709.1234.

I checked this dataset in MassIVE and filtered for the filename (https://massive.ucsd.edu/ProteoSAFe/dataset_files.jsp?task=a1375e1eca11456f9bed4b71c3f12f8d#%7B%22table_sort_history%22%3A%22main.collection_asc%22%2C%22main.file_descriptor_input%22%3A%22vehicle_LI_C_Sept_m2%22%7D), and there are two files with the same filename but in different folders (one with negative-mode data, the other with positive-mode data).

I downloaded both files and inspected scan 137. In positive mode: m/z 188.1761; in negative mode: m/z 709.1235.

Therefore, in this case, the Dash interface is showing the positive data and the JSON is showing the negative data.
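To estimate how many USIs in a big list could be affected, one could look for file names that occur more than once in a dataset's file listing. The sketch below groups paths by file name without extension and reports the collisions. The function name `find_ambiguous_runs` is hypothetical, and the folder names and extensions in the sample listing are illustrative only, not the actual layout of MSV000085142.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_ambiguous_runs(file_paths):
    """Map each run name (file name without extension) that occurs in
    more than one place to the list of paths sharing it."""
    by_stem = defaultdict(list)
    for path in file_paths:
        by_stem[PurePosixPath(path).stem].append(path)
    return {stem: paths for stem, paths in by_stem.items() if len(paths) > 1}


# Hypothetical listing: folder names and extensions are made up for the example.
files = [
    "positive/vehicle_LI_C_Sept_m2.mzML",
    "negative/vehicle_LI_C_Sept_m2.mzML",
    "positive/other_sample.mzML",
]

for run_name, paths in find_ambiguous_runs(files).items():
    print(run_name, "->", paths)
```

Any USI whose run name shows up in this result could resolve to different spectra depending on which file a given code path picks first.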

PS: as background, I got this USI (mzspec:MSV000085142:vehicle_LI_C_Sept_m2:scan:137) from fastMASST searches, and the fastMASST result points to this USI with precmz 188.

bittremieux commented 2 months ago

Thanks for the detailed investigation. This is an interesting edge case. The USI standard details how to distinguish multiple runs with the same file name in a single dataset, using the subfolder mechanism in section 3.6.1.

So in this case, the unique USIs would be:

However, our resolver doesn't seem to support this format, nor does the general MassIVE resolver. It does seem to return all matching files though.
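To make the discussion concrete, here is a rough sketch of splitting a USI into its fields and checking whether the run name carries any path information. This is not the resolver's parser: `parse_usi` and `run_name_has_path` are hypothetical names, the sketch assumes the run name contains no colons and ignores any trailing interpretation field, and it assumes that folder information would show up as a `/`-separated prefix inside the run name. The exact syntax is defined by the USI specification and is not reproduced here.

```python
from typing import NamedTuple


class ParsedUsi(NamedTuple):
    collection: str
    run_name: str
    index_type: str
    index: str


def parse_usi(usi):
    """Split a USI of the form mzspec:<collection>:<run>:<index type>:<index>.

    Simplification: the run name is assumed to contain no colons, and
    anything after the index (e.g. an interpretation) is ignored.
    """
    parts = usi.split(":")
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a recognisable USI: {usi!r}")
    _, collection, run_name, index_type, index = parts[:5]
    return ParsedUsi(collection, run_name, index_type, index)


def run_name_has_path(parsed):
    """True if the run name includes a folder-like prefix (assumed '/')."""
    return "/" in parsed.run_name


usi = parse_usi("mzspec:MSV000085142:vehicle_LI_C_Sept_m2:scan:137")
print(usi.run_name, run_name_has_path(usi))
```

The USI from this issue has a bare run name, so nothing in it tells a resolver which of the two files is meant.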

So it seems like the solution must be two-fold:

  1. Proper resolving of USIs containing subfolders through MassIVE and our resolver.
  2. Proper reporting of unique USIs from MASST.

And maybe:

  1. Give an error message if a non-unique USI is provided in the resolver?
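The error-message idea could look roughly like the sketch below: match the run name against the dataset's file list and refuse to pick one when several files fit. This is not the current resolver code. `resolve_run` and `AmbiguousUsiError` are hypothetical names, the folder names in the sample listing are made up, and it is an assumption that a folder-qualified run name can be matched as a trailing path.

```python
from pathlib import PurePosixPath


class AmbiguousUsiError(ValueError):
    """Raised when a run name matches more than one file in a dataset."""


def _matches(path, run_name):
    # Compare the trailing path components, ignoring the file extension.
    candidate = PurePosixPath(path).with_suffix("").parts
    wanted = PurePosixPath(run_name).parts
    return len(wanted) > 0 and candidate[-len(wanted):] == wanted


def resolve_run(run_name, file_paths):
    """Return the single file matching run_name, or raise if none or several do."""
    matches = [p for p in file_paths if _matches(p, run_name)]
    if not matches:
        raise LookupError(f"no file matches run name {run_name!r}")
    if len(matches) > 1:
        raise AmbiguousUsiError(
            f"run name {run_name!r} matches {len(matches)} files: {matches}"
        )
    return matches[0]


# Hypothetical listing: folder names and extensions are illustrative only.
files = [
    "positive/vehicle_LI_C_Sept_m2.mzML",
    "negative/vehicle_LI_C_Sept_m2.mzML",
]

print(resolve_run("positive/vehicle_LI_C_Sept_m2", files))
try:
    resolve_run("vehicle_LI_C_Sept_m2", files)
except AmbiguousUsiError as err:
    print("error:", err)
```

Raising instead of silently taking the first match would have surfaced the positive/negative mix-up reported here at lookup time.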