matchms / ms2deepscore

Deep learning similarity measure for comparing MS/MS spectra with respect to their chemical similarity
Apache License 2.0

After downloading the dataset, the code cannot run successfully. #153

Closed HideAimChinUp closed 8 months ago

HideAimChinUp commented 9 months ago

After downloading the MSP file from the website (https://mona.fiehnlab.ucdavis.edu/downloads), the code does not run successfully. The error is: AssertionError: Expected input argument 'references' to be list or tuple or np.ndarray. I then changed the line in the code you provided from "scores = calculate_scores(references, queries, similarity_measure)" to "scores = calculate_scores(list(references), list(queries), similarity_measure)". After that, the errors became: (1) WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata. (2) ValueError: too many values to unpack (expected 2). I am very interested in this project and eager to learn. Could you please help me solve this problem? Looking forward to your reply, thank you!

niekdejonge commented 9 months ago

@HideAimChinUp Sorry to hear it did not work right away for you. It seems to have to do with some unexpected structure in the metadata. Could you share the entire script you ran and the complete error message? Matchms already has a lot of functionality to fix these kinds of issues, so hopefully it is a quick fix.

HideAimChinUp commented 9 months ago

Okay! I have packaged the entire script I ran and attached it for your review, and I have included a screenshot of my complete error message below.

error message:

------------------ Original message ------------------ From: "matchms/ms2deepscore"; Sent: Wednesday, 1 November 2023, 8:47 PM; Subject: Re: [matchms/ms2deepscore] After downloading the dataset, the code cannot run successfully. (Issue #153)

Oversized attachment sent from QQ Mail:

ms2deepscore-main.rar (85.52 MB, link expires 2 December 2023, 17:34). Download page: http://mail.qq.com/cgi-bin/ftnExs_download?t=exs_ftn_download&k=296161321a9d97cd5e58591e453302181b155953505007564e5059005a1e020606054c530201041a0702580454035254525359506326305a10530557064343540c13041f0e5259594d130040630e&code=caa2c307

HideAimChinUp commented 9 months ago

Also, if it is convenient, could you please send me a copy of the dataset you used for your experiments?

HideAimChinUp commented 8 months ago

I wonder whether you might be able to reply to my previous email? I am waiting for your opinion. Thank you very much and best regards!

niekdejonge commented 8 months ago

Thanks for sharing the code, and sorry for the late reply. I was away at a conference. I am a bit busy catching up on everything, but will make time this afternoon to look at your issue.

HideAimChinUp commented 8 months ago

Thank you! I really appreciate it.


niekdejonge commented 8 months ago

I was able to download your file. It contains all the ms2deepscore files and your reference and query spectra, but I did not find the script you ran to create the MS2DeepScore predictions. Were you running a tutorial or a script of your own?

HideAimChinUp commented 8 months ago

I have written down the complete running process for you to review, as shown below:

1. Download the complete code for MS2DeepScore.

2. Configure the Anaconda environment:

pip install ms2deepscore

conda create --name ms2deepscore python=3.9

conda activate ms2deepscore

pip install ms2deepscore

3. Download the model MS2DeepScore_allGNPSpositive_10k_500_500_200.hdf5.

4. Create a start file:

Create “start.py” in “ms2deepscore-main.github\workflows”.

The code is as follows:

from matchms import calculate_scores
from matchms.importing import load_from_msp
from ms2deepscore import MS2DeepScore
from ms2deepscore.models import load_model

# Import data
references = load_from_msp("my_reference_spectra.msp")
queries = load_from_msp("my_query_spectra.msp")

# Load pretrained model
model = load_model("MS2DeepScore_allGNPSpositive_10k_500_500_200.hdf5")

similarity_measure = MS2DeepScore(model)
# Calculate scores and get matchms.Scores object
scores = calculate_scores(references, queries, similarity_measure)

5. Run “start.py”.

Running “start.py” should import the data, load the model, and calculate the scores, but it fails with an error.


niekdejonge commented 8 months ago

Thanks, I tried running your code and got the error below. It actually happens while loading the data, and is an issue on the matchms side or with this specific file.

If you only load the spectra, as in the code below, without any model prediction, you get the same error.

from matchms.importing import load_from_msp
references = list(load_from_msp("my_reference_spectra.msp"))
Error:
  line 52, in load_from_msp
    yield Spectrum(mz=mz,
  line 87, in __init__
    self._metadata.harmonize_values()
  line 109, in harmonize_values
    metadata_filtered = _add_retention(metadata_filtered, "retention_time", "retention_time")
  line 93, in _add_retention
    values = list(map(_safe_convert_to_float, values_for_keys))
  line 65, in _safe_convert_to_float
    return float(val) * conversion[unit]
KeyError: 'sec'
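The last frame shows where the KeyError comes from: `float(val) * conversion[unit]` looks the retention-time unit up in a conversion table, and the unit written in this file, `sec`, is not one of its keys. A minimal sketch of that failure mode and of a more tolerant alternative follows; the `CONVERSION` table, the `UNIT_ALIASES` map and both function names are hypothetical illustrations, not matchms's actual code or values.

```python
# Hypothetical unit table (factor to seconds); NOT the table used inside matchms.
CONVERSION = {"s": 1.0, "min": 60.0}

# Hypothetical aliases that a more tolerant parser could map onto known units.
UNIT_ALIASES = {"sec": "s", "second": "s", "seconds": "s", "minute": "min", "minutes": "min"}


def to_seconds_strict(text: str) -> float:
    """Parse '<value> <unit>' with a plain dictionary lookup on the unit."""
    val, unit = text.split()
    return float(val) * CONVERSION[unit]  # KeyError: 'sec' for "12.5 sec"


def to_seconds_tolerant(text: str):
    """Same parse, but normalises unit spellings and returns None instead of raising."""
    parts = text.split()
    if len(parts) != 2:
        return None
    val, unit = parts
    unit = UNIT_ALIASES.get(unit.lower(), unit.lower())
    factor = CONVERSION.get(unit)
    if factor is None:
        return None  # unknown unit: skip the value rather than abort loading
    try:
        return float(val) * factor
    except ValueError:
        return None  # value part is not a number


if __name__ == "__main__":
    print(to_seconds_tolerant("12.5 sec"))  # 12.5
    print(to_seconds_tolerant("2 min"))     # 120.0
```

Returning `None` for an unreadable value lets loading continue for the remaining spectra, which is the trade-off a tolerant parser makes.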

I am not yet sure whether this is an issue in matchms or with this specific MSP file. I will have a look and create an issue at matchms. In the meantime, you could try ms2deepscore with a different file to make sure everything else works well. Sorry for the inconvenience.

niekdejonge commented 8 months ago

This issue was already reported by someone else in https://github.com/matchms/matchms/issues/551. I will close this issue for now, but we will continue to work on it in the matchms issue and will let you know there once this works again.

HideAimChinUp commented 8 months ago

I made the change you suggested, but there was another error. It is shown in the following figure:

code:

error:

(Replying to niekdejonge's email of Thursday, 9 November 2023, 5:24 PM: "Oh, and a quick fix to work with this input file would actually be to add metadata_harmonization=False: load_from_msp("your_reference.msp", metadata_harmonization=False)")

niekdejonge commented 8 months ago

@HideAimChinUp The figure did not load for me.

HideAimChinUp commented 8 months ago

I will write a text version:

code:

# Import data
references = load_from_msp("D:\ms2deepscore-main\ms2deepscore-main\my_reference_spectra.msp", metadata_harmonization=False)
queries = load_from_msp("D:\ms2deepscore-main\ms2deepscore-main\my_query_spectra.msp", metadata_harmonization=False)

# Load pretrained model
model = load_model("MS2DeepScore_allGNPSpositive_10k_500_500_200.hdf5")

similarity_measure = MS2DeepScore(model)

# Calculate scores and get matchms.Scores object
scores = calculate_scores(list(references), list(queries), similarity_measure)

scores = calculate_scores(references, queries, similarity_measure)

error:
Traceback (most recent call last):
  File "D:\ms2deepscore-main\ms2deepscore-main.github\workflows\start.py", line 15, in <module>
    scores = calculate_scores(list(references), list(queries), similarity_measure)
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\matchms\importing\load_from_msp.py", line 38, in load_from_msp
    for spectrum in parse_msp_file(filename):
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\matchms\importing\load_from_msp.py", line 78, in parse_msp_file
    parse_metadata(rline, params)
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\matchms\importing\load_from_msp.py", line 157, in parse_metadata
    value = match[1]
IndexError: list index out of range
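The IndexError is raised at `value = match[1]` in `parse_metadata`, which suggests a metadata line in the MSP file that could not be split into a key and a value. A minimal sketch of that failure mode, assuming (hypothetically) that a line is split on its first colon; both functions are illustrations, not matchms's actual parser.

```python
def parse_metadata_line_strict(line: str) -> tuple:
    """Hypothetical parser: split 'KEY: value' on the first colon and take both halves."""
    parts = line.split(":", 1)
    key = parts[0].strip()
    value = parts[1].strip()  # IndexError: list index out of range if the line has no ':'
    return key, value


def parse_metadata_line_tolerant(line: str):
    """Same split, but returns None for lines that are not 'KEY: value' pairs."""
    parts = line.split(":", 1)
    if len(parts) < 2:
        return None  # let the caller skip or log the odd line
    return parts[0].strip(), parts[1].strip()


if __name__ == "__main__":
    print(parse_metadata_line_tolerant("NAME: caffeine"))              # ('NAME', 'caffeine')
    print(parse_metadata_line_tolerant("Comments without separator"))  # None
```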

 


niekdejonge commented 8 months ago

Thanks, I just found this error as well. This is also an issue on the matchms side and was reported by another user too: https://github.com/matchms/matchms/issues/548. Both are recent issues with this data. I will implement a fix and do a new release of matchms, which should solve both issues and make working with MSP files functional again.

HideAimChinUp commented 8 months ago

Okay, thanks for your efforts! I will continue to pay attention.


niekdejonge commented 8 months ago

@HideAimChinUp It took a while before the other developers could review the new code, but it is now available in the new matchms 0.24.0. To make this work in ms2deepscore, please update matchms to version 0.24.0. Let us know if you still have any issues.

HideAimChinUp commented 7 months ago

Sorry, I only just noticed your email. How do I update matchms to version 0.24.0?


niekdejonge commented 7 months ago

Just running

pip install matchms==0.24.0

should work. Alternatively, reinstall ms2deepscore from scratch.
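When several environments are involved (here a conda environment named ms2deepscore), it is easy to upgrade matchms in one environment and run the script in another. A minimal sketch, using only the standard library's `importlib.metadata`, that reports whether the interpreter running the script sees matchms 0.24.0 or later; the helper names `version_tuple` and `installed_at_least` are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple:
    """Turn '0.24.0' into (0, 24, 0); stops at the first part without leading digits.
    Simplification: pre-release tags such as 'rc1' are ignored."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def installed_at_least(package: str, minimum: str) -> bool:
    """True if `package` is installed for this interpreter with version >= `minimum`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that "0.24" and "0.24.0" compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    print("matchms >= 0.24.0:", installed_at_least("matchms", "0.24.0"))
```

Running this with the same Python that runs start.py confirms which matchms it actually imports. For full version semantics, including pre-releases, the `packaging` library's `Version` class is the more robust choice.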

HideAimChinUp commented 7 months ago

I re-downloaded the ms2deepscore package as you said and also re-installed the virtual environment. However, the code still reports the same error as before, with the following error message:

Traceback (most recent call last):
  File "D:\研究课题\ms2deepscore-main\ms2deepscore-main\start.py", line 15, in <module>
    scores = calculate_scores(references, queries, similarity_measure)
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\matchms\calculate_scores.py", line 63, in calculate_scores
    return Scores(references=references, queries=queries,
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\matchms\Scores.py", line 76, in __init__
    Scores._validate_input_arguments(references, queries)
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\matchms\Scores.py", line 122, in _validate_input_arguments
    assert isinstance(references, (list, tuple, np.ndarray)),\
AssertionError: Expected input argument 'references' to be list or tuple or np.ndarray.

Then I changed the code "scores = calculate_scores(references, queries, similarity_measure)" to "scores = calculate_scores(list(references), list(queries), similarity_measure)", and got the following output:

To enable the following instructions: SSE SSE2 SSE3 SSE4.1 SSE4.2 AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-05 18:19:30,214:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,217:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,225:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,226:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,228:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,231:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,234:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,260:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,263:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,273:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,278:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,280:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,283:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,287:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.
2023-12-05 18:19:30,290:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.

It just kept going on and on. I then terminated the program manually, with the following error:

Traceback (most recent call last):
  File "D:\研究课题\ms2deepscore-main\ms2deepscore-main\start.py", line 16, in <module>
    scores = calculate_scores(list(references), list(queries), similarity_measure)
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\matchms\importing\load_from_msp.py", line 38, in load_from_msp
    for spectrum in parse_msp_file(filename):
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\matchms\importing\load_from_msp.py", line 83, in parse_msp_file
    masses = np.append(masses, mz)
  File "<__array_function__ internals>", line 200, in append
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\numpy\lib\function_base.py", line 5497, in append
    values = ravel(values)
  File "<__array_function__ internals>", line 200, in ravel
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\numpy\core\fromnumeric.py", line 1885, in ravel
    return asanyarray(a).ravel(order=order)
KeyboardInterrupt

Sorry to bother you again, but the error doesn't seem to have been fixed yet.


niekdejonge commented 7 months ago

This is a different error message than you had before, right?

Your solution seems good! (The issue was that a generator was created; converting it to a list does the trick.) The warnings mean it has started processing the spectra, so I would just let it run a bit longer. Your spectra do not seem to have a precursor m/z, which is a bit surprising but does not have to be a problem for running ms2deepscore.
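Why `list(...)` fixes the AssertionError can be shown without matchms. A minimal sketch, assuming only what the thread states: the loader returns a generator, while the scoring step checks for a list-like type. `load_spectra_lazily` and `check_input` are hypothetical stand-ins (the real matchms check also accepts `np.ndarray`). It also shows that a generator can be consumed only once, which matters in the earlier script that calls `calculate_scores` a second time on the same `references`.

```python
from typing import Iterator


def load_spectra_lazily(n: int) -> Iterator[dict]:
    """Hypothetical loader that yields one spectrum at a time, like a generator-based reader."""
    for i in range(n):
        yield {"spectrum_id": i}


def check_input(references) -> None:
    """Mimics the kind of type check behind the AssertionError (list or tuple only here)."""
    if not isinstance(references, (list, tuple)):
        raise AssertionError("Expected input argument 'references' to be list or tuple.")


if __name__ == "__main__":
    spectra = load_spectra_lazily(3)
    try:
        check_input(spectra)
    except AssertionError as err:
        print("rejected:", err)    # a generator is not a list
    references = list(spectra)     # the actual loading work happens here
    check_input(references)        # passes
    print(len(references))         # 3
    print(list(spectra))           # []: the generator is already exhausted
```

Converting once and reusing the resulting list avoids both the type error and an accidentally empty second pass.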

HideAimChinUp commented 7 months ago

The program now runs, but it still ends with the following error. Could you please take a look at what causes it?

Spectrum binning: 100%|██████████| 18915/18915 [00:02<00:00, 9175.20it/s]
Create BinnedSpectrum instances: 100%|██████████| 18915/18915 [00:01<00:00, 14245.25it/s]
Calculating vectors of reference spectrums: 100%|██████████| 18915/18915 [21:55<00:00, 14.37it/s]
Spectrum binning: 100%|██████████| 18915/18915 [00:02<00:00, 7645.38it/s]
Create BinnedSpectrum instances: 100%|██████████| 18915/18915 [00:01<00:00, 11256.14it/s]
Calculating vectors of reference spectrums: 100%|██████████| 18915/18915 [25:35<00:00, 12.32it/s]
Traceback (most recent call last):
  File "D:\研究课题\ms2deepscore-main\ms2deepscore-main\start.py", line 16, in <module>
    scores = calculate_scores(list(references), list(queries), similarity_measure)
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\matchms\calculate_scores.py", line 63, in calculate_scores
    return Scores(references=references, queries=queries,
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\matchms\Scores.py", line 181, in calculate
    self._scores.add_dense_matrix(new_scores, name, join_type=join_type)
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\sparsestack\StackedSparseArray.py", line 235, in add_dense_matrix
    self._add_dense_matrix(matrix, name, join_type)
  File "D:\Business Data Analysis\envs\ms2deepscore\lib\site-packages\sparsestack\StackedSparseArray.py", line 256, in _add_dense_matrix
    (idx_row, idx_col) = np.where(matrix)
  File "<__array_function__ internals>", line 200, in where
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 5.33 GiB for an array with shape (357776485, 2) and data type int64


niekdejonge commented 7 months ago

You are running out of memory, since you are calculating an all-vs-all matrix between 18915 spectra. That is 18915^2 = 357,777,225 comparisons. The allocation that fails is the (357776485, 2) int64 index array of the non-zero scores, which alone needs 5.33 GiB, more RAM than your local computer probably has available.
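The memory figures can be checked with plain arithmetic. The number of all-vs-all pairs is 18915^2 = 357,777,225; the 357,776,485 in the error message is slightly smaller because it counts only the non-zero scores that `np.where` indexes. The float64 dtype for the dense score matrix below is an assumption made for illustration.

```python
N_SPECTRA = 18915            # spectra on each side of the all-vs-all comparison
N_NONZERO = 357_776_485      # rows of the (N, 2) int64 array in the error message
BYTES_PER_INT64 = 8
GIB = 2 ** 30

n_pairs = N_SPECTRA ** 2                          # every reference-query pair
index_bytes = N_NONZERO * 2 * BYTES_PER_INT64     # (row, col) int64 pair per non-zero score
dense_float64_bytes = n_pairs * 8                 # ASSUMPTION: dense score matrix stored as float64

print(n_pairs)                                    # 357777225
print(f"{index_bytes / GIB:.2f} GiB")             # 5.33 GiB
print(f"{dense_float64_bytes / GIB:.2f} GiB")     # 2.67 GiB
```

Under that assumption the dense matrix and the index array would together need roughly 8 GiB at the moment of the crash.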

Do you need the all-vs-all comparison? You could first try with fewer spectra, or consider using a server if you really want to do something like this.

Alternatively, you can score 1 spectrum against all 18915 spectra, store the result on disk, and loop over all 18915 spectra this way, so the results are stored iteratively and do not all have to be in RAM at the same time.
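A minimal sketch of that one-against-all loop, under stated assumptions: each spectrum has already been turned into a vector (the progress log shows MS2DeepScore calculating such vectors before scoring), cosine similarity stands in for the scoring step, and `scores_to_disk` plus the CSV layout are hypothetical choices, not part of the MS2DeepScore or matchms API.

```python
import csv
import math
import os
import tempfile
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; stand-in scoring function for this sketch."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def scores_to_disk(queries: List[Vector], references: List[Vector], out_path: str) -> int:
    """Score one query at a time against every reference and write that row to CSV
    immediately, so only one row of scores is held in memory at any moment."""
    rows_written = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for query_index, query in enumerate(queries):
            row = [cosine(query, reference) for reference in references]
            writer.writerow([query_index, *row])  # first column: query index
            rows_written += 1
    return rows_written


if __name__ == "__main__":
    # Toy 2-D vectors standing in for precomputed spectrum vectors.
    toy_queries = [[1.0, 0.0], [0.0, 1.0]]
    toy_references = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    path = os.path.join(tempfile.mkdtemp(), "scores.csv")
    print(scores_to_disk(toy_queries, toy_references, path), "rows written to", path)
```

CSV keeps the sketch dependency-free, but for 18915 x 18915 scores a binary format, or keeping only the top matches per query, would be far more compact on disk.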

HideAimChinUp commented 7 months ago

Everything is running normally. Thank you very much for your help during this period!
