idslme / IDSL_MINT

A Deep Learning Framework to Interpret Raw Mass Spectrometry (m/z) Data
Issue on using IDSL_MINT with cuda device #1

Closed sara-hashemi closed 2 months ago

sara-hashemi commented 3 months ago

I have downloaded and run your code from GitHub in various environments (Colab, GC, AWS). However, when we change the YAML file to use the GPU server, it throws an error pointing to a problem with the Triton library. This is the error:

ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?)

Please note that I have made sure we are using a GPU server (checked and confirmed with the nvidia-smi command) and have set the device to “cuda” in the related YAML file. I have also checked the training file to ensure everything is passed on to the specified device.

Do you have any insights on why this issue comes up? I have tested the code on the CPU and modified the YAML file to work with a CPU and that works fine. The issue only arises when using a GPU server and specifying the device as cuda.

barupal commented 3 months ago

Hi @NTuan-Nguyen, can you please help Sara fix this issue? Thanks! Dinesh

NTuan-Nguyen commented 3 months ago

Hello Sara,

I think this issue can sometimes occur on multi-GPU systems running PyTorch with the Triton backend. I was able to replicate it on Colab using the default Colab PyTorch 2.3 with CUDA 12. A workaround is to revert to an earlier PyTorch build that uses CUDA 11.8. This can be done by adding the following line to the notebook during the installation step:

!pip install torch==2.0.1+cu118 torchvision torchaudio torchinfo --extra-index-url https://download.pytorch.org/whl/cu118

I have tested this in the Colab environment, but please let me know if the issue persists or if you're unable to apply the fix in your workspace.
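To confirm which PyTorch build is active before and after applying the pin, a small check along these lines may help. This is only a sketch and not part of IDSL_MINT: `describe_torch_build` is a hypothetical helper that splits a version string such as `2.0.1+cu118` into its release and build tag, and the block at the bottom only reports details if PyTorch can be imported.

```python
def describe_torch_build(version: str) -> dict:
    """Split a PyTorch version string like '2.0.1+cu118' into its parts.

    Hypothetical helper for illustration; it only parses the string.
    """
    release, _, local = version.partition("+")
    return {"release": release, "build_tag": local or None}


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(describe_torch_build(torch.__version__))
        # CUDA version the installed wheel was built against (None for CPU-only builds)
        print("Built with CUDA:", torch.version.cuda)
        # Whether PyTorch can actually see a CUDA device at runtime
        print("CUDA device visible:", torch.cuda.is_available())
```

After the workaround above, the build tag would be expected to read `cu118`; if it still shows a CUDA 12 tag, the notebook may need a runtime restart so the reinstalled package is picked up.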

sara-hashemi commented 3 months ago

Thanks, this worked in resolving the issue with the Triton library.

sara-hashemi commented 2 months ago

Hi NTuan,

Thank you for resolving the last issue. I was wondering if you could also assist me with the prediction part of IDSL_MINT for MS2FP. I am providing the test samples to the prediction method in .msp format, with 250 samples in one case and 500 in another. However, it only produces predictions for 40-55 samples, depending on the dataset. Could you kindly advise why not all samples received a prediction? I have attached a screenshot of the prediction output in Jupyter. As can be seen, the blocks are read perfectly well, but when model prediction is initiated it only provides outputs for a portion of the samples.

I look forward to hearing from you.

Regards, Sara

On Jul 11, 2024, at 12:58 PM, NTuan-Nguyen @.***> wrote:

Closed #1 https://github.com/idslme/IDSL_MINT/issues/1 as completed.

sajfb commented 2 months ago

The YAML file has a section for MSP processing criteria used to filter out MSP blocks that fall outside the model training space. If an MSP block does not meet these criteria, it will not be streamlined in the prediction step. You can find a log file in the output folder, which records any issues with MSP block processing. An example of an MSP block for Aspirin is provided on the main GitHub page. The necessary row entries for an MSP block are Name, PrecursorMZ, and Num Peaks.

sara-hashemi commented 2 months ago

Understood. Am I correct in assuming that in cases where the Names are not unique, or where we only have access to a compound ID or InChIKey (such as the CASMI 2022 dataset), the algorithm would not be able to provide predictions?

sajfb commented 2 months ago

Names do not have to be unique values, but Name row entries must be there. You should standardize your msp blocks before feeding them into MINT.
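As a pre-check before feeding blocks into MINT, one could scan each MSP block for the three required row entries named above. The sketch below is an assumption-laden illustration, not MINT's own parser: `missing_msp_fields` is a hypothetical helper, and matching field names case-insensitively is a choice made here, not documented MINT behavior.

```python
# Required header rows, per the thread: Name, PrecursorMZ, Num Peaks.
# ASSUMPTION: field names are compared case-insensitively.
REQUIRED_FIELDS = ("name", "precursormz", "num peaks")


def missing_msp_fields(block: str) -> list:
    """Return the required header fields that are absent or empty in one MSP block."""
    found = {}
    for line in block.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # peak rows ("m/z intensity") contain no colon and are skipped
            found[key.strip().lower()] = value.strip()
    return [field for field in REQUIRED_FIELDS if not found.get(field)]


# Header taken from the block discussed in this thread, shortened to two peaks
# for illustration only.
example_block = """Name: A_M8_negPFP_03
PrecursorMZ: 959.4857
Num Peaks: 2
589.371337890625 1000.0
913.4783935546876 586.0267162402822
"""

print(missing_msp_fields(example_block))
```

An empty list only means the three rows are present; a block can still be removed later by the m/z and peak-count criteria in the YAML file.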

sara-hashemi commented 2 months ago

You mentioned the uniqueness of the name isn’t important. I checked the log and msp file.

This is just one of the warnings in the log file:

WARNING!!! Removed MSP block ID 1 related to A_M8_negPFP_03!

We have provided the three fields you mentioned in all samples and this is the msp block related to the mentioned removed block:

Name: A_M8_negPFP_03
PrecursorMZ: 959.4857
accession:
formula: C46H74O18
inchi:
inchikey: ZKCHQVRAXCCTLE-YXHZOQBQSA-N
instrument:
instrument_type:
ion_mode: Negative
mspfilename: compound22_neg.msp
origin:
precursor_type:
smiles: CC1(C2CCC3(C(C2(CCC1OC4C(C(C(CO4)OC5C(C(C(C(O5)CO)O)O)O)O)OC6C(C(C(C(O6)CO)O)O)O)C)CC=C7C3(CCC8(C7CC(CC8)(C)O)C(=O)O)C)C)C
Num Peaks: 20
589.371337890625 1000.0
913.4783935546876 586.0267162402822
71.01252746582031 515.8543538237518
113.02287292480467 468.4553416691734
101.02297973632812 428.1174459461769
457.33154296875 397.6219270407272
85.02803039550781 337.9321735558277
89.02291107177734 274.9899145437956
275.0784912109375 250.24985495915308
161.04464721679688 217.49185721113724
304.286376953125 201.07134509328185
733.4205322265625 180.99497331719678
119.03327178955078 130.82086499261035
73.02812957763672 68.71968997340647
485.32757568359375 68.51101645221802
571.3665161132812 27.78861894280747
377.1240234375 22.856618471034004
199.70016479492188 19.588991572034402
221.0661163330078 15.165348488986465
365.5905151367187 15.03111104927375

As can be seen, “Name”, “PrecursorMZ” and “Num Peaks” are all provided. Following your method, I also normalized the peaks so that their intensities fall within [10, 1000]. Is there anything we missed that led to the block being omitted from the prediction process?

sajfb commented 2 months ago

WARNING!!! Removed MSP block ID 1 related to A_M8_negPFP_03! Names don't need to be unique, but in this case the MSP block ID 1 should be specifically investigated.

The m/z thresholds refer to the mass values, not their intensities. Could you please also share your YAML file?

sara-hashemi commented 2 months ago

Sure, please find it as below:

MINT_MS2FP_predictor:

You should try to use identical parameters used in the training step to maximize the performance of the model.

MSP:

Directory to MSP files: IDSL_MINT_files/msp_files/
MSP files: compound22_neg_sorted.msp # A string OR a list of msp files in [brackets]
Minimum m/z: 100
Maximum m/z: 900
Interval m/z: 0.1 # This parameter is also used as the maximum mass deviation parameter
Minimum number of peaks: 5
Maximum number of peaks: 512
Noise removal threshold: 0.01
Allowed spectral entropy: True
Number of CPU processing threads: 4

Model Parameters:

Model parameters must be identical to the used parameters in the training step; otherwise, PyTorch cannot load weight parameters.

Number of m/z tokens: 8003 # This parameter calculated using: 3 + (Maximum m/z - Minimum m/z)/Interval m/z
Dimension of model: 512 # general dimension of the model
Embedding norm of m/z tokens: 2
Dropout probability of embedded m/z: 0.1
Number of total fingerprint bits: 2051 # This number should also include three special tokens dedicated to this workflow. (e.g. 2048 + 3)
Maximum number of available fingerprint bits: 200
Number of attention heads: 2
Number of encoder layers: 3
Number of decoder layers: 3
Dropout probability of transformer: 0.1
Activation function: relu # relu OR gelu

Model address to load weights: /home/ec2-user/SageMaker/IDSL_MINT_files/ms2fp_cmp_neg/MINT_MS2FP_model.pth

Prediction Parameters:

Directory to store predictions: /home/ec2-user/SageMaker/IDSL_MINT_files/ms2fp_neg22_prediction
Device: cuda # cuda OR cpu. When None, it automatically finds the processing device.
Beam size: 3
Number of CPU processing threads: 4
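The comment on "Number of m/z tokens" gives the formula 3 + (Maximum m/z - Minimum m/z)/Interval m/z. A minimal sketch of that arithmetic follows; `mz_token_count` is a hypothetical helper name, and rounding the bin count to the nearest integer is an assumption made here to avoid floating-point drift.

```python
def mz_token_count(min_mz: float, max_mz: float, interval_mz: float,
                   special_tokens: int = 3) -> int:
    """Compute the token count as special_tokens + (max_mz - min_mz) / interval_mz."""
    # ASSUMPTION: round to the nearest integer so that values such as
    # 800 / 0.1 cannot end up one short because of floating-point error.
    bins = round((max_mz - min_mz) / interval_mz)
    return special_tokens + bins


# Values from the YAML file above
print(mz_token_count(100, 900, 0.1))  # 8003
```

With the values in this YAML file the result matches the configured 8003. Changing the m/z range or interval changes this number, and since the model parameters must be identical to those used in training, such a change would presumably require retraining rather than just editing the prediction YAML.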

sara-hashemi commented 2 months ago

By this are you referring to the fact that the minimum and maximum m/z should be investigated over the mass of all samples (train/validation and test) then the yaml file can include those numbers? Could this be why the algorithm bypasses some samples? I will also look into the names for a more accurate representation.

sajfb commented 2 months ago

Your precursor mass is out of the mass range specified in the YAML file.

Minimum m/z: 100
Maximum m/z: 900

Up to 10% of the fragment peaks are allowed to have masses outside the training space, but the precursor mass must be within this range.

Additionally, keep in mind:

Minimum number of peaks: 5 is counted after applying Noise removal threshold: 0.01.
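The criteria described above can be sketched as a small pre-filter. This is not MINT's implementation: `check_block` is a hypothetical function, and two details are assumptions made for illustration, namely that the noise threshold is relative to the most intense peak and that the 10% tolerance is the fraction of retained peaks whose m/z falls outside the range.

```python
def check_block(precursor_mz, peaks, min_mz=100.0, max_mz=900.0,
                min_peaks=5, noise_threshold=0.01, max_outside_fraction=0.10):
    """Return the reasons a block would be rejected (empty list if it passes)."""
    reasons = []

    # The precursor must lie inside the configured m/z range.
    if not (min_mz <= precursor_mz <= max_mz):
        reasons.append("precursor m/z outside range")

    # ASSUMPTION: noise removal drops peaks below noise_threshold * base peak intensity.
    base = max((intensity for _, intensity in peaks), default=0.0)
    kept = [(mz, i) for mz, i in peaks if base > 0 and i / base >= noise_threshold]

    # The minimum peak count is applied after noise removal.
    if len(kept) < min_peaks:
        reasons.append("too few peaks after noise removal")

    # ASSUMPTION: tolerance is the share of retained peaks outside [min_mz, max_mz].
    outside = sum(1 for mz, _ in kept if not (min_mz <= mz <= max_mz))
    if kept and outside / len(kept) > max_outside_fraction:
        reasons.append("too many fragment peaks outside range")

    return reasons


# First five peaks of the block from this thread, rounded to four decimals.
peaks = [(589.3713, 1000.0), (913.4784, 586.0267), (71.0125, 515.8544),
         (113.0229, 468.4553), (101.0230, 428.1174)]
print(check_block(959.4857, peaks))
```

For the block in question, the precursor at 959.4857 is above Maximum m/z: 900, so the precursor check alone is enough to explain the removal, whatever the exact form of the other two checks.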

sajfb commented 2 months ago

"By this are you referring to the fact that the minimum and maximum m/z should be investigated over the mass of all samples (train/validation and test) then the yaml file can include those numbers? Could this be why the algorithm bypasses some samples?"

Yes, each m/z value is represented by a specific embedded token. If that token is not in the training space, the model cannot represent your chemical space.

sara-hashemi commented 2 months ago

Thanks Sadjad for your help and advice. I believe I have three items on my plate for further investigation. I appreciate the explanation.

Kind regards, Sara

sajfb commented 2 months ago

you're welcome!