pluskal-lab / DreaMS

DreaMS (Deep Representations Empowering the Annotation of Mass Spectra)
https://dreams-docs.readthedocs.io
MIT License

Masked NIST + MoNA entries #9

Closed a-turcu closed 1 week ago

a-turcu commented 1 week ago

Context

The DreaMSAtlas object stores a mix of NIST + MoNA datasets in atlas.lib.

Problem

62730 out of 79300 entries in atlas.lib are masked with the value -1. I assume this is intentional, considering the existence of the function atlas.get_lib_idx() (although it is not used anywhere).
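A count like the one above could be reproduced with a minimal sketch along these lines. It assumes the masked entries sit in an integer array that uses -1 as the sentinel; the array below is a small stand-in rather than actual atlas.lib contents, and count_masked is a hypothetical helper, not part of the DreaMS API.

```python
import numpy as np

MASK_VALUE = -1  # sentinel reported in this issue for masked entries


def count_masked(lib_idx, mask_value=MASK_VALUE):
    """Return (number of masked entries, total number of entries).

    Hypothetical helper for illustration; not a DreaMS function.
    """
    lib_idx = np.asarray(lib_idx)
    n_masked = int(np.count_nonzero(lib_idx == mask_value))
    return n_masked, int(lib_idx.size)


# Stand-in data (assumption): replace with the real index array from the atlas.
lib_idx = np.array([-1, 3, -1, -1, 7])
n_masked, n_total = count_masked(lib_idx)
print(f"{n_masked} of {n_total} entries masked ({n_masked / n_total:.1%})")
```

Applied to the numbers reported here, 62730 of 79300 corresponds to roughly 79% of the entries being masked.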

Question

What is the reason for masking them? What role do these entries play in the Atlas? Are they even included in atlas.csrknn?

roman-bushuiev commented 1 week ago

Hi @a-turcu! Thank you for the question.

What is the reason for masking them?

Most of the annotated nodes are masked intentionally, as they correspond to NIST20 entries. Unfortunately, we cannot publicly provide these entries due to NIST20 licensing restrictions.

What role do these entries play in the Atlas?

Although these nodes are masked in the public version of the DreaMS Atlas, they were included in all the analyses presented in our paper.

Are they even included in atlas.csrknn?

No, they are masked in all data structures of the DreaMS Atlas to avoid any conflicts with the NIST20 licensing conditions.

a-turcu commented 1 week ago

Thank you, this answers my questions.