matchms / matchms-backup

Python library for fuzzy comparison of mass spectrum data and other Python objects
Apache License 2.0

Define an API #7

Closed fdiblen closed 4 years ago

fdiblen commented 4 years ago

We need to identify project specific parts and generic parts of the software.

Specific here means either for mass spectra or genes. It should be entered into a library.

jspaaks commented 4 years ago

API: what does a user want to be able to do vs. what can we already do?

HannoSpreeuw commented 4 years ago

I would go for a slightly different description of the general purpose of the library: it is not about discovering data, but about classifying microbes, by comparing their mass spectra with mass spectra of known microbes. This is how you can identify a microbe.

There is no such thing as generated data from a single microbe in Spec2Vec. The input for Spec2Vec always consists of mass spectra from a known and an unknown microbe.

@florian-huber please correct me if I am wrong.

fdiblen commented 4 years ago

@HannoSpreeuw Let's focus on the API design in this issue and keep the description discussion for the team meeting.

jspaaks commented 4 years ago

I updated my summary above so it doesn't talk about the general purpose of the library anymore, so we can focus on the actual API

HannoSpreeuw commented 4 years ago

Essential for the API is to know if the input consists of one or two datasets (sequences). I will try to figure that out.

fdiblen commented 4 years ago

There is a flowchart made by @florian-huber: code_flowchart.pdf

CunliangGeng commented 4 years ago

calculate

let similarity = (spectrum, reference_spectrum) => { }

  • classical way 1: structural similarity
  • classical way 2: Tanimoto (this is the default type for structural similarity)
  • classical way 3: cosine
  • classical way 4: modified cosine
  • new way: based on word2vec-like analysis (cosine between spectrum vectors)
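As a rough illustration of two of the measures listed above, here is a minimal Python sketch. It assumes the spectra have already been turned into equal-length numeric vectors (for the cosine) or binary fingerprints (for Tanimoto). The function names are hypothetical and are not taken from MS_similarity_classical.py or similarity_measure.py. Peak matching within a mass tolerance, which a real spectrum cosine would need, is left out.

```python
import math
from typing import Sequence


def cosine_similarity(vector: Sequence[float],
                      reference_vector: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(vector) != len(reference_vector):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vector, reference_vector))
    norm = (math.sqrt(sum(a * a for a in vector))
            * math.sqrt(sum(b * b for b in reference_vector)))
    # Return 0.0 when either vector is all zeros, to avoid dividing by zero.
    return dot / norm if norm else 0.0


def tanimoto_similarity(fingerprint: Sequence[int],
                        reference_fingerprint: Sequence[int]) -> float:
    """Tanimoto score between two binary fingerprints (hypothetical helper)."""
    if len(fingerprint) != len(reference_fingerprint):
        raise ValueError("fingerprints must have the same length")
    pairs = list(zip(fingerprint, reference_fingerprint))
    both = sum(1 for a, b in pairs if a and b)      # bits set in both
    either = sum(1 for a, b in pairs if a or b)     # bits set in at least one
    return both / either if either else 0.0
```

Both functions take the query first and the reference second, mirroring the (spectrum, reference_spectrum) signature above.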

These functions are in two scripts MS_similarity_classical.py and similarity_measure.py.

The MS_similarity_classical.py is used for comparison with new similarity measures. Is this script useful for other users? Should we keep it? Should we combine it with similarity_measure.py?

fdiblen commented 4 years ago

If they are identical, it makes sense to remove one of them and do import from the other.

CunliangGeng commented 4 years ago

"If they are identical, it makes sense to remove one of them and do import from the other."

No, they are totally different. MS_similarity_classical.py seems MS-specific and implements the classical ways 1-4; similarity_measure.py should be generic and implements the new way.
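One possible way to keep the generic and the MS-specific parts apart is sketched below in Python. It assumes a generic scoring loop that accepts any pairwise similarity function, plus a toy MS-specific measure. Both calculate_scores and shared_peak_fraction are hypothetical names used for illustration and do not describe the existing scripts.

```python
from typing import Callable, List, Sequence, Set, TypeVar

T = TypeVar("T")


def calculate_scores(queries: Sequence[T],
                     references: Sequence[T],
                     similarity: Callable[[T, T], float]) -> List[List[float]]:
    """Generic part: score every query against every reference.

    Knows nothing about mass spectra; any pairwise function can be plugged in.
    """
    return [[similarity(query, reference) for reference in references]
            for query in queries]


def shared_peak_fraction(spectrum: Set[int],
                         reference_spectrum: Set[int]) -> float:
    """MS-specific part (toy example): fraction of shared integer m/z values."""
    union = spectrum | reference_spectrum
    if not union:
        return 0.0
    return len(spectrum & reference_spectrum) / len(union)


# Usage: spectra are represented here simply as sets of rounded m/z values.
scores = calculate_scores([{100, 200}], [{100, 200}, {300}], shared_peak_fraction)
```

With this split, a classical measure and the new word2vec-like measure would both be passed in as the similarity argument, and the loop itself would stay in the generic library.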

jspaaks commented 4 years ago

With the new layout (https://github.com/matchms/matchms/pull/47), I think this is clear enough for the moment. Naturally, the API will see some smaller changes in the near future but that's no reason to keep this issue open forever.

Refer to my comments

This issue should be closed.

HannoSpreeuw commented 4 years ago

Don't you think we still need a document defining the API?

florian-huber commented 4 years ago

@HannoSpreeuw I added some comments on @jspaaks suggestions for a new API to #41

jspaaks commented 4 years ago

Re: https://github.com/matchms/matchms/issues/7#issuecomment-606640308 if we can think of a use case in which such a document would be the solution, yes, otherwise no.

Maybe we could simply update or expand __init__.py?

fdiblen commented 4 years ago

We discussed this during the meeting with Stefan and Hanno and decided to close it. The information in this issue was split into several issues.