Closed fdiblen closed 4 years ago
API: what does a user want to be able to do vs. what can we already do?
undefined values: "N/A", "n/a", "", null, etc.
peak_103.44
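As a rough illustration of how the undefined values listed above could be handled on the Python side, here is a minimal sketch that maps them all to `None` (the Python counterpart of `null`). The names `UNDEFINED_STRINGS`, `normalize_undefined` and `clean_metadata` are hypothetical and not part of any existing API; the set of placeholder strings is an assumption based only on the examples given in this issue.

```python
from typing import Any, Dict, Optional

# Assumption: these placeholder strings all mean "no value". The set follows
# the examples in this issue ("N/A", "n/a", "") and is not exhaustive.
UNDEFINED_STRINGS = {"n/a", ""}


def normalize_undefined(value: Any) -> Optional[Any]:
    """Return None for placeholder values, otherwise the value unchanged."""
    if value is None:
        return None
    # Compare case-insensitively so "N/A" and "n/a" are treated alike.
    if isinstance(value, str) and value.strip().lower() in UNDEFINED_STRINGS:
        return None
    return value


def clean_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Apply normalize_undefined to every field of a metadata dictionary."""
    return {key: normalize_undefined(val) for key, val in metadata.items()}
```

With something like this in place, downstream code would only need to check for `None` instead of every spelling of "not available".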
let similarity = (spectrum, reference_spectrum) => { }
I would go for a slightly different description of the general purpose of the library: it is not about discovering data, but about classifying microbes, by comparing their mass spectra with mass spectra of known microbes. This is how you can identify a microbe.
There is no such thing as generated data from a single microbe in Spec2Vec. The input for Spec2Vec always consists of mass spectra from a known and an unknown microbe.
@florian-huber please correct me if I am wrong.
@HannoSpreeuw Let's focus on the API design in this issue and keep the description discussion for the team meeting.
I updated my summary above so it doesn't talk about the general purpose of the library anymore, so we can focus on the actual API
Essential for the API is to know if the input consists of one or two datasets (sequences). I will try to figure that out.
There is a flowchart made by @florian-huber: code_flowchart.pdf
calculate
let similarity = (spectrum, reference_spectrum) => { }
- classical way 1: structural similarity
- classical way 2: Tanimoto (this is the default type for structural similarity)
- classical way 3: cosine
- classical way 4: modified cosine
- new way: based on word2vec-like analysis (~separation distance in N-space?~ cosine between spectrum vectors)
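To make two of the measures in the list above concrete, here is a minimal Python sketch of a cosine score between two spectrum vectors and a Tanimoto score between two fingerprints. The function names and the input representations (plain sequences of floats for vectors, sets of "on" bit indices for fingerprints) are assumptions for illustration; this is not the code in either of the scripts discussed here.

```python
import math
from typing import Sequence, Set


def cosine_similarity(vector: Sequence[float],
                      reference_vector: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(vector) != len(reference_vector):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vector, reference_vector))
    norm = (math.sqrt(sum(a * a for a in vector))
            * math.sqrt(sum(b * b for b in reference_vector)))
    # Convention chosen here: an all-zero vector has similarity 0.
    return dot / norm if norm > 0 else 0.0


def tanimoto_similarity(bits: Set[int], reference_bits: Set[int]) -> float:
    """Tanimoto (Jaccard) score between two sets of 'on' fingerprint bits."""
    union = bits | reference_bits
    if not union:
        return 0.0
    return len(bits & reference_bits) / len(union)
```

Both functions take a (query, reference) pair and return a single number, which matches the shape of the `similarity(spectrum, reference_spectrum)` signature proposed above.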
These functions are in two scripts: MS_similarity_classical.py and similarity_measure.py.
MS_similarity_classical.py is used for comparison with new similarity measures.
Is this script useful for other users? Should we keep it? Should we combine it with similarity_measure.py?
If they are identical, it makes sense to remove one of them and do import from the other.
No, they are totally different.
MS_similarity_classical.py seems MS-specific and implements the classical ways 1-4; similarity_measure.py should be generic and implements the new way.
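To illustrate why a classical measure is MS-specific, here is a minimal sketch of a cosine score computed directly on peaks: peaks are paired when their m/z values agree within a tolerance, and pairs are assigned greedily by intensity product. The function name `cosine_greedy`, the `(m/z, intensity)` tuple representation and the default tolerance are assumptions made for this example; this is one common way to do it and is not claimed to be what MS_similarity_classical.py does.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical representation


def cosine_greedy(peaks: List[Peak], reference_peaks: List[Peak],
                  tolerance: float = 0.1) -> float:
    """Cosine score over peaks matched within an m/z tolerance."""
    # Collect every candidate pair whose m/z values lie within the tolerance.
    candidates = []
    for i, (mz, intensity) in enumerate(peaks):
        for j, (ref_mz, ref_intensity) in enumerate(reference_peaks):
            if abs(mz - ref_mz) <= tolerance:
                candidates.append((intensity * ref_intensity, i, j))

    # Greedy assignment: take the largest products first, use each peak once.
    candidates.sort(reverse=True)
    used, used_ref = set(), set()
    matched = 0.0
    for product, i, j in candidates:
        if i not in used and j not in used_ref:
            matched += product
            used.add(i)
            used_ref.add(j)

    norm = (math.sqrt(sum(x * x for _, x in peaks))
            * math.sqrt(sum(x * x for _, x in reference_peaks)))
    return matched / norm if norm > 0 else 0.0
```

The m/z tolerance and the one-to-one peak matching only make sense for mass spectra, whereas a cosine between two fixed-length vectors works for any kind of data. That difference is the reason for keeping the two kinds of measure apart.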
With the new layout (https://github.com/matchms/matchms/pull/47), I think this is clear enough for the moment. Naturally, the API will see some smaller changes in the near future, but that's no reason to keep this issue open forever.
Refer to my comments
This issue should be closed.
Don't you think we still need a document defining the API?
@HannoSpreeuw I added some comments on @jspaaks suggestions for a new API to #41
Re: https://github.com/matchms/matchms/issues/7#issuecomment-606640308 if we can think of a use case in which such a document would be the solution, yes, otherwise no.
Maybe we could simply update or expand __init__.py?
We discussed this during the meeting with Stefan and Hanno and decided to close it. The information in this issue was split into several issues.
We need to identify project specific parts and generic parts of the software.
Specific here means either for mass spectra or genes. It should be entered into a library.