matchms / ms2deepscore

Deep learning similarity measure for comparing MS/MS spectra with respect to their chemical similarity
Apache License 2.0
53 stars 24 forks source link

Train models with Metadata #124

Closed niekdejonge closed 1 year ago

niekdejonge commented 1 year ago

@djoas and @florian-huber

I had a look at making the metadata annotation a bit more robust. What I added was a MetadataFeatureGenerator class. This contains simple conversions of metadata to input feature, including normalization. This has as benefit that the conversion on the metadata will also be stored in the spectrum binner, so that when the user runs the model later there is no risk of normalizing in a different way for instance.

@djoas if you are busy with finishing up your thesis and don't have the time. No need to have a look, but since you worked on this it would be great if you notice anything that might give unexpected behavior.

niekdejonge commented 1 year ago

@florian-huber Could you have a look at the MetadataFeatureGenerator class I generated.

Do you think this is a good approach? I never used parent classes before, so wanted to make sure that this would make sense.

In this way we pass a list with Classes of type MetadataFeatureGenerator. In spectrum binning these classes can be used to generate the appropriate metadata based feature. Using the parent class has as benefit that the input stays the same and that they will all contain generate_feature method.

florian-huber commented 1 year ago

Nice! I will do a full code review tomorrow.

niekdejonge commented 1 year ago

Nice! I will do a full code review tomorrow.

@florian-huber Great! I noticed that the tensorflow version check was currently failing. Do you know what is causing that? It is unrelated to this pull request right?

florian-huber commented 1 year ago

@niekdejonge I made some suggestions for maybe changing the design, let me know if we should have a brief meeting to discuss.

The tensorflow thing should indeed be unrelated. Tensorflow is a constantly moving target.... better to look at that separately.

niekdejonge commented 1 year ago

@florian-huber I went through the pull request and removed all changes that were still in a draft or should only be stored locally. The changes remaining in this PR are:

florian-huber commented 1 year ago

Thanks @niekdejonge for the update. I went through the code as well and will add a few minor things with #141

florian-huber commented 1 year ago

I will merge this since tests are now passing (except for SonarCloud due to missing code coverage of the new workflows, this should be handled elsewhere --> #143 )

niekdejonge commented 1 year ago

Great! Thanks for fixing the failing tests!