Closed: florian-huber closed this issue 1 year ago
I would suggest using a dictionary like so:
additional_input = [{"feature_name": "precursor_mz", "scaling": 0.001}]
spectrum_binner = SpectrumBinner(bins, mz_min=10, mz_max=1000, peak_scaling=0.5, additional_metadata=additional_input)
With the SpectrumBinner being part of a saved model, we can save and load this setting together with the SpectrumBinner, and the scaling is then also applied automatically when predicting.
Additionally, the DataGenerator should be adjusted so that:
data_generator = DataGeneratorAllInchikeys(binned_spectrums, ..., additional_input=additional_input)
is still possible.
This looks like it would work.
It would use a list of dictionaries, which would do the job. However, this pattern is not very common, probably because repeating the dictionary keys introduces a lot of redundancy:
additional_input = [{"feature_name": "precursor_mz", "scaling": 0.001}, {"feature_name": "retention_time", "scaling": 0.01}]
An alternative could be to store it as
additional_input = {"precursor_mz": 0.001, "retention_time": 0.01}
That is much more compact, but of course it requires users to learn that field-name/scaling pairs are expected (fine with me).
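For illustration, here is a small sketch (hypothetical helper functions, not part of the ms2deepscore API) showing how the compact field-name/scaling mapping could be expanded into the more explicit list-of-dicts form, and how the scaling would then be applied to a spectrum's metadata values:

```python
def expand_additional_input(compact):
    """Turn the compact {field_name: scaling} mapping into the explicit
    list-of-dicts form discussed above."""
    return [{"feature_name": name, "scaling": scale}
            for name, scale in compact.items()]


def scale_metadata(metadata, additional_input):
    """Return the scaled metadata values, in the order of additional_input."""
    return [metadata[entry["feature_name"]] * entry["scaling"]
            for entry in additional_input]


compact = {"precursor_mz": 0.001, "retention_time": 0.01}
expanded = expand_additional_input(compact)
# expanded == [{"feature_name": "precursor_mz", "scaling": 0.001},
#              {"feature_name": "retention_time", "scaling": 0.01}]

metadata = {"precursor_mz": 512.3, "retention_time": 125.0}
print(scale_metadata(metadata, expanded))  # [0.5123, 1.25]
```

Either representation carries the same information, so supporting the compact form externally while normalizing to the explicit form internally would be straightforward.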
I have pushed an update to the additional_input_parameters branch according to your approach. It seems that matchms had a change in the BaseSimilarity.matrix parameters, so pylint is throwing an error.
This has been included in #124
Discussed with @djoas:
Solution: Include additional_inputs (or additional_features) as a model parameter that contains both the metadata field names and a scaling factor. Could be a list of tuples or a dictionary.