Hi @HeinrichWizardKreuser ,
Thank you so much for your awesome words. I love your idea, but I really don't have enough time to do it. It is a huge amount of work, you should know.
Hi @thieu1995
I'm glad to hear that you like the idea. I am willing to do this myself in a pull request. I would just need you to supervise it and give feedback on it.
Would you please take the time to comment on each of these and give feedback, after which I will complete the database based on the set of functions that are already available in the opfunu library.
I think the current shape (a list of dictionaries) is a good start. It can easily be transformed into a pandas dataframe.
I have a few options for this one:

1. A single `db` module, with one file containing all of the data.
2. Each module (`dimension_based`, `cec/*/`, etc.) gets its own database file. Then one file would call each of the module's database files and put them into one datastructure.
3. Each module (`dimension_based`, `cec/*/`, etc.) would have a dedicated `db.py` file containing a database of the functions in that module.
4. Same as option 3, plus a `db` module that would have methods to combine all of the different `db.py` contents. That module could offer helper methods such as getting all functions that fall under a certain filter (e.g. all functions with a dimensionality of 1, or that fall under the 'convex' category). I do think the helper methods should stay fairly specific, as broader helper methods would just end up being direct calls to built-in pandas methods.

Are the fields I've added correct / useful? I added these fields based on what I found useful, but also based on what I could consistently add. Here I have listed them for you along with how I believe they ought to be represented (in alphabetical order):

- `dimensions`: `1`, `2`, etc. For functions that accept any number of dimensions, I'm not sure whether to use `'*'`, `'d'`, or `None`. Any ideas?
- `domain`: the domain of `x`, e.g. `[-1, 1]`.
- `latex`: the LaTeX formula of the function.
- `links`: links to pages describing the function.
- `method`: the actual python implementation of the benchmark.
- `minima`: `dict(fx=0.55, x=[1])` would mean that a minimum (`f(x)`) of `0.55` is achieved when plugging in `1` (as `x`) into the function. I'm not yet sure how to handle functions defined for `d` dimensions where the minimum changes based on the dimension. `dict(fx=0.55, x=[-1.51, -0.75])` would represent a minimum of `0.55` when plugging in an `x` of either `-1.51` or `-0.75`.
- `name`: the name of the function.
- `references`: references for the function.
- `tags`: e.g. `[ 'continuous', 'differentiable', 'separable', 'scalable', 'multi-modal' ]`.
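To make this concrete, here is a rough example of a single entry using these fields (the function, values and tags below are purely illustrative, not a final design):

```python
import numpy as np

def ackley(x):
    """Ackley function, defined for any dimension d (illustrative implementation)."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    return (-20 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / d))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / d) + 20 + np.e)

data = [
    dict(
        name='Ackley',
        dimensions='*',                # accepts any number of dimensions
        domain=[-32.768, 32.768],      # domain of each coordinate of x
        latex=r'f(\mathbf{x}) = -20 e^{-0.2\sqrt{\frac{1}{d}\sum_i x_i^2}} - e^{\frac{1}{d}\sum_i \cos(2\pi x_i)} + 20 + e',
        links=['https://www.sfu.ca/~ssurjano/ackley.html'],
        method=ackley,                 # the actual python implementation
        minima=[dict(fx=0.0, x=[0.0])],
        references=['Ackley, D. H. (1987). A Connectionist Machine for Genetic Hillclimbing.'],
        tags=['continuous', 'differentiable', 'non-separable', 'scalable', 'multi-modal'],
    ),
    # ... one entry per benchmark function
]
```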
Although I can manually add these fields, or even write a webscraper to get many of them, both of those methods may introduce errors. I don't know how to assert that they are correct (other than double checking), but I guess that's open-source for you: people can point out the mistakes and fix them.
@HeinrichWizardKreuser ,
Oh, after re-reading all of what you have written here, I remember that I once thought about writing documentation for opfunu, with all of the properties you listed above. I tried with readthedocs, because it would reduce the time needed to write the documents, but I failed 2 or 3 times. You can actually see its documentation here: https://opfunu.readthedocs.io/en/latest/pages/cec_basic/cec_basic.html#opfunu-cec-basic-cec2014-module
I did use the required comment format, but it wasn't showing the documentation on the website, so I gave up. But now I know what caused the problem, because I've successfully built complete documentation for another library: https://mealpy.readthedocs.io/en/latest/pages/models/mealpy.bio_based.html#module-mealpy.bio_based.EOA
You can use the search to find anything you want in there: any algorithm, tag, or property. So what do you think? Instead of adding such fields to opfunu, we could just update the comments and fix the bug with readthedocs. I still don't know what we can do with pandas functionality when adding a db.py to each module (type_based, multi_model, ...). Because if users want to know about a function's characteristics, they can search on the documentation website.
However, opfunu is still missing an important piece of functionality, which is drawing. I also tried that a long time ago but was not successful with 3D plotting. Now I have found a really good repository where the author implements the drawing functions and code in a very clear way (you can see it here: https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/main/pybenchfunction/function.py). Yes, he uses some properties to draw each function. Maybe we can keep some properties as a python dictionary for each function (class) to draw the figure, and put other properties such as latex, references, and links in the documentation section instead. What do you think?
And another question, on which I would like your suggestion. Currently, there are 2 programming styles in opfunu (functional and OOP classes). It makes the repository confusing for new users. I'm thinking about removing the functional style and keeping only the OOP style, because it will reduce coding time. What is your suggestion on this matter?
Greetings @thieu1995 . I appreciate this conversation and hope that it will benefit the repository.
I agree with your point on adding the details to the documentation, but I believe having them in actual code is also important. What it boils down to is being able to programmatically filter benchmark functions based on attributes, run simulations of each benchmark (with its own meta-parameters), and export results, all in one pipeline. Having all of the details of each method, including its physical implementation, will allow users to programmatically run experiments and draw conclusions (something I wish I had when working on my projects and writing papers).
But I think we should have both the database and the docs. Even better, we can have the database, and the docs can be populated from it (thus we'd only need to update the database and the docs would automatically be updated).
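To give a concrete picture of the pipeline I mean, here is a rough sketch. It assumes the `data` list of dictionaries proposed above, with `name`, `tags`, `dimensions`, `domain` and `method` fields; none of this exists in opfunu yet.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data)

# 1. Filter: all multi-modal benchmarks that accept any number of dimensions.
selected = df[df['tags'].apply(lambda t: 'multi-modal' in t) & (df['dimensions'] == '*')]

# 2. Run: a naive random-search baseline stands in for a real optimizer here.
results = []
for _, row in selected.iterrows():
    f = row['method']
    low, high = row['domain']
    samples = np.random.uniform(low, high, size=(1000, 10))  # 1000 samples in 10 dimensions
    results.append({'name': row['name'], 'best_fx': min(f(s) for s in samples)})

# 3. Export: one table with everything, ready for a notebook or a paper.
pd.DataFrame(results).to_csv('random_search_results.csv', index=False)
```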
I was actually thinking of adding 3D plotting to opfunu next. I made lots of plots in my previous projects for my course on Computational Intelligence. It was some simple matplotlib code, but it is rather specific, so having it built in for the user's convenience would be good.
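The kind of code I mean is roughly this (a minimal sketch using a plain sphere function, not tied to opfunu's API):

```python
import numpy as np
import matplotlib.pyplot as plt

def sphere(x, y):
    # simple 2D sphere benchmark, f(x, y) = x^2 + y^2
    return x ** 2 + y ** 2

# evaluate the function on a grid and draw the surface
X, Y = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-5, 5, 200))
Z = sphere(X, Y)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('f(x, y)')
plt.show()
```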
As for the library https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/main/pybenchfunction/function.py, I agree, it does seem very useful. We can even incorporate the code for 3D plotting into opfunu (or invite them to add it).
I also think that the details that @AxelThevenot added to each method will be instrumental in speeding up adding new fields to the database and asserting its correctness.
I would love to contribute my suggestion, but unfortunately, I don't have enough information. Could you perhaps post examples comparing the two?
@HeinrichWizardKreuser
I get it now and agree with you. I'm not sure how to do it with one pipeline, but I think with your imagination we can do it. If we can build a database, I think we can also pull the docs out of it.
I think I can spend some time restructuring opfunu the way that guy did in his repository, so that any function can produce its 2D or 3D figures.
You can see in the readme.md file that I give an example of how to call a function or class in opfunu. For example, the CEC-2014 module:
```python
import numpy as np
from opfunu.cec.cec2014.function import F1, F2, ...

problem_size = 10
solution = np.random.uniform(0, 1, problem_size)
print(F1(solution))  # Function style

from opfunu.cec.cec2014.unconstraint import Model as MD

func = MD(problem_size)  # Object style: solve different problems with different functions
print(func.F1(solution))
print(func.F2(solution))
```
Anyway, it is just a way to structure the code and the way to call the functions. You can call them from a module or from a class. But now I think each benchmark function should be a class, and it should inherit from a BaseClass that defines everything they have in common.
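Something like this, just as a rough idea (the names and fields are not final):

```python
import numpy as np

class Benchmark:
    """Base class holding what all benchmark functions have in common."""
    name = None
    latex_formula = None
    tags = []

    def evaluate(self, x):
        raise NotImplementedError

    def __call__(self, x):
        # calling the object evaluates the benchmark at x
        return self.evaluate(np.asarray(x, dtype=float))

class Sphere(Benchmark):
    name = 'Sphere'
    latex_formula = r'f(\mathbf{x}) = \sum_i x_i^2'
    tags = ['continuous', 'convex', 'unimodal']

    def evaluate(self, x):
        return np.sum(x ** 2)
```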
I think we can start with option 3. Option 4 sounds like the better one, but it may be hard to combine all of them into one DB, since each module and each function has its own characteristics and properties.
On the fields to include and how to represent each of them:
- dimensions: for N dimensions, I think we can put None there.
- minima: I think you can just try your current idea.
- other fields: I agree with them all.
@thieu1995
I'm glad we can agree on this. I'll implement the database first and then we can look at automatically generating the docs from there.
I agree with using his 3d plotting, but I am sceptical about restructuring the package to use OOP (see next point)
The implementation of my database approach essentially creates a dictionary for each benchmark function, where the dictionary contains "metadata" of the benchmark along with the actual benchmark python implementation, which can simply be called via the `__call__` attribute (commonly known as "calling the method": using `f()` where `f` is the method to call).
If you wish to take the OOP approach, then my database implementation will introduce redundancy since the values in each dictionary (such as the latex formula and attributes such as convex
etc) will likely also be in the OOP implementation.
For instance, where the OOP implementation would be
```python
class Adjiman:
    name = 'Adjiman'
    latex_formula = r'f(x, y)=cos(x)sin(y) - \frac{x}{y^2+1}'
```
the database implementation would be
```python
data = [
    dict(
        name='Adjiman',
        latex_formula=r'f(x, y)=cos(x)sin(y) - \frac{x}{y^2+1}',
    ),
    ...
]
```
Thus, both would have the fields name
and latex_formula
in this example, which is redundant. Ideally, you'd want one to simply inherit/collect the information from the other.
If you wish to use OOP for each benchmark, we can instead build the database by loading the classes into memory and reading the metadata that python exposes for us, such as `__dict__`. For instance:
```python
>>> # retrieving from the class itself
>>> Adjiman.__dict__
mappingproxy({'__module__': '__main__',
              'name': 'Adjiman',
              'latex_formula': 'f(x, y)=cos(x)sin(y) - \\frac{x}{y^2+1}',
              '__dict__': <attribute '__dict__' of 'Adjiman' objects>,
              '__weakref__': <attribute '__weakref__' of 'Adjiman' objects>,
              '__doc__': None})
>>> # retrieving from an instance
>>> a = Adjiman()
>>> a.__class__.__dict__
mappingproxy({'__module__': '__main__',
              'name': 'Adjiman',
              'latex_formula': 'f(x, y)=cos(x)sin(y) - \\frac{x}{y^2+1}',
              '__dict__': <attribute '__dict__' of 'Adjiman' objects>,
              '__weakref__': <attribute '__weakref__' of 'Adjiman' objects>,
              '__doc__': None})
```
Of course, this approach is ugly, as it also includes things such as `__doc__` and `__weakref__`. We would just need to build methods that read the `__dict__` of each class, "clean" it (remove things like `__doc__`), and then build the database structure I originally designed.
In this OOP approach, it would be desirable for the classes to inherit from a base class (e.g. `BaseBenchmark`), since we can then call `BaseBenchmark.__subclasses__()` to retrieve a list of every subclass that is currently loaded into memory (when you import a module/package). This list can then be iterated over, reading `__dict__` on each subclass to populate the database.
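As a sketch of that population step (assuming a `BaseBenchmark` base class whose subclasses store their metadata as plain class attributes):

```python
def build_database():
    """Collect metadata from every BaseBenchmark subclass currently in memory."""
    db = []
    # note: __subclasses__() only lists direct subclasses; a deeper hierarchy
    # would need a recursive walk
    for cls in BaseBenchmark.__subclasses__():
        # keep only the plain class attributes (drop dunders and methods)
        entry = {k: v for k, v in cls.__dict__.items()
                 if not k.startswith('__') and not callable(v)}
        entry['class'] = cls  # keep a handle on the implementation itself
        db.append(entry)
    return db
```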
To conclude, I am open to the idea of using OOP. Personally, it doesn't matter to me. I do think OOP gives us more control over customizing a benchmark function, so perhaps we should go for it. We'd have to take a continuous development approach:
- The existing entry points, such as `Functions._brown__()` for instance, are already in use. Perhaps we should keep these methods, but alter them to call our new implementation and simply raise a deprecation warning.
- Fields such as `latex` and `convex` etc. still need to be filled in. Here we can use https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/main/pybenchfunction/function.py to already fill in many of the blanks.
@HeinrichWizardKreuser ,
Ah, I see. Then I think we should keep the functional style. And now that we are using a database, I think we don't need to split the benchmark functions into multiple modules as I did (type_based, dimension_based). I think we can group them into a single module, because lots of functions were re-implemented in both categories. What do you think?
@thieu1995 , I updated my message and I believe I finished the edit after you already started formulating a response. I just want to confirm, is your message made with respect with the latest version of my message that contains the "Continuous Development plan if we take the OOP approach" section?
To be clear, I don't know whether functional approach or OOP is better. The easiest would be to just stick to opfunu's current functional approach and then add the database. The question is whether OOP offers a benefit. Does OOP give us some desired control over benchmark methods? Such as say parameterized versions of benchmarks?
@HeinrichWizardKreuser
Lol, I only read the part above that you wrote.
So that is the point of my suggestion: in the repo above, he already implemented the OOP style, and you can search the functions by the properties you want through the database. For example, this code from his repo:
```python
import pybenchfunction as bench

# get all the available functions accepting ANY dimension
any_dim_functions = bench.get_functions(None)

# get all the available continuous and non-convex functions accepting 2D
continous_nonconvex_2d_functions = bench.get_functions(
    2,  # dimension
    continuous=True,
    convex=False,
    separable=None,
    differentiable=None,
    mutimodal=None,
    randomized_term=None
)
print(len(any_dim_functions))  # --> 40
print(len(continous_nonconvex_2d_functions))  # --> 41
```
Right now, I only consider the non-parameterized benchmark functions. But we may think about creating a new module for parameterized functions in the future.
Yes, we should stick to the current functional style. But I still want to rename the functions as public functions. For example: instead of calling "Functions._brown__()", they can call "Functions.brown()". I coded it this way a long time ago, when I was a student (it is silly to implement them as private functions). I hadn't thought about changing it until now.
Also, can you try the database with type_based and dimension_based modules first? Leave the cec for later. I want to see how the database works with them first before moving to cec functions.
I agree. Let's only consider parameterized benchmarks for a future OOP overhaul.
I'll leave the name change to you for later.
Right now I will write my suggested database approach with references to benchmarks in type_based
and dimension_based
first and then we can take it from there. Will post my progress here.
@HeinrichWizardKreuser ,
What about my question above? Do you suggest grouping type_based and dimension_based into 1 module? Because, like I said, there are several functions that have been duplicated in both modules.
And please create a new branch when you want to push something new. The branch name should be "dev/feature_name" or "dev/your_name", I don't mind.
@thieu1995, sorry for missing your question.
I think we can combine them, yes. Should I do it in this PR or leave it for later? I was thinking of leaving it for later.
Understood, will name the branch accordingly.
@HeinrichWizardKreuser
I guess it depends on you. Do you want to create the database first and then group them or do you want to group them into a single module first and then design the database?
Besides, so as not to waste your time, you should try to create the database for a few functions only and then test the pipeline (or whatever you want) first. If it works as you expect, you can then apply it to the rest of the functions.
@thieu1995
We want to group the functions in any case, so let's group them in a separate PR (or you can do it yourself). If we group them in this PR and end up scrapping this PR, then we have to group them together again or do some commit picking black magic to extract the grouping part of the PR.
I agree. I will make the database for the two files we discussed and then make some notebooks showing off use cases to ensure that they work in the way I believe the user would desire.
@HeinrichWizardKreuser ,
Yeah, then let's leave it for later and for another PR.
@thieu1995
I am populating the fields of each benchmark using the following criteria (in order)
If I cannot find a tag such as convex
for a benchmark, does that automatically mean that it is non-convex
? For example, the benchmark "Egg Holder" does not have a tag denoting its convexity.
Looking at Thevenot's implementation of EggHolder, I see that it has convex set to False: https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/91c37d9d0f1f3366064004fdb3dd23e5c2681712/pybenchfunction/function.py#L981.
For now, I will assume the answer to this question is "yes".
@HeinrichWizardKreuser
Yes, if not convex, you can tag it non-convex. We can change it later if it is convex. Just do what you think is good.
Hello 👋
I saw you were speaking about refactoring your project like mine in some ways.
About the question of whether EggHolder is convex or not, it is possible I made a mistake.
It was hard work, so there may be more than one mistake; do not take my parameters as if they were perfect :)
Hi @AxelThevenot ,
Yes, we spoke about refactoring the project. But for now we have decided to keep the current style and test the new features first.
@thieu1995
I've realised how much manual labour this is and have written a web scraper to get the data from
I've successfully crawled the data from those two websites. My next goal is to parse the data from the markdown files in https://github.com/mazhar-ansari-ardeh/BenchmarkFcns/tree/gh-pages/benchmarkfcns.
After that, I will assemble the cleanest combination of the data and then test the database in some notebooks, where I will run experiments to show how the database can be used.
If some of the data disagree with each other, I will flag it here so you can advise (e.g. one source claims that a function is convex while another says it is non-convex).
@AxelThevenot ,
Yeah, I appreciate the heads-up!
Been doing other work the last week, but yesterday I finished crawling the markdown files in https://github.com/mazhar-ansari-ardeh/BenchmarkFcns/tree/gh-pages/benchmarkfcns.
I'm currently matching all the functions across the different sources. So the next step is to figure out how I can best combine them (for instance, how do I decide which source's input domain to keep?). I'm making good progress.
@thieu1995
Here is a preview of the data I've collected so far. https://github.com/HeinrichWizardKreuser/mealpy-database/blob/master/nb/data.json
There is also a jupyter notebook in the same directory showcasing how I collected the data. Each item in the list is a dictionary where the keys are `b` for benchmarkfcns, `s` for sfu and `i` for infinity77. These keys represent where the data was obtained from. The values for these keys are the scraped data, such as the latex, name, etc. Some dictionaries don't contain data from, say, sfu, but have data from infinity77 and benchmarkfcns, and vice versa.
Have a look when you get the chance. TIP: you might want to download the file and then open it in your browser if you want to easily view the json file (collapse and expand some parts etc)
These are just the data that overlap across the different sources; I still need to add the data that comes from only a single source, and then map the data to the benchmark functions that you've implemented. Then I need to find a way to concisely list them as a database.
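If you'd rather poke at it programmatically, something like this gives a quick overview (`b`/`s`/`i` being the source keys described above):

```python
import json
from collections import Counter

with open('data.json') as f:
    data = json.load(f)

# count how many entries carry data from each source
source_counts = Counter(key for entry in data for key in entry if key in ('b', 's', 'i'))
print(len(data), dict(source_counts))
```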
@HeinrichWizardKreuser ,
That is a really great job. But I think you should test with a few functions first, then build the database, functionality, and pipeline that you want. Don't spend too much time correcting each function's properties right now.
When the database and functionality you design work as you expect, then we can come back and finish all the other functions.
@thieu1995
You are 100% correct. I'm currently switching gears and building notebooks to show off what we can do. Here are the main things I want to showcase using the database:
Will keep you posted.
Hi @HeinrichWizardKreuser ,
Any news on your progress?
Hi @thieu1995
I haven't made any updates since my last comment - been busy with work and other hobby projects. But I've hit an obstacle with some of it, so I think it would be good to take my mind off of it and continue with my work here
@HeinrichWizardKreuser,
Thanks for letting me know.
I really like this repository and used it in my Computational Intelligence course back in university. However, I wish that it also included a database for each benchmark method.
This would be really useful for someone who wants to use and compare benchmark methods and to know how to draw meaningful conclusions from their tests. For instance, knowing what tags a benchmark method has (e.g. continuous vs discontinuous, non-convex vs convex, etc.) and knowing the dimension of the benchmark at hand would speed up the process of, say, concluding whether an algorithm performs better in lower dimensions, on convex landscapes, etc.
Would you be interested if I attempted to add such a thing? This can either be by csv or by physically adding a list of dictionaries. Here is a preview of what I suggest:
api.py:
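Something along these lines, where the entries and their values are purely illustrative:

```python
# api.py (illustrative sketch -- the real file would have one entry per benchmark)
data = [
    dict(
        name='Ackley',
        dimensions='*',
        domain=[-32.768, 32.768],
        minima=[dict(fx=0.0, x=[0.0])],
        tags=['continuous', 'differentiable', 'multi-modal'],
    ),
    dict(
        name='Sphere',
        dimensions='*',
        domain=[-5.12, 5.12],
        minima=[dict(fx=0.0, x=[0.0])],
        tags=['continuous', 'convex', 'unimodal'],
    ),
]
```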
whereafter one would use the `data` list of dictionaries to build a dataframe, for instance using pandas (`pd.DataFrame(data)`).