Closed B3J4y closed 9 months ago
Hi Jan, thank you for your feedback.
If I understand well what you are trying to do, the problem arises because there is no example to run in my directory actually. You have indeed an example configuration.json file but it points to files in my system. Therefore, when you try to run it, the code finds no matrix in your system.
There are two ways for you to run the code through.
Either you can create an example for yourself, with an input matrix, and so on. I might also send some example files to you in case. Or you can use some Kaggle examples of MCDA. I played with them myself to test the code. You need to adjust the configuration file then.
Or you can run it through the pytests. In the directory tests/resources/ you can find some example input matrices (made for the tests).
Does it help you?
On the hard-coded paths in the configuration file, we had this discussion internally once and decided to leave the user full freedom in the configuration file to point to any paths she/he likes in her/his system, both for the input as well as for the output.
Wow, you have a very quick reaction time! 🙂
I believe it would be more beneficial to provide a running example that doesn't require users to configure the JSON file themselves. Although I've read in the README about the necessity of this configuration.
Regarding the point about hard-coded paths, I think there's been a misunderstanding. I'm not looking to debate your decision to use them as such. I just want to point out that replicating your local file system in the code isn't practical. It might be better to use relative paths, which could then be converted to absolute paths in the code. This approach is particularly useful when collaborating, as it allows for the use of a common config file. Otherwise, if everyone submits their own config file to the git repository, it could lead to issues where the program of other team members doesn't work anymore because their path is different.
OK, I'll work on these two points, stay tuned ;-)
@B3J4y please accept the invite to be a member of this organization so that you can review the PR above!
@B3J4y, I thought about the point of the absolute paths given in the configuration.json. I see your point but I am not yet sure that modifying the code makes it simpler. Let me explain my ideas to you.
I see two possibilities for giving input and output paths in the configuration file.
One possibility is the one implemented now: the user has full freedom of having input and output files anywhere on her/his machine, at the cost of giving the absolute path in the configuration file. The configuration file pushed into the repository main branch can be kept as the one used for the toy example to avoid confusion. Every user should modify the configuration for her/his needs in her/his local directory for the runs.
Another possibility is to use relative paths in the configuration file. However, this has a double cost: i) we need to assume in advance where the files should sit (e.g., in a subdirectory of the root directory containing ProMCDA; ii) the user has to copy and paste her/his input files in the assigned directory before running the code. Additionally, users should avoid to push data in the main branch.
I am still in favor of the first option of full freedom. But if I still did not convince you, a compromise can be: I let the code identify if the given path in the config is relative or absolute (with a checking function). If the path is relative and the files exist (i.e. are copied in the specific directory), then the code combines it internally with the absolute path and proceeds with the calculations. If instead, the path is not relative then an absolute path is expected in the configuration file. This is possible but a bit overengineered in my opinion. What do you think?
Thank you for your explanations. Your ideas have led me to another: if the attribute is not set, the system could search for the file in the current folder (or in a specified folder). This way, by default, you can configure your algorithm without needing to modify any configuration files, as long as this approach is well-documented.
I was thinking of a practical example where a similar problem is addressed. For instance, when you want to run a Dockerfile, the system looks for it in the current directory. However, you can also specify its location using a parameter. Adopting similar processes might be helpful. Would this approach be slightly less complex than option 2?
I am certainly biased because the solution I initially proposed - although not optimal in some respects - had underlying reasons. Among these was the possibility of always saving the configuration file as an output too, with the aim - during long experiments where perhaps one variable is changed at a time - of easily remembering exactly what was done.
If I removed the input file paths from the configuration file, I would be removing crucial information to reconstruct the experiment performed.
If you feel a change is necessary, I would opt for the function that filters between relative paths and absolute paths. However, I must reiterate that in my opinion the current solution is already acceptable, and as you can see from the toy example, if the files are in a subdirectory of the root directory, then the path in the config is practically relative. I can be more explicit in the README too.
Regarding the point about hard-coded paths, I think there's been a misunderstanding. I'm not looking to debate your decision to use them as such. I just want to point out that replicating your local file system in the code isn't practical. It might be better to use relative paths, which could then be converted to absolute paths in the code. This approach is particularly useful when collaborating, as it allows for the use of a common config file. Otherwise, if everyone submits their own config file to the git repository, it could lead to issues where the program of other team members doesn't work anymore because their path is different.
One way to encode this is to have a default location of the input data directory and to give the possibility to change that directory through an environment variable. That's how we do it in the biotrade (see __init__.py
) and eu_cbm_hat (__init__.py
) packages in my profile. In the case of biotrade, for example the data_dir
variable is a pathlib.Path
object which can be used to build paths to the data throughout the package.
For the particular case of the sample data, it can be stored within the package and distributed with it.
"I believe it would be more beneficial to provide a running example that doesn't require users to configure the JSON file themselves."
@Flaminietta wrote
" In the directory tests/resources/ you can find some example input matrices (made for the tests)."
It would be nice to move these to the example of the functions as well.
If I run the program as described in the README, using the example configurations.json, the following issues occur:
Traceback (most recent call last): File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "C:\Users\jbernoth\PycharmProjects\ProMCDA\mcda\mcda_run.py", line 379, in
main(input_config=input_config)
File "C:\Users\jbernoth\PycharmProjects\ProMCDA\mcda\mcda_run.py", line 98, in main
if input_matrix.duplicated().any():
^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'duplicated'
My educated guess is that the hard-coded folders in the config.json file might be causing the error. Perhaps you could consider using relative paths instead. Additionally, if this issue is indeed related to the path configuration, the error message should provide hints about it. I attempted to insert an except FileNotFoundError: in utils.py:read_matrix, but it appears that the function where this exception is handled isn't being called before this error occurs.