jeancroy / RP-fit

MIT License
11 stars 1 forks source link

Codebase refactoring + API modification #2

Closed RaenonX closed 10 months ago

RaenonX commented 10 months ago

Closes #1

Core Python scripts refactor

For every implementation in the core Python implementation rp_model.*, imports have to be relative, i.e. from ..utils import table or something similar. This seems to be the only way to allow the "main script" to run at different directory level.

For every usages in the notebooks, use rp_model.* should be fine. No sys.path modification will be needed in this case/.

Notebook files refactor

Every notebook has been moved to notebooks/ folder, except the 3 very original ones: RP *.ipynb. I am not sure if we are keeping it or moving it, so I leave it there for now.

FitOptions instantiation removal

Singleton instantiation of rp_model.calc.FitOptions has been removed as it doesn't seem to be needed. Simply modify the class attribute/field like FitOptions.rp_file_id = "AnotherFileId does the same thing.

Updated file directory handling

An optional environment variables of RP_MODEL_FILE_PATH is available for overriding the default file directory. If the aforementioned is not set, ./files will be the default.

The value of RP_MODEL_FILE_PATH can be relative directory. It will convert to absolute path automatically. The path set for file directory will be printed out.

Coding style improvements

Removed a few redundant parentheses and changed some indentation in a few places, so it looks a bit nicer and my IDE will stop complaining 😉

Also made a few single quote to double quote to make the strings across the codebase more consistent (I didn't check through the whole codebase on this).

RaenonX commented 10 months ago

Side note, I am not sure if the current code in api.py uses DFO as you mentioned, but whatever in dfo-ls.ipynb doesn't seem to be converging.

jeancroy commented 10 months ago

First comment, Pycharm went all yellow and I think I have some of your local settings still in the project.

Screenshot 2023-11-14 122059 Screenshot 2023-11-14 122211

Also I think some of your linter settings, especially space around operator, are custom or at least different from PyCharm default. Because my PyCharm was removing them. I don't mind using them, just need to share them I guess.

jeancroy commented 10 months ago

I think the dev branch is missing at least one commit from your local.

Screenshot 2023-11-14 123306

jeancroy commented 10 months ago

I don't super like the idea of /rp_model/utils/ because most of the utils got nothing to do with rp model. It may be a necessary evil for import to work tho.

RaenonX commented 10 months ago

First comment, Pycharm went all yellow and I think I have some of your local settings still in the project.

Also I think some of your linter settings, especially space around operator, are custom or at least different from PyCharm default. Because my PyCharm was removing them. I don't mind using them, just need to share them I guess.

Not sure about this tbh, I don't see .idea folder in https://github.com/jeancroy/RP-fit/tree/dev, maybe that __init__.py at the root level did the thing? I am always dark mode so I can't tell if the color coding meaning is the same, but yellow means excluded? Eyeballed but doesn't seem to have any project-specific settings.

That requirements.txt was intentional to keep the environment for both of us having the same packages installed.

RaenonX commented 10 months ago

And you were right that I did customize my settings, forgot about it. Image below to show how to import in case needed.

image

Default.zip

RaenonX commented 10 months ago

I think the dev branch is missing at least one commit from your local.

For this I actually removed that function because it has to called pretty early, the import order will be messy. See the Updated file directory handling section in the very first comment. Since the current repo uses ./files almost everywhere, I think it's likely just gonna be me setting RP_MODEL_FILE_PATHenv var to override.

RaenonX commented 10 months ago

I don't super like the idea of /rp_model/utils/ because most of the utils got nothing to do with rp model. It may be a necessary evil for import to work tho.

The rp_model part is more like a thing to tell the repo that "make everything into a module." It could be fine pulling out of the rp_model module and be itself, but then the relative imports inside rp_model or the usages in the Python scripts will likely need to be something from ..utils import whatever, and from utils import whatever in the notebooks. See if you like how it imports stuff. Personally I am not used to having an import path that passes through the root directory.

jeancroy commented 10 months ago

Yellow is OK I just needed to exit and reload the projet and let pycharm redo the setup.

If it's best practice to no use project root for import I'm Ok trying that.

The set_files_directory I mostly just remove them ?

I'll check what you mean by not converge.

RaenonX commented 10 months ago

(Others)

OK / Copy that / Yes sir / whatever positive and acknowledge.

The set_files_directory I mostly just remove them?

Yeah, I just tried removing them from my local and it works fine.

jeancroy commented 10 months ago

Success: rho has reached rhoend That's a success for dfo-ls.

Yes it takes like 1528 iteration instead of like 12. But also the normal least square kinda cheat in how it counts. Like it'll evaluate the model N times in a single iteration where N is the number of variables (like 200 now), dfo-ls count those as differents.

RaenonX commented 10 months ago

Success: rho has reached rhoend That's a success for dfo-ls.

Yes it takes like 1528 iteration instead of like 12. But also the normal least square kinda cheat in how it counts. Like it'll evaluate the model N times in a single iteration where N is the number of variables (like 200 now), dfo-ls count those as differents.

Oh that's good to know. If the data was wrong could it not reach the end? Because I ran the model last night when there was no data about Onix, it showed me something like FALSE_SUCCESS or else instead (I didn't commit it to the notebook) and it didn't look like a table.

The second column of the output was stuck at E+06 and at a certain iteration it jumped to E-03 and kept running. Delta wasn't changed at all so I thought it wasn't converging.

jeancroy commented 10 months ago

Ok so instead of delete set_files_directory I tried to make it work. You're right that the timing is delicate. But even with os.setenv timing is delicate. And if I can avoid mixing operating system implementation and math... I guess it's worth something.

Can you check if the current implementation also match your need ?

If so, I'll probably rename the file from const to path or file_options or something. Because consts with runtime getter and setter are not super constant.

RaenonX commented 10 months ago

Ok so instead of delete set_files_directory I tried to make it work. You're right that the timing is delicate. But even with os.setenv timing is delicate. And if I can avoid mixing operating system implementation and math... I guess it's worth something.

Can you check if the current implementation also match your need ?

If so, I'll probably rename the file from const to path or file_options or something. Because consts with runtime getter and setter are not super constant.

Makes sense to me, I'll check now and see if it works.

RaenonX commented 10 months ago

Works like a charm. The API did not break, I can run the without any changes from the API consumer side.

Could you additionally check if api.py is using DFO? I can't tell but that looks like it's using the older way.

jeancroy commented 10 months ago

api.py is still using the previous system.

We basically need to talk about what you do with ingredientSplit etc before we move to the new system. Because the new values are a bit better but only if we enable extra rounding when using rp_model.calc.compute_rp

If the data was wrong could it not reach the end.

The failure condition is best result within limited effort. It'll evaluate the model 2000 times then abort. So success is near 1500 and failure near 2000. When I dump the results using table() the first few members are like an enum for failure conditions, so if we just read that, I can see how it look like a failure.

Also in general, not using derivative is much slower than using them. There's no free lunch :( So it can feel infinite yes. Compared to yesterday I fixed something too. The final rounding of RP was commented out when I was trying to debug something when Goobliss changed the sheet. I re-enabled it today.

jeancroy commented 10 months ago

Also using the hash of the data as part of the filename is a neat party trick to avoid recompute, but the cleanup is currently manual.

Maybe we need some kind of add-one / remove-one queue logic.

RaenonX commented 10 months ago

Also using the hash of the data as part of the filename is a neat party trick to avoid recompute, but the cleanup is currently manual.

Maybe we need some kind of add-one / remove-one queue logic.

I was thinking of appending timestamp or data count to the file name (if possible) on this. Opened #3 to address this.

RaenonX commented 10 months ago

api.py is still using the previous system.

We basically need to talk about what you do with ingredientSplit etc before we move to the new system. Because the new values are a bit better but only if we enable extra rounding when using rp_model.calc.compute_rp

If the data was wrong could it not reach the end.

The failure condition is best result within limited effort. It'll evaluate the model 2000 times then abort. So success is near 1500 and failure near 2000. When I dump the results using table() the first few members are like an enum for failure conditions, so if we just read that, I can see how it look like a failure.

Also in general, not using derivative is much slower than using them. There's no free lunch :( So it can feel infinite yes. Compared to yesterday I fixed something too. The final rounding of RP was commented out when I was trying to debug something when Goobliss changed the sheet. I re-enabled it today.

4 created for this