riverscuomo / apps

a monorepo of all my python scripts, modules, and packages
Other
14 stars 3 forks source link

How to run the packages from another script #1

Closed riverscuomo closed 1 year ago

riverscuomo commented 1 year ago

@brettcannon
I have a python script apps\maintenance.py that imports a number of my packages and runs them. As I've converted some of these packages to poetry, I can't figure out how to get them to run. They're importing just fine but then I get this error when I call main() on the import.

How to reproduce the error

In the apps directory, run py maintenance.py -m new_albums. This should run the script with an argument to import and run the apps\new_albums package. You should then see some variation of this error:

module 'new_albums' has no attribute 'main'

Finally, the new_albums package runs fine if change to the new_albums package directory and run py new_albums.

Is there something wrong with the way I've set up the new_albums package with poetry? I should be able to import and run its main function, correct?

Note:

The most relevant part of the code is the run() function https://github.com/riverscuomo/apps/blob/49cd496e243b26cdb9ac867aee87b0553118d054/maintenance.py#L375

franomai commented 1 year ago

FYR when trying to clone this I cannot access new_albums (it just clones as an empty folder with no other indicators) - you can add it as a submodule, or use some of the answers from https://stackoverflow.com/questions/47008290/git-how-to-make-outer-repository-and-embedded-repository-work-as-common-standal to just add them in as regular files.

riverscuomo commented 1 year ago

@franomai thank you. I'll try that.

franomai commented 1 year ago

Hey @riverscuomo -

I have been looking at the general issue you have been having. I am not sure if this helps, but I think it might stem from the fact that importing a module does not import the submodules by default, and the main() you are trying to access is sitting under the __main__ submodule (assuming the new_albums repo is https://github.com/riverscuomo/new-albums).

Here is a log from going

for importer, modname, ispkg in pkgutil.iter_modules(setup_module.__path__):
                print("Found submodule %s (is a package: %s)" % (modname, ispkg))

image

If you want to change how this functions, you'll have to change the __init__ in the new_albums package folder to expose it. With the package as it currently is, after locally changing a few of the relative imports in the new_albums package so it runs for me, going

setup_module = importlib.import_module("new_albums.__main__")
setup_module.main()

worked as expected.

As you noted in your readme, as py new_albums is working, you could run the module as a script (which is what that command does) by going runpy.run_module("new_albums", run_name="__main__") rather than importing it and making the call. Whatever works best for you!

riverscuomo commented 1 year ago

@franomai Thanks. That sounds very promising. I haven't quite been able to get it to work yet. I tried moving __main__ to new_albums/but I still get module 'new_albums' has no attribute 'main'. I tried exposing main() in the new_Albums/__init__file and got these errors:

#new_Albums/__init__.py

from __main__ import * 
# """ 
#   File "C:\RC Dropbox\Rivers Cuomo\Apps\new_albums\new_albums\__main__.py", line 7, in <module>
#     import new_albums.config as config
# ModuleNotFoundError: No module named 'new_albums.config'
# """

from new_albums import config 
# ImportError: cannot import name 'config' from partially initialized module 'new_albums' (most likely due to a circular import) (C:\RC Dropbox\Rivers Cuomo\Apps\new_albums\__init__.py)

By the way, I added the 'new_albums' package as a submodule to this repo.

I've committed everything so you can easily see what I tried.

brettcannon commented 1 year ago

I haven't had a chance to check out your code to test what you're doing, @riverscuomo , but based on what you've said here I think you have ended up with a circular import because you have your new_albums.__init__ module importing new_albums.__main__ module which then implicitly imports new_albums.__init__.

There are a a few options to try and work around this, all of which are valid approaches in their own way.

Option one is to use the runpy module from Python's stdlib. This is actually what Python's own -m flag uses, so it will lead to the same result as if you had said py -m new_albums (which is generally how you want to run code that contained within a package). That prevents you from having to try and get to the main() function on your own since Python will end up importing new_album.__main__ for you and setting __name__ == "__main__" so the code runs as you expect. You can try out runpy.run_module("new_albums", run_name="__main__") and make sure that it works.

Option Two is to import new_albums.__main__ to then get at your main() function that way using importlib.import_module(). There's actually nothing special about the __main__.py file beyond the fact that Python will use that if you use -m. That means you can just import it like any other module. So importlib.import_module("new_albums.__main__").main() should work.

Option three is to not use from __main__ import * but to use from . import __main__ to try and break the circular import (or whatever code reorganizing is necessary). Basically you need to have the imports complete before they somehow get back to new_albums.__init__. Usually you get around that by importing to the module and not to individual objects within the module since the import machinery will have access to the module object in sys.modules at that point and thus won't complain that there's a circular import (if you want more details about this, please feel free to ask as I'm purposefully not diving into minute details in case you don't care 🙂).

I think any of these three options will get you what you're after. If not, then please let me know and I can do a deeper dive. Regardless, none of this has to do with Poetry and entirely about how Python's import system works (which I take partial responsibility for since I did write importlib 😅, although in my defense a lot of these semantics predate my involvement with Python's development 😁).

franomai commented 1 year ago

Hey all, from @brettcannon's information above I realised that when I was trying to unpick the imports for @riverscuomo's issue, I was not checking py -m new_albums with the -m flag. This unblocked me and with a few changes the import now seems to work - you can have a look at what I updated in https://github.com/riverscuomo/apps/pull/2. If you decide to go for this route, and my changes look reasonable, the code should now run as is. I am not a Python master though, so definitely have a second pair of eyes on it! Hope this helps.

riverscuomo commented 1 year ago

@brettcannon Good to know it's not about poetry, at least.

Still getting similar errors with options 1 and 2. I couldn't quite figure out how to implement option 3.

There must be something wrong with the way I have the new albums package setup.

For now, I've simplified the maintenance.pyscript even more. It tries to import new_albumsfirst thing right there in its mainfunction. I've also removed any unrelated imports from the top ofmaintenance.py.

image

image

franomai commented 1 year ago

Hey there @riverscuomo - just to keep you moving forward while @brettcannon investigates, I think you are getting these errors (no module new_albums.__main__) because, in this case, Python thinks the new_albums package is referring to the outer folder new_albums, instead of the sub folder new_albums, where all your main code is: image

The reason it thinks that is because the outer folder has a __init__.py file, and so, when apps is running, it is looking at the apps/new_albums folder as a package instead of trying to look up the new_albums that gets built when you run pip install . (the sub folder with the main). You can test this by having a look at the output before and after removing the __init__.py from this folder:

Module being read from apps/new_albums (your screenshot)

image

Module being read from the built site-packages after deletion of the top level __init__.py

image

I am not sure that you want this top level to be a package (as it would be just have the new_albums package INSIDE it which actually has your functionality), but you will see that even if you delete this top level __init__.py, or call the sub new_albums/new_albums from your maintenance.py, you will still get some errors:

\AppData\Local\Programs\Python\Python311\Lib\site-packages\new_albums\classes\albumClass.py", line 5, in
<module>
    from classes.playlistClass import playlistClass
ModuleNotFoundError: No module named 'classes'

My PR on the new_albums repo (https://github.com/riverscuomo/new-albums/pull/32) should fix these, however, if you just want to swat the errors you are seeing here yourself and NOT merge that, after updating classes\albumClass.py -> classes.playlistClass to be new_album.classes.playlistClass and updating accept_reject.py -> config to be new_album.config, and rebuilding with the pip install . in your README.md, I was able to get it to run.

I hope this information clears some stuff up.

franomai commented 1 year ago

Hey, apologies, I just noticed that by changing the config import at the top of main, I have broken some of the main script later down - I didn't catch it because I don't get past the authorisation. My bad! Hopefully you are able to replace the . imports with new_albums and everything above should still be valid! Also, if you don't want to call it just as new_albums.main(), and are okay to access it via new_albums.__main__.main(), you can remove the from .__main__ import main line in the __init__.py and it should remove the warning as well - whatever works best.

riverscuomo commented 1 year ago

@franomai oh interesting. I just saw your comment now after I did the PR and made some changes and got it to work. I may try removing the from .__main__ import main line in the __init__.py later.

The good news is that new musicis running both on its own and also as an import to maintenance.py. I also confirmed that maintenancecan import a simple module from the apps folder. I tested this with a module called pool.

But when I try to import another package, social, I'm getting the ModuleNotFoundError: No module named 'social.__main__' error Again. I added socialas a sub module here for testing. You can see it's structured exactly like new albums so I'm not sure why we can't find main().

franomai commented 1 year ago

Hey @riverscuomo, nice one, glad to hear you got it working! As for the new issue, have you run a pip install . on the social directory? When I do, I get this error instead:

image

Just FYR, as running at the moment, you're executing the built version of these packages - so if you make a change in the submodule repo, you will have to pip install . it. Is that fine with you? Otherwise I think you might have to make a change to have maintenance.py to step into the folder, and then import/execute it from there. I am not sure what the best practice way is to do that - maybe there is a slightly different call to import_module that will work?

riverscuomo commented 1 year ago

@franomai Interesting. I didn't know that about the built versions. pip install . fixed the error and now running py maintenance.py runs social successfully.

Too bad you're getting the unrecognized arguments error. It looks like your social is trying to parse the argument you passed to maintenance: -m social. Some of my other packages, such as new_albums, correct this:

image

It would be better for me to be able to run package mains from maintenance without having to rebuild every time I make a change in the package.

Incidentally, to build a package, would it be better to use poetry install? After all, I've been trying to convert these packages to poetry.

brettcannon commented 1 year ago

It would be better for me to be able to run package mains from maintenance without having to rebuild every time I make a change in the package.

If you do what's called an editable install you won't have to re-install your own code that you're actively editing. People typically do that with pip install -e. You can then point pip at the directory holding the code, e.g. . is common for the current directory.

But with Poetry you might need to use https://python-poetry.org/docs/cli/#add (I'm not a Poetry user so I unfortunately don't have more insight).

BTW what's the goal/purpose of maintenance.py?

Incidentally, to build a package, would it be better to use poetry install?

Looking at https://python-poetry.org/docs/cli/#install , it seems the key difference is whether you care about your Poetry.lock file to be used. If you do then you want poetry install, otherwise it shouldn't matter.

franomai commented 1 year ago

@brettcannon, from what I can see in maintenance.py, the main goal is to run a large number of __main__ modules in bulk, on different schedules. The packages containing these are all git submodules sitting inside this main app repo, with the structure package_name (repo) /package_name (package folder) /__main__. It looks like importlib.import_module doesn't have a way to specify a path, and specifying the package/module relatively has some issues with how the some of the imports are defined inside these __main__s - if you had some insight about how to run them directly from this code without the build, while working with this folder structure, I'm sure it would be appreciated! At the moment though, the script is running them as installed packages, hence the question about the rebuilds. I think your info about editable installs should resolve @riverscuomo's issue though - here is some info about poetry and editable installs, for reference: https://github.com/python-poetry/poetry/issues/34#issuecomment-1055142428. It looks like for poetry installs they're editable by default, so it might be worth a test! Otherwise it looks like pip install -e . on a package will do the trick.

riverscuomo commented 1 year ago

Yep, maintenance.py. Is a master script that imports and runs many other Scripts, some daily some weekly. It's triggered by Windows task scheduler every night. Some of the scripts are simple modules that sit in the same directory, such as pool.py, and others are packages that sit in the same directory, such as "New Albums" and "Spotnik". (A full list of the Imports can be found in maintenance_config.py.) I want all of these scripts and packages to be runnable by anyone on the command line and also programmatically by the maintenance.py script. So far, through our efforts in this GitHub issue, it looks like the simple modules, such as pool.py are working well, and the straightforward packages are working well, such as "New Albums", "Social", and "Spotnik".

Finally I'd like to tackle a more complicated package, "crawlers", which right now is a directory within apps which has three packages in it, spotifycrawler, LastFMcrawler and a package of common code called core. I'd like maintenance.py to be able to import and run spotifycrawler and LastFMcrawler, and for those 2 packages to be able to draw on the common code in core.

BTW, please let me know if this doesn't seem like a reasonable way to import and run a large number of packages on a regular schedule.

also prints a report after running to google sheets:

image

franomai commented 1 year ago

Hey @riverscuomo,

Firstly, I think it seems pretty reasonable, and second, I think it's pretty rad! Cool that it makes a report for you. I can only dream to have so many useful scripts I'd need to manage them, haha.

Second, I made a PR to crawlers at https://github.com/riverscuomo/crawlers/pull/1 to try and fix up the errors there. As I mention there, I am not sure if I swatted all of them, but the change I made to the LastFMcrawler should give some idea about what was causing the issues I ran into getting it to run. Namely, I had to make a small change to add/update some __init__s and change the imports drawing on them accordingly. Hope this helps!

brettcannon commented 1 year ago

please let me know if this doesn't seem like a reasonable way to import and run a large number of packages on a regular schedule.

Thanks for the details! This helps a lot. So you've developed your own cron job solution. Since this is for Windows I don't know if it's "reasonable", but it's definitely a great learning opportunity, so who cares. 😄

I'm assuming you already set up a virtual environment with all of the requisite dependencies installed somewhere (the apps repo doesn't have pyproject.toml or anything that lists what you would consider a dependency for maintenance or to make sure all the dependencies of what you're running is installed) ? Will all the code always be kept in either the same directory as maintenance.py (e.g. pool) or sub-directories that have packages (e.g. crawlers)?

If the answer is "yes" to both, then I have a solution which I can submit a PR for this week, but if either of you want to take a stab at it, my idea is:

This will implicitly make the shared code in core for crawlers work. You can get fancy and try and isolate stuff between running different things, but if everything is in modules with different names or within a package then there shouldn't be any module name clashes.

riverscuomo commented 1 year ago

Great. I have only 2 packages left to fix for maintenance imports. And I'm pretty confident I know how to do it. From now on, I'll structure all packages as we have them here.

riverscuomo commented 1 year ago

Yep, I got the final 2 packages running now. Thank you, gentleman. Let me know if I can ever get you =w= tickets :)

brettcannon commented 1 year ago

I got the final 2 packages running now.

Congrats!

Thank you, gentleman.

You're quite welcome!

Now that I'm back from vacation (technically still on it, but at least back home), I was able to pull together a quick PR. I did a quick pass on maintenance.py and touched up a couple of things to be a bit more Pythonic, and Black and Ruff accidentally did the rest (I forgot I had the extensions for the tools turned on for VS Code). 😅 If you want me to undo the formatting just let me know.

But the key thing is I added a pyproject.toml and its accompanying Poetry.lock. I realized that Poetry can take care of doing editable installs, keeping you in the Poetry workflow and avoiding having to do pip install -e (i.e. poetry run will do the right thing)! Now there is one issue in that the specified Rich dependencies vary between the various sub-projects in incompatible ways (i.e. social says ^13.1.0 while new-albums says ^12.5.1). This is also somewhat complicated by gspreader locking itself to ^12.5.1. Now, there are two solutions to solving this.

Solution 1 is to just unify your requirements for Rich to match what gspreader requires. This is the fastest solution (and what I did to do this PR quickly).

Solution 2, and the one I personally suggest, is to remove your upper-bound constraints in all of your libraries (i.e. don't use ^ in your dependency requirements, but instead use >=). Henry Schreiner's post on this topic does a good, thorough job of explaining why you don't want to do this for libraries (disclaimer: I'm quoted in the blog post).

Let me know if I can ever get you =w= tickets :)

Definitely! Although looking at your current tour dates you aren't coming near Vancouver, Canada anytime soon, so I will have to take a rain cheque (any preference on how to bug you about this very generous offer when you do come to town? My email is brett at python.org if that helps at all, which is another way to reach me if needed).

Also, if you happen to be coming to PyCon US 2023 or PyCascades some time in the future (looks like PyCascades this year conflicts with your Tampa, FL tour stop), do please find me and say hi. 🙂 BTW in case you're worried about folks bugging you at PyCon US 2023 about music stuff instead of coding stuff, do know that PyCon US this year has a mask mandate and you can specify your name on your badge, so you can be as anonymous as you want).

riverscuomo commented 1 year ago

@brettcannon Cool, thanks. I merged your PR and then implemented solution 2. We're about to announce a tour that brings us to Vancouver at the end of the summer. I'll send you my deets via email. Would be fun to meet in person. And maybe I'll check out PyCon this year. Never been.

franomai commented 1 year ago

Hey @riverscuomo, glad to hear the last few packages worked without any issues, and that the crawlers PR merged cleanly. Hopefully the package setup works for any new ones you decide to add going forward.

It was cool working with you on this, feel free to hit me up at firstname dot lastname at gmail dot com if you need anything else or make any new packages going forward - I do a lot of stuff with music related automation as well so if you have any new ideas for stuff I'd be keen to hear about it! As for touring, hopefully you guys swing by Australasia sometime, and do a Sydney gig! Was really bummed when Hella Mega got cancelled in NZ. 😭