indygreg / PyOxidizer

A modern Python application packaging and distribution tool
Mozilla Public License 2.0

OxidizedFinder.iter_modules() support #253

Closed philipkimmey closed 4 years ago

philipkimmey commented 4 years ago

Hello! What a cool project!

I've been engaged in what may be the longest yak-shaving exercise of my life over the last several days in trying to improve how we distribute some internal tools. The path of least resistance is almost certainly using https://github.com/pantsbuild/pex but I've even tried https://github.com/Nuitka/Nuitka which is a very cool project as well.

Our internal tool has one big dependency which is botocore. I've been working to eliminate __file__ dependencies from that project and I have something that needs some cleanup but is (mostly?) functional: https://github.com/boto/botocore/pull/2046 .

In turn, that let me build my package using PyOxidizer and actually execute it, but unfortunately the importlib.resources behaviors end up being quite different between a normal Python runtime environment and the PyOxidizer context for reasons that aren't entirely clear to me.

Specifically, here is how the behavior differs. With that PR applied, in a normal Python runtime environment:

>>> import botocore.resource_file_adapter
>>> [i for i in botocore.resource_file_adapter.os_adapter.listdir('pkg://botocore/data')]
['dataexchange', 'lex-models', 'storagegateway', 'endpoints.json', 'elb', 'alexaforbusiness', 'docdb', 'ds', 'devicefarm', 'groundstation', 'athena', 'kinesis-video-media', 'serverlessrepo', 'iotevents-data', 'license-manager', 'sms-voice', 'personalize', 'appconfig', 'pinpoint-email', 'forecast', 'elasticache', 'iotthingsgraph', 'detective', 'logs', 'securityhub', 'neptune', 'route53domains', 'macie', 'synthetics', 'ec2-instance-connect', 'robomaker', 'workspaces', 'acm', 'appsync', 'iot1click-devices', 'events', 'iotanalytics', 'ram', 'connectparticipant', 'pricing', '_retry.json', 'gamelift', 'marketplace-entitlement', 'lambda', 'emr', 'codecommit', 'kinesis', 'lakeformation', 'comprehendmedical', 'chime', 'ec2', 'sesv2', 'firehose', 'fsx', 'textract', 'opsworkscm', 'pinpoint', 'sns', 'mediaconvert', 'meteringmarketplace', 'sqs', 'xray', 'cloudtrail', 'sdb', 'migrationhub-config', 'iotsecuretunneling', 'application-autoscaling', 'mediapackage-vod', 'frauddetector', 'imagebuilder', 'sso-oidc', 'apigatewaymanagementapi', 'kinesisanalytics', 'dynamodb', 'route53resolver', 'sagemaker-a2i-runtime', 'polly', 'codestar-connections', 'codestar-notifications', 'redshift', 'waf', 'kafka', 'servicecatalog', 'globalaccelerator', 'ecr', 'codestar', 'amplify', 'datasync', 'transfer', 'es', 'rekognition', 'sso', 'iot1click-projects', 'cloudwatch', 'iot-data', 'iot-jobs-data', 'iotevents', 'dynamodbstreams', 'forecastquery', 'kinesis-video-signaling', 'config', 'efs', 'translate', 's3', 'pinpoint-sms-voice', 'cloudfront', 'kms', 'support', 'cloudhsm', 'connect', 'servicediscovery', 'codeguruprofiler', 'mgh', 'application-insights', 'cloudformation', 'apigatewayv2', 'transcribe', 'workmailmessageflow', 'elastictranscoder', 'snowball', 'greengrass', 'guardduty', 'kinesisanalyticsv2', 'mediatailor', 'autoscaling-plans', 'kendra', 'pi', 'ssm', 'personalize-events', 'signer', 'cur', 'iot', 'backup', 'mediapackage', 'dms', 'service-quotas', 'mediastore', 'mediastore-data', 'appmesh', 
'batch', 'worklink', 'sagemaker-runtime', 'qldb', 'savingsplans', 'organizations', 'cognito-identity', 'sts', 'ebs', 'schemas', 'wafv2', 'rds', 'marketplacecommerceanalytics', 'personalize-runtime', 'lightsail', 'mq', 'autoscaling', 'apigateway', 'mobile', 'inspector', 'networkmanager', 'cloudhsmv2', 'acm-pca', 'secretsmanager', 'cloudsearchdomain', 'dax', 'cloudsearch', 'ce', 'importexport', 'comprehend', 'machinelearning', 'kinesisvideo', 'kinesis-video-archived-media', 'compute-optimizer', 'workdocs', 'sms', 'cognito-sync', 'medialive', 'sagemaker', 'resource-groups', 'health', 'managedblockchain', 'clouddirectory', 'qldb-session', 's3control', 'directconnect', '__init__.py', 'opsworks', 'accessanalyzer', 'route53', 'discovery', 'elasticbeanstalk', 'codedeploy', 'quicksight', 'iotsitewise', 'ses', 'elbv2', 'workmail', 'fms', 'mturk', 'budgets', 'resourcegroupstaggingapi', 'appstream', 'iam', 'elastic-inference', 'stepfunctions', 'glue', 'marketplace-catalog', 'ecs', 'swf', 'cloud9', 'datapipeline', 'waf-regional', 'codeguru-reviewer', 'mediaconnect', 'codebuild', 'dlm', 'glacier', 'eks', 'lex-runtime', 'shield', 'cognito-idp', 'outposts', 'codepipeline', 'rds-data']

By comparison, in the PyOxidizer context we get this:

>>> import botocore.resource_file_adapter
>>> [i for i in botocore.resource_file_adapter.os_adapter.listdir('pkg://botocore/data')]
['endpoints.json', '_retry.json']

The documentation is pretty clear that importlib.abc.ResourceReader.contents may or may not return non-resource contents, so the current behavior is permitted. Another option, though, would be to change PyOxidizer's content_impl to return non-resources as well, which appears to be closer to how the CPython finder ends up behaving.
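For reference, here is a minimal sketch of the standard file-system behavior being compared against. It builds a throwaway package on disk (the name `demo_data_253` and its files are made up for illustration), asks the package's loader for its resource reader, and uses `is_resource()` to separate real resources from the other entries that `contents()` may list. It says nothing about how PyOxidizer implements its own reader.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical package layout: one data file plus one sub-directory.
    pkg_dir = Path(tmp) / "demo_data_253"
    (pkg_dir / "s3").mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "endpoints.json").write_text("{}")
    (pkg_dir / "s3" / "service.json").write_text("{}")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_data_253")
        reader = module.__spec__.loader.get_resource_reader(module.__name__)

        # contents() may include entries that are not resources
        # (directories here), so classify each one with is_resource().
        names = [n for n in reader.contents() if n != "__pycache__"]
        resources = sorted(n for n in names if reader.is_resource(n))
        non_resources = sorted(n for n in names if not reader.is_resource(n))
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_data_253", None)

print("resources:", resources)
print("non-resources:", non_resources)
```

With the file-system loader, the `s3` directory shows up in `contents()` but is not reported as a resource, which matches the longer listing above.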

Long story short, I think implementing iter_modules on the OxidizedFinder would let me use pkgutil.walk_packages to find those non-resource packages, but I'm now pretty far out of my depth and could use some guidance.
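To illustrate what pkgutil looks for, here is a minimal sketch of a meta path finder that exposes `iter_modules()`. `InMemoryFinder` and the module names are hypothetical and are not PyOxidizer's implementation; the finder only lists names and does not actually import anything.

```python
import pkgutil
import sys


class InMemoryFinder:
    """Hypothetical finder that only knows a set of module names."""

    def __init__(self, modules):
        # Maps fully qualified module name -> True if it is a package.
        self._modules = dict(modules)

    def find_spec(self, fullname, path=None, target=None):
        # Importing is out of scope for this sketch; defer to other finders.
        return None

    def iter_modules(self, prefix=""):
        # pkgutil expects (name, is_package) pairs; list top-level names only.
        for name, is_pkg in self._modules.items():
            if "." not in name:
                yield prefix + name, is_pkg


finder = InMemoryFinder(
    {"demo_pkg_253": True, "demo_pkg_253.data": True, "demo_mod_253": False}
)
sys.meta_path.append(finder)
try:
    # With no path argument, pkgutil consults every sys.meta_path finder
    # that has an iter_modules() method, then the sys.path entries.
    found = {
        info.name: info.ispkg
        for info in pkgutil.iter_modules()
        if info.module_finder is finder
    }
finally:
    sys.meta_path.remove(finder)

print(found)
```

One caveat with this approach: when pkgutil.walk_packages descends into a package, it resolves importers from the package's `__path__` entries through `sys.path_hooks`, so a meta path finder alone only covers the top-level listing shown here.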

Thanks!

philipkimmey commented 4 years ago

I also just noticed some of the related discussion in https://github.com/indygreg/PyOxidizer/issues/237 .

indygreg commented 4 years ago

Oh, I had no clue there was an optional Finder.iter_modules() that things in the wild look for! We should definitely implement that!

As for exposing Python modules as resources, I actually have a half-finished patch somewhere that attempts to do this. One of the big areas of focus for the upcoming release has been shoring up the code in the pyembed crate. And exposing Python modules as resources is definitely on the short list of things I'd like to do before the next release.

Thank you for the excellent report. And thank you for fighting the good fight and porting botocore to the new Python API! Feel free to reference #69 in that PR so we have a better record of changes to __file__ in external projects.