Separate distribution into multiple, one per API family?

karenetheridge commented 7 years ago

Thanks to Amazon creating so many new web services, this distribution is now huge -- almost 8000 modules in version 0.32. Would it be possible to ship each API family as its own distribution, perhaps with another dist containing shared parent code if needed?

I am willing to help -- my work's infrastructure is starting to feel the load of installing such a large distribution when we only need a few pieces of it.

The easiest thing to do would be to simply make a separate distribution for each second-level directory. That way, when new services are added, existing distributions should not need to change. A shared dzil plugin bundle, containing helper scripts, could be created to streamline future development.

Paws/ACM
Paws/API
Paws/ApiGateway
Paws/AppStream
Paws/ApplicationAutoScaling
Paws/AutoScaling
Paws/Batch
Paws/Budgets
Paws/CUR
Paws/CloudDirectory
Paws/CloudFormation
Paws/CloudFront
Paws/CloudHSM
Paws/CloudSearch
Paws/CloudSearchDomain
Paws/CloudTrail
Paws/CloudWatch
Paws/CloudWatchEvents
Paws/CloudWatchLogs
Paws/CodeBuild
Paws/CodeCommit
Paws/CodeDeploy
Paws/CodePipeline
Paws/CodeStar
Paws/CognitoIdentity
Paws/CognitoIdp
Paws/CognitoSync
Paws/Config
Paws/Credential
Paws/DMS
Paws/DS
Paws/DataPipeline
Paws/DeviceFarm
Paws/DirectConnect
Paws/Discovery
Paws/DynamoDB
Paws/DynamoDBStreams
Paws/EC2
Paws/ECR
Paws/ECS
Paws/EFS
Paws/ELB
Paws/ELBv2
Paws/EMR
Paws/ES
Paws/ElastiCache
Paws/ElasticBeanstalk
Paws/ElasticTranscoder
Paws/Firehose
Paws/GameLift
Paws/Glacier
Paws/Health
Paws/IAM
Paws/ImportExport
Paws/Inspector
Paws/IoT
Paws/IoTData
Paws/KMS
Paws/Kinesis
Paws/KinesisAnalytics
Paws/Lambda
Paws/LexModels
Paws/LexRuntime
Paws/Lightsail
Paws/MTurk
Paws/MachineLearning
Paws/MarketplaceCommerceAnalytics
Paws/MarketplaceMetering
Paws/Net
Paws/OpsWorks
Paws/OpsWorksCM
Paws/Organizations
Paws/Pinpoint
Paws/Polly
Paws/RDS
Paws/RedShift
Paws/Rekognition
Paws/ResourceTagging
Paws/Route53
Paws/Route53Domains
Paws/S3
Paws/SDB
Paws/SES
Paws/SMS
Paws/SNS
Paws/SQS
Paws/SSM
Paws/STS
Paws/ServiceCatalog
Paws/Shield
Paws/Signin
Paws/SimpleWorkflow
Paws/Snowball
Paws/StepFunctions
Paws/StorageGateway
Paws/Support
Paws/WAF
Paws/WAFRegional
Paws/WorkDocs
Paws/WorkSpaces
Paws/XRay
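
As a rough illustration of the mapping from directories to distributions, here is a minimal sketch. It assumes that `API`, `Credential` and `Net` hold shared code rather than an AWS service, and that distributions would be named `Paws-<Directory>`; both the helper name `dist_names_for` and the exclusion list are hypothetical.

```perl
use strict;
use warnings;

# Assumption: these second-level directories hold shared code, not a service.
my %shared = map { $_ => 1 } qw(API Credential Net);

# Turn second-level directory names into (hypothetical) distribution names,
# skipping the directories assumed to belong to the shared parent dist.
sub dist_names_for {
    my @dirs = @_;
    return map { "Paws-$_" } grep { !$shared{$_} } sort @dirs;
}

print "$_\n" for dist_names_for(qw(S3 EC2 API Net SQS Credential));
```

Keeping the shared directories in one explicit list makes it easy to adjust which code stays in the parent distribution.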

pplu commented 7 years ago

👍 on the plan proposed

karenetheridge commented 7 years ago

Cribbing from https://metacpan.org/source/ETHER/Dist-Zilla-Plugin-GitHub-0.44/lib/Dist/Zilla/Plugin/GitHub/Create.pm, here's some code to create a GitHub repository (untested). No one would want to have to create all these by hand :)

    use HTTP::Tiny;
    use JSON::MaybeXS;
    use Git::Wrapper;
    use Try::Tiny;      # provides the try/catch blocks used below

    my $root = ...; # local directory to create repositories

    # enter loop over all repositories to create...
    my $repo_name = ...;
    my $description = ...;
    my $org = ...;              # GitHub organization that will own the repository
    my ($login, $pass) = ...;   # get from .github, or embed in the code

    my $http = HTTP::Tiny->new;

    print "Creating new GitHub repository '$repo_name'\n";

    my $params = {
        name        => $repo_name,
        public      => 1,
        description => $description,
        has_issues  => 1,
        has_wiki    => 1,
    };

    my $url = 'https://api.github.com' . '/orgs/' . $org . '/repos';

    my $headers;
    if ($pass) {
        require MIME::Base64;
        my $basic = MIME::Base64::encode_base64("$login:$pass", '');
        $headers->{authorization} = "Basic $basic";
    }

    my $content = encode_json($params);

    my $response = $http->request('POST', $url, {
        content => $content,
        headers => $headers
    });

    my $repo;
    try {
        my $json_text = decode_json($response->{content});

        if (!$response->{success}) {
            return 'redo' if (($response->{status} eq '401') and
                              ($response->{headers}{'x-github-otp'} =~ /^required/));

            print "Error: $json_text->{message}\n";
            return;
        }

        $repo = $json_text;
    } catch {
        if ($response and !$response->{success} and
            $response->{status} eq '599') {
            #possibly HTTP::Tiny error
            print "Err: ", $response->{content}, "\n";
            return;
        }

        print "Err: Can't connect to GitHub\n";
        return;
    };

    my $git_dir = "$root/.git";
    my $rem_ref = $git_dir."/refs/remotes/origin";

    if ((-d $git_dir) && (not -d $rem_ref)) {
        my $git = Git::Wrapper->new($root);

        print "Setting GitHub remote 'origin'\n";
        $git->remote("add", 'origin', $repo->{ssh_url});

        my ($branch) = try { $git->rev_parse(
            { abbrev_ref => 1, symbolic_full_name => 1 }, 'HEAD'
        ) };

        if ($branch) {
            try {
                $git->config("branch.$branch.merge");
                $git->config("branch.$branch.remote");
            } catch {
                print "Setting up remote tracking for branch '$branch'\n";

                $git->config("branch.$branch.merge", "refs/heads/$branch");
                $git->config("branch.$branch.remote", 'origin');
            };
        }
    }

pplu commented 7 years ago

Hi,

@karenetheridge I don't understand how this last message relates to this issue. Can you please help me out?

karenetheridge commented 7 years ago

I am envisioning these steps for splitting up Paws into multiple distributions -- since there are a lot of them, it would be nice to script most of it:

(pseudocode)

foreach $dir in <lib/Paws/*> {

    create new github repository named Paws-Foo (see code above, which uses github's API to do that)
    create new local git repository with the remote pointing to the github repo just created
    start it off with all the commits from the Paws directory
    delete everything in lib/ except lib/Paws/Foo.pm and lib/Paws/Foo/
    delete t/* that doesn't test this API # (this will have to be a manual step, unless a consistent naming convention was used for all tests)
    edit dist.ini to fix distribution name, and declare prereqs on main Paws distribution (base class, any needed utility modules)
    test.. iterate...
    release (making sure $VERSION is incremented)
}
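
The "delete everything in lib/ except lib/Paws/Foo.pm and lib/Paws/Foo/" step could be sketched as a filter over a file list. This is only an illustration: the helper name `files_to_keep` and the example paths are hypothetical, and a real script would still have to perform the deletions and handle `t/`.

```perl
use strict;
use warnings;

# Return the paths under lib/ that belong to one service: the top-level
# module lib/Paws/<Service>.pm and everything below lib/Paws/<Service>/.
sub files_to_keep {
    my ($service, @paths) = @_;
    my $prefix = "lib/Paws/$service";
    return grep { $_ eq "$prefix.pm" || index($_, "$prefix/") == 0 } @paths;
}

# Example paths are made up for illustration.
my @all = qw(
    lib/Paws/S3.pm
    lib/Paws/S3/Bucket.pm
    lib/Paws/S3Other.pm
    lib/Paws/EC2.pm
);
print "$_\n" for files_to_keep('S3', @all);
```

Matching on `"$prefix/"` with the trailing slash, rather than on the bare prefix, keeps a service such as `ELB` from also picking up the files of `ELBv2`.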

pplu commented 7 years ago

Now I understand :) I didn't have in mind splitting every module into a new git repository, since it would imply working with lots of repos (but maybe you already have an idea of how to do that). I was thinking more along the lines of a solution that would generate a dist.ini for a specific module, invoke dzil, generate the next one, invoke dzil, ...

Let me further develop:

Say you have a dist.ini-submodules, that is a template toolkit file with some variables to fill out (the name of the dist, and maybe the directory where it's located). That dist.ini has prereqs "hardcoded" to Paws, and uses GatherFiles to get the files in auto-lib/Paws/ServiceName* into lib/Paws/ServiceName.

foreach $dir in <auto-lib/Paws/*> {
  generate_dist_ini_for $dir  # this guy adapts what it needs in dist.ini-submodules
  dzil build
}
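
A minimal sketch of what `generate_dist_ini_for` might do is shown here. It uses a plain regex substitution instead of Template Toolkit to stay within core Perl, and the template is a made-up fragment, not the project's actual dist.ini-submodules. The `{{service}}` placeholder syntax and the function name `dist_ini_for` are assumptions.

```perl
use strict;
use warnings;

# Hypothetical template fragment; a real dist.ini would carry more settings
# (author, license, version, file gathering and so on).
my $template = <<'END';
name = Paws-{{service}}
abstract = Paws client for {{service}}

[Prereqs]
Paws = 0
END

# Fill in the service name wherever the placeholder appears.
sub dist_ini_for {
    my ($service) = @_;
    (my $ini = $template) =~ s/\{\{service\}\}/$service/g;
    return $ini;
}

print dist_ini_for('S3');
```

The generated text would be written to dist.ini before each `dzil build` in the loop.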

We could add a target to the Makefile, make dists, that first generates the "main dist"

cp dist.ini-main dist.ini 
dzil build

and then build the submodules.

What do you think?

karenetheridge commented 7 years ago

I set up the build system for Task-Kensho (https://github.com/EnlightenedPerlOrganisation/task-kensho) which generates/builds/releases multiple distributions from one repository, but that only works well because all of the content in the sub-modules is generated. And honestly, it is a big pain to manage, and only really works out because every time one of the sub-modules needs a release, the parent module gets released as well. Is Paws really in the same boat? There's a huge amount of code there and I would think you'd want to be able to release them independently.

karenetheridge commented 7 years ago

Another reason to split up the distributions manually, rather than autogenerate them, is for dependency management -- splitting them up means that the API components can have different sets of dependencies, and not force users to install all the prereqs that they currently are if they only need to use a few of them.

pplu commented 7 years ago

I'm stuck between the two (I find good arguments for both cases), although I tend to find the "one repo" solution a bit simpler (maybe because it's the one that I understand best). But: please do as you feel will be best (after all I'm the one asking for help!). I appreciate your experience managing lots of modules, so I am open to letting you make the decision.

karenetheridge commented 7 years ago

How connected is the code between the different APIs? For example, if you are making changes to the code in one API, are a bunch of other APIs affected at the same time, or are they mostly independent? If everything changes at the same time, that suggests staying with just one repository, but if they can be independent, then splitting up into multiple repositories would make maintenance and future releases easier.

pplu commented 7 years ago

They are independent from the "API update" point of view: when AWS publishes new calls, each module gets its changes independently, and can be updated on CPAN independently. The party starts when something changes in the way the modules are generated: documentation, or adding some metadata so that the core modules can take advantage of said metadata. In that case lots of modules are affected, and there is a high probability that they have to depend on a new version of the core modules (since taking advantage of the new behavior is handled in the core).

frioux commented 7 years ago

fwiw I for one think a single repo with multiple dists inside would be a reasonable path forward, given how tightly integrated we would expect this stuff to be. But I am about to post an issue that might argue against this issue.

pplu commented 7 years ago

BTW: I started this branch as a proof of concept of packaging Paws into mini-modules. There is a make target to package everything. I'm pretty sure this will benefit from #179 as well

pplu commented 6 years ago

Just an update on what was done, and why there hasn't been much movement on that branch (https://github.com/pplu/aws-sdk-perl/tree/feature/split-into-submodules).

There is a make target to build just the Paws distro, and all service modules. The thing is that tests start to fail because they depend on some services being installed, and the service distros have no tests.

Do you have any ideas on how to tackle this?

pplu / aws-sdk-perl

Separate distribution into multiple, one per API family? #174