gratipay / gratipay.com

Here lieth a pioneer in open source sustainability. RIP
https://gratipay.news/the-end-cbfba8f50981
MIT License
1.12k stars 308 forks source link

Integrate npm #4148

Closed chadwhitacre closed 7 years ago

chadwhitacre commented 7 years ago

✈️ This is the flight deck for the Integrate npm project. ✈️


Current open-source crowdfunding options (Kickstarter, Patreon, Gratipay, OpenCollective, etc.) are consumer-grade. Our hunch is that a business-grade product with better aggregation can better serve the companies that want to pay for open source, because companies use hundreds or thousands of open source packages, not just a few.

Picking up from https://github.com/gratipay/gratipay.com/pull/4135#issuecomment-255122149 and https://github.com/gratipay/inside.gratipay.com/issues/852#issuecomment-255098337 ...

For wider context see:

JavaScript is the most popular language in open source and npm is the most popular package manager for JavaScript. A good first concrete step towards helping companies pay for open source (#4135), therefore, will be to add the ability to pay for any package on npm. Once we have npm deployed, we will have enough experience to inform a partnership with Libraries.io for the rest of the package managers.

Target

Our goal is to announce this feature in my lightning talk on Thursday, October 26 at Red Hat's All Things Open conference (https://github.com/gratipay/inside.gratipay.com/issues/757).

Our goal is to incrementally improve this feature throughout the first half of 2017, with an eye towards OSCON and $ustain in May.

Package names to test with

From https://github.com/gratipay/gratipay.com/pull/4135#issuecomment-262672635:

http://localhost:8537/on/npm/async/
http://localhost:8537/on/npm/iframe-resizer/
http://localhost:8537/on/npm/mongoose/
http://localhost:8537/on/npm/nodemon/
http://localhost:8537/on/npm/react/
http://localhost:8537/on/npm/react-helmet/
http://localhost:8537/on/npm/react-modal/
http://localhost:8537/on/npm/react-redux/
http://localhost:8537/on/npm/react-router/
http://localhost:8537/on/npm/react-router-redux/
http://localhost:8537/on/npm/redux/
http://localhost:8537/on/npm/redux-thunk/
http://localhost:8537/on/npm/webpack/

Todo

Prerequisites

Checkpoint 1: Inert /on/npm/foo/ Pages

Checkpoint 2: Giving to Packages

Checkpoint 3: Easy Sign-up

Nice to Have

Promotion



chadwhitacre commented 7 years ago

I overwrote the our-marky-markdown.js script on a Heroku dyno (using here strings) to give some debugging output.

html for nullisn't a string, it's a functionfunction (selector, context, r, opts)

chadwhitacre commented 7 years ago

Alright, so we need to call .html() on the output of marky(). But why doesn't it fail locally? I have node 4.3.1 where Heroku has 5.11.1. Maybe the write API changed?

chadwhitacre commented 7 years ago

Nope.

chadwhitacre commented 7 years ago

Maybe I don't have the right data loaded up locally?

chadwhitacre commented 7 years ago

Well, they do differ.

Local

gratipay=# select count(*) from packages;
┌────────┐
│ count  │
├────────┤
│ 372147 │
└────────┘
(1 row)

gratipay=# select count(*) from packages where readme_raw is not null;
┌───────┐
│ count │
├───────┤
│   134 │
└───────┘
(1 row)

gratipay=#

Remote

gratipay::MAROON=> select count(*) from packages;
┌────────┐
│ count  │
├────────┤
│ 372271 │
└────────┘
(1 row)

gratipay::MAROON=> select count(*) from packages where readme_raw is not null;
┌────────┐
│ count  │
├────────┤
│ 138260 │
└────────┘
(1 row)

gratipay::MAROON=>

chadwhitacre commented 7 years ago

Oh! I'm seeing marky-markdown v9.0.1 locally, but 8.1.0 on Heroku.

chadwhitacre commented 7 years ago

Cached build?

chadwhitacre commented 7 years ago

For the fetcher bug (https://github.com/gratipay/gratipay.com/issues/4148#issuecomment-259322332), I think we'll need to unthread in order to get better error messaging.

chadwhitacre commented 7 years ago

Actually, threaded_map is supposed to output the original traceback to stdout. Are we losing that? I guess we'll have to reschedule to see ...
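For readers unfamiliar with the pattern, here is a minimal sketch of a thread-pooled map that prints the original traceback to stdout before re-raising. It is an illustration only, not the code in gratipay/utils/threaded_map.py; the `threads` parameter and the use of `concurrent.futures` are assumptions.

```python
import sys
import traceback
from concurrent.futures import ThreadPoolExecutor


def threaded_map(func, iterable, threads=5):
    """Apply func to each item using a thread pool, like map().

    If func raises, the worker prints the full original traceback to
    stdout before the exception propagates to the caller.
    """
    def g(*a, **kw):
        try:
            return func(*a, **kw)
        except Exception:
            # Print inside the worker thread, while the original stack
            # is still attached to the exception.
            traceback.print_exc(file=sys.stdout)
            raise

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() forces evaluation, so the first worker exception is
        # re-raised here in the calling thread.
        return list(pool.map(g, iterable))
```

With this shape a failure can show up twice in the logs, once from the worker and once from the caller, which makes the first copy easy to mistake for a duplicate.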

chadwhitacre commented 7 years ago

Well that is consternating. Even after https://github.com/gratipay/gratipay.com/pull/4178 I am still seeing 8.1.0 on a Heroku dyno. Hmm ...

chadwhitacre commented 7 years ago

Got it!

chadwhitacre commented 7 years ago

sync-npm process-readmes is working from a one-off dyno. 👍

chadwhitacre commented 7 years ago

I'm not seeing an error from sync-npm fetch-readmes in a one-off dyno. I've rescheduled both that and process-readmes.

chadwhitacre commented 7 years ago

Alright, the original traceback is in there, I had just confused it for a dupe.

2016-11-09T23:51:23.098776+00:00 app[scheduler.8465]: jstuningTraceback (most recent call last):
2016-11-09T23:51:23.098779+00:00 app[scheduler.8465]:   File "/app/gratipay/utils/threaded_map.py", line 15, ing
2016-11-09T23:51:23.098780+00:00 app[scheduler.8465]:     return func(*a, **kw)
2016-11-09T23:51:23.098781+00:00 app[scheduler.8465]:   File "/app/gratipay/package_managers/readmes.py", line 55, in fetch
2016-11-09T23:51:23.098781+00:00 app[scheduler.8465]:     , dirty.name
2016-11-09T23:51:23.098782+00:00 app[scheduler.8465]:   File "/app/.heroku/python/lib/python2.7/site-packages/postgres/__init__.py", line 374, in run
2016-11-09T23:51:23.098783+00:00 app[scheduler.8465]:     cursor.run(sql, parameters)
2016-11-09T23:51:23.098784+00:00 app[scheduler.8465]:   File "/app/.heroku/python/lib/python2.7/site-packages/postgres/cursors.py", line 92, in run
2016-11-09T23:51:23.098784+00:00 app[scheduler.8465]:     self.execute(sql, parameters)
2016-11-09T23:51:23.098785+00:00 app[scheduler.8465]:   File "/app/.heroku/python/lib/python2.7/site-packages/psycopg2/extras.py", line 288, in execute
2016-11-09T23:51:23.098786+00:00 app[scheduler.8465]:     return super(NamedTupleCursor, self).execute(query, vars)
2016-11-09T23:51:23.098787+00:00 app[scheduler.8465]: ProgrammingError: can't adapt type 'dict'

chadwhitacre commented 7 years ago

Interesting. So maybe name or something can be a dict coming from npm?

chadwhitacre commented 7 years ago

That doesn't sound right ...

chadwhitacre commented 7 years ago

I'm working on a PR to add Sentry support to these procs so we have better visibility into and resiliency in the face of errors (surely these won't be the last).

chadwhitacre commented 7 years ago

PR in #4179.

chadwhitacre commented 7 years ago

So it turns out that the registry includes packages that have been unpublished—a lot of them, from what I can tell. They appear as JSON with a 404. E.g. below.

Currently we log "404" for these and leave readme_raw untouched, which means it'll still be null next time around and we'll refetch the same 404. We should notice this case and probably drop the record from our database, though we'll want to be careful to bring it back again when someone else claims it.

I'm a little surprised at how many 404s I'm seeing. How does unpublishing relate to deleting or removing a package?

http://registry.npmjs.com/mysql-schema https://www.npmjs.com/package/mysql-schema

{  
    "_id":"mysql-schema",
    "_rev":"4-c527f6a64778f8d0afbaf6fd4754085e",
    "name":"mysql-schema",
    "time":{  
        "modified":"2013-12-08T03:17:01.297Z",
        "created":"2013-12-08T03:16:59.506Z",
        "0.0.1":"2013-12-08T03:17:01.297Z",
        "unpublished":{  
            "name":"carlosmarte",
            "time":"2014-07-26T06:27:28.217Z",
            "tags":{  
                "latest":"0.0.1"
            },
            "maintainers":[  
                {  
                    "name":"carlosmarte",
                    "email":"dev@carlosmarte.me"
                }
            ],
            "description":"mysql queries helper",
            "versions":[  
                "0.0.1"
            ]
        }
    },
    "_attachments":{  

    }
}
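
A rough sketch of how the fetcher could tell these cases apart, assuming it has the HTTP status code and the raw response body in hand. The function names and return values are hypothetical; the only registry detail relied on is the `time.unpublished` entry visible in the document above.

```python
import json


def is_unpublished(doc):
    """True if a registry document carries a time.unpublished entry."""
    if not isinstance(doc, dict):
        return False
    time_info = doc.get("time")
    return isinstance(time_info, dict) and "unpublished" in time_info


def next_action(status_code, body):
    """Decide what to do with one registry response.

    Returns ("fetch_readme", None) for anything that is not a 404, and
    ("delete", reason) for a 404, where reason records whether the body
    identified the package as unpublished.
    """
    if status_code != 404:
        return ("fetch_readme", None)
    try:
        doc = json.loads(body)
    except ValueError:
        doc = {}  # a 404 with an unparseable body is still a 404
    reason = "unpublished" if is_unpublished(doc) else "not found"
    return ("delete", reason)
```

Recording the reason alongside the delete would make it possible to count how many of the 404s are unpublished packages rather than something else.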
chadwhitacre commented 7 years ago

Okay! Fetcher and processor appear to be chugging along. We want to see these both hit zero, though they won't until we delete or skip 404s.

gratipay::MAROON=> select count(*) from packages where readme_raw is null;
┌────────┐
│ count  │
├────────┤
│ 223221 │
└────────┘
(1 row)

gratipay::MAROON=> select count(*) from packages where readme_needs_to_be_processed;
┌────────┐
│ count  │
├────────┤
│ 323828 │
└────────┘
(1 row)

gratipay::MAROON=>

chadwhitacre commented 7 years ago

Done in https://github.com/gratipay/gratipay.com/pull/4181 and deployed.

chadwhitacre commented 7 years ago

Deleting! 👍

2016-11-10T07:50:24.124188+00:00 app[scheduler.4899]: fetching practiceone
2016-11-10T07:50:23.863759+00:00 app[scheduler.9696]: fetching node-u2f
2016-11-10T07:50:23.939573+00:00 app[scheduler.9696]: fetching node-typograph
2016-11-10T07:50:23.974219+00:00 app[scheduler.9696]: no readme in killdrev
2016-11-10T07:50:23.974273+00:00 app[scheduler.9696]: fetching kill-desktop-osx
2016-11-10T07:50:24.020858+00:00 app[scheduler.9696]: fetching infinigon-tag
2016-11-10T07:50:24.027898+00:00 app[scheduler.9696]: fetching kill-dash-nine
2016-11-10T07:50:24.037882+00:00 app[scheduler.9696]: 404 for spurious-js-aws-sdk-helper
2016-11-10T07:50:24.038176+00:00 app[scheduler.9696]: fetching spur-di
2016-11-10T07:50:24.081055+00:00 app[scheduler.9696]: fetching node-typo
2016-11-10T07:50:24.152517+00:00 app[scheduler.4899]: yet-another-module is 404; deleting
2016-11-10T07:50:24.155182+00:00 app[scheduler.4899]: fetching yet-another-friendly-dependency
2016-11-10T07:50:24.185761+00:00 app[scheduler.4899]: fetching practice-npm-package
2016-11-10T07:50:24.203793+00:00 app[scheduler.4899]: fetching jspm-nodelibs-process
2016-11-10T07:50:24.228972+00:00 app[scheduler.4899]: fetching jspm-nodelibs-path
2016-11-10T07:50:24.273577+00:00 app[scheduler.4899]: fetching hwsl2
2016-11-10T07:50:24.279907+00:00 app[scheduler.4899]: fetching jspm-nodelibs-os
2016-11-10T07:50:24.304598+00:00 app[scheduler.4899]: yet-another-friendly-dependency is 404; deleting
2016-11-10T07:50:24.308818+00:00 app[scheduler.4899]: fetching yet-another-express-routing
2016-11-10T07:50:24.348231+00:00 app[scheduler.4899]: fetching practice_npm
2016-11-10T07:50:24.358175+00:00 app[scheduler.4899]: fetching jspm-nodelibs-net
2016-11-10T07:50:24.367534+00:00 app[scheduler.4899]: yet-another-express-routing is 404; deleting
2016-11-10T07:50:24.370328+00:00 app[scheduler.4899]: fetching yet-another-express-router
2016-11-10T07:50:24.132520+00:00 app[scheduler.9696]: fetching node-typhoon
2016-11-10T07:50:24.162294+00:00 app[scheduler.9696]: 404 for spur-di
2016-11-10T07:50:24.162538+00:00 app[scheduler.9696]: fetching sptitesmith-stylus-retina-template
2016-11-10T07:50:24.182196+00:00 app[scheduler.9696]: fetching kill-combo
2016-11-10T07:50:24.218516+00:00 app[scheduler.9696]: 404 for sptitesmith-stylus-retina-template
2016-11-10T07:50:24.218923+00:00 app[scheduler.9696]: fetching spruce
2016-11-10T07:50:24.253262+00:00 app[scheduler.9696]: no readme in spruce
2016-11-10T07:50:24.253270+00:00 app[scheduler.9696]: fetching sprout-object
2016-11-10T07:50:24.268288+00:00 app[scheduler.9696]: fetching node-typewriter
2016-11-10T07:50:24.291538+00:00 app[scheduler.9696]: 404 for sprout-object

chadwhitacre commented 7 years ago

I'm curious to see how many records we're left with after weeding out the 404s.

Okay! I'm gonna let this run overnight ...

chadwhitacre commented 7 years ago

Almost done fetching! Processing is slower to catch up ...

gratipay::MAROON=> select count(*) from packages where readme_raw is null;
┌───────┐
│ count │
├───────┤
│  3349 │
└───────┘
(1 row)

gratipay::MAROON=> select count(*) from packages where readme_needs_to_be_processed;
┌────────┐
│ count  │
├────────┤
│ 258387 │
└────────┘
(1 row)

gratipay::MAROON=> select count(*) from packages;
┌────────┐
│ count  │
├────────┤
│ 345044 │
└────────┘
(1 row)

chadwhitacre commented 7 years ago

No movement on first and third. I think we have some packages that aren't 404 but also maybe don't have a readme? I think that's how readme_raw is ending up null for a percentage.

gratipay::MAROON=> select count(*) from packages where readme_raw is null;
┌───────┐
│ count │
├───────┤
│  3349 │
└───────┘
(1 row)

gratipay::MAROON=> select count(*) from packages where readme_needs_to_be_processed;
┌────────┐
│ count  │
├────────┤
│ 230868 │
└────────┘
(1 row)

gratipay::MAROON=> select count(*) from packages;
┌────────┐
│ count  │
├────────┤
│ 345044 │
└────────┘
(1 row)

chadwhitacre commented 7 years ago

27,519 readmes processed in five hours, call it 5,000 an hour, so ... 45-50 hours remaining? Should be done over the weekend?
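
The arithmetic behind that estimate, as a small sketch. The counts come from the two `readme_needs_to_be_processed` queries above; the five-hour interval is the approximate figure from the comment, and the variable names are made up for illustration.

```python
# Counts of rows still needing processing, from the two queries above.
before = 258387
after = 230868
elapsed_hours = 5  # approximate, per the comment

processed = before - after                    # 27519
measured_rate = processed / elapsed_hours     # a bit over 5,500 per hour

# Time left at the measured rate and at the rounded-down 5,000/hour.
eta_measured = after / measured_rate
eta_conservative = after / 5000

print(f"processed: {processed}")
print(f"rate: {measured_rate:.0f}/hr")
print(f"remaining: {eta_measured:.0f} to {eta_conservative:.0f} hours")
```

Rounding the rate down to 5,000 an hour gives about 46 hours, which is where the 45-50 hour range comes from; the measured rate would finish a few hours sooner.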

chadwhitacre commented 7 years ago

230,868 - 207,714 = 23,154 in about five hours. 👍

chadwhitacre commented 7 years ago

Fetcher crashed!

Captured with #4179. 👍

chadwhitacre commented 7 years ago

mysql-prettify has a readme of {"private": true}. It shows as a string on npm's HTML page; looking at the JSON I do see an object there (though my browser is not prettifying it for some reason? Ironic given the name 😛).

chadwhitacre commented 7 years ago

I don't see a repo for mysql-prettify. I was gonna check package.json to see if that's where the bad value is coming from. Do we want to stringify it or count it as "no readme"?

chadwhitacre commented 7 years ago

Alright, the {"private": true} issue should be fixed in #4182.
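
For context, a non-string readme like this is the kind of value that produces `can't adapt type 'dict'` when passed straight through as a query parameter. Below is a sketch of the two options discussed above (stringify, or count it as "no readme"). It does not claim to be what #4182 does; `normalize_readme` and its `stringify` flag are hypothetical.

```python
import json


def normalize_readme(value, stringify=False):
    """Return a str suitable for a text column, or None for "no readme".

    Strings pass through unchanged. Anything else (dict, list, bool, ...)
    is either serialized to JSON text or treated as a missing readme,
    depending on the stringify flag.
    """
    if isinstance(value, str):
        return value
    if value is None or not stringify:
        return None
    return json.dumps(value, sort_keys=True)
```

For example, `normalize_readme({"private": True})` returns `None`, while passing `stringify=True` returns the JSON text `{"private": true}`.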

chadwhitacre commented 7 years ago

I'm going to fix the other Sentry bug that I introduced: https://sentry.io/gratipay/gratipay-com/issues/179441800/.

chadwhitacre commented 7 years ago

PR in #4183. Waiting for Travis.

gratipay::MAROON=> select count(*) from packages where readme_raw is null;
┌───────┐
│ count │
├───────┤
│     2 │
└───────┘
(1 row)

gratipay::MAROON=> select count(*) from packages where readme_needs_to_be_processed;
┌───────┐
│ count │
├───────┤
│ 97562 │
└───────┘
(1 row)

gratipay::MAROON=> select count(*) from packages;
┌────────┐
│ count  │
├────────┤
│ 345044 │
└────────┘
(1 row)

gratipay::MAROON=>

chadwhitacre commented 7 years ago

207,714 - 97,476 = 110,238 in about 19 hours = 5,802/hr. On track! 👍

chadwhitacre commented 7 years ago

gratipay::MAROON=> select count(*) from packages where readme_raw is not null;
┌────────┐
│ count  │
├────────┤
│ 345042 │
└────────┘
(1 row)

gratipay::MAROON=> select count(*) from packages where readme_needs_to_be_processed;
┌───────┐
│ count │
├───────┤
│     3 │
└───────┘
(1 row)

gratipay::MAROON=> select count(*) from packages;
┌────────┐
│ count  │
├────────┤
│ 345044 │
└────────┘
(1 row)

gratipay::MAROON=>

chadwhitacre commented 7 years ago

Here are the three remaining to be processed:

https://www.npmjs.com/package/testing233 https://www.npmjs.com/package/testing234 https://www.npmjs.com/package/kendo-ui-react-jquery-stockchart

The first two give "ERROR: No README data found!" The third results in a 504!

nobodxbodon commented 7 years ago

Is this part of a plan to build an index of all the popular libraries? Next is Maven rep for Java, if this works out?

chadwhitacre commented 7 years ago

Yeah, something like that. Ideally we can partner with Libraries.io and bring a bunch online at once.

chadwhitacre commented 7 years ago

¯\_(ツ)_/¯

gratipay::MAROON=> select count(*) from packages where readme_raw is not null;
┌────────┐
│ count  │
├────────┤
│ 345044 │
└────────┘
(1 row)

gratipay::MAROON=> select count(*) from packages where readme_needs_to_be_processed;
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

gratipay::MAROON=> select count(*) from packages;
┌────────┐
│ count  │
├────────┤
│ 345044 │
└────────┘
(1 row)

gratipay::MAROON=>

chadwhitacre commented 7 years ago

Alright! READMEs initially loaded! 💃

Let's get some pages on the site! #4151

chadwhitacre commented 7 years ago

Slight change of plans. #4151 is too much of a rabbit hole. We don't want to get ourselves into the business of processing and securing READMEs across 30+ package managers. Our existing /on/network/foo/ pages aren't very contentful, and there's no reason /on/npm/foo/ need be, either. My current plan is to make a PR to remove the README processing that is already deployed (we still want package fetching and syncing, since emails are in there and that's our key for linking with users), and then move on to Checkpoint 2: Giving to Packages.

chadwhitacre commented 7 years ago

@kaguillera and I are talking IRL about how much tech debt we want to take on underneath Relax Open Work Requirement in order to fast-track the npm feature here. Over there, we are renaming Teams to projects and members to collaborators, and we are also now talking about removing tip migration. It's basically a question of whether we change names in the UI only, or also remove code and drop/rename database tables and such. The trade-off is that if we only make surface changes over there, then the next @JessaWitzel that comes along will have even more confusion to deal with ("Wait—projects are stored in the teams table? WTF!")—but I am going to be so incredibly mad if we're not first to market with npm pledging. :rage4:

@JessaWitzel Can we please please please go into debt here? I PROMISE I will fix it and make it all better in January or February. Or March. 🙏

chadwhitacre commented 7 years ago

Discussing in slack.

aandis commented 7 years ago

January or February. Or March.

😄

chadwhitacre commented 7 years ago

Ok. I will bless this technical debt with my fairy wand but request a deadline for fixing it. 12 weeks after feature launch

@JessaWitzel at slack

chadwhitacre commented 7 years ago

We have reached Checkpoint 1: Inert /on/npm/foo/ Pages! 💃

https://gratipay.com/on/npm/react-router/

screen shot 2016-12-04 at 7 13 48 pm

nobodxbodon commented 7 years ago

Anything I can do to accelerate this? It seems to be the top priority for now.

Some questions:

chadwhitacre commented 7 years ago

Thanks for bumping this, @nobodxbodon! As mentioned in slack, I'm hoping to spec this out this week while also bringing Relax Open Work Requirement in for landing (this was blocked on that).

chadwhitacre commented 7 years ago

is NPM cool with this, and are they willing to partner/coordinate in any way?

I emailed them and didn't hear back. I think if we get some traction with this, that will be the time to reapproach a conversation with them.

any workload estimate and roadmap with timeline?

Last month I guesstimated (one, two) that the two projects together would take six weeks of calendar time. It's now been four and a half weeks and we're not done with either yet. We're likely to finish the first by the end of this week (Week 5), with next week being Week 1 on Integrate npm. It seems unlikely to take a week. ;-) I was off by a factor of 2.5 on the Relax Open Work time estimate (I figured two out of the six). That suggests 10 weeks of calendar time for Integrate npm.
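
Spelled out, the estimate works like this. The numbers are the ones given in the paragraph above; the variable names are just for illustration.

```python
total_estimate_weeks = 6      # original guess for both projects together
relax_estimate_weeks = 2      # share of that guess for Relax Open Work
relax_actual_weeks = 5        # expected to land at the end of Week 5

# Whatever was not budgeted for Relax Open Work was budgeted for npm.
npm_estimate_weeks = total_estimate_weeks - relax_estimate_weeks

# How far off the first estimate was, applied to the second.
overrun_factor = relax_actual_weeks / relax_estimate_weeks
npm_rescaled_weeks = npm_estimate_weeks * overrun_factor

print(f"overrun factor: {overrun_factor}")
print(f"rescaled npm estimate: {npm_rescaled_weeks} weeks")
```

This assumes the npm work overruns by the same factor as Relax Open Work did, which is a rough guess rather than a forecast.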

Roadmap and further estimation tbd when I can spec this out.

who are main developers and any dividing of tasks?

Me and maybe @aandis? Anybody else wanna volunteer? :-) Division of tasks tbd.

any blockers?

https://github.com/orgs/gratipay/projects/5?fullscreen=true

chadwhitacre commented 7 years ago

I've put both projects on the calendar, with an initial target date of March 17 for Integrate npm.