leereilly / swot

:school: Identify email addresses or domain names that belong to colleges or universities. Help automate the process of approving or rejecting academic discounts.
MIT License
1.13k stars 23.6k forks

Split data from library #247

Open afeld opened 10 years ago

afeld commented 10 years ago

There are enough true forks now (and just learned about another) that we should consider separating the data files from the Ruby gem, so that it can be used across all of them... presumably as a submodule.

/cc @zweizwei @mdwheele @theotow

theotow commented 10 years ago

Sure, why not. Just start it, I will update my repo accordingly.

mdwheele commented 10 years ago

Sounds good to me! :smile:

Bup3 commented 10 years ago

That would be great! :+1: Then every developer can use their own system. (Python, JavaScript (via Ajax maybe), ...)

afeld commented 10 years ago

Ok, these PRs are getting out of hand... there are currently 228 open :weary: I created the @swot-edu org to get this ball rolling. @leereilly, mind transferring this repository? After that, I'll rename it to https://github.com/swot-edu/data, and https://github.com/swot-edu/swot.rb can be used for the gem. We can add the aforementioned folks to a Team with permissions to merge, so that we can distribute the work of combing through them.

PR on the new gem repo to remove the data from the repository here: https://github.com/swot-edu/swot.rb/pull/1. :steam_locomotive:

leereilly commented 10 years ago

Nice work on pushing this forward, @afeld :metal:

I think we're all on board with splitting the data out and moving it elsewhere (and perhaps expanding upon it via something like #38), but I'm not sure if swot-edu is the right place. Alternative options: (a) continue with moving it under glean, or (b) ask someone nicely to look into recovering swot and move it there :wink:

I would prefer to keep the Ruby gem under leereilly.

I'll take a look at (wow!) all of these PRs over the next few weeks.

afeld commented 10 years ago

Is your hesitation with @swot-edu just the name? If the owner of @swot is willing to delete the account, we can always rename the org. Happy to rename it now if there's another name you prefer – that was just the first thing that came into my head.

My thinking with the dedicated org was to have the wrappers in different languages live alongside the dataset. Totally fine with converting it to Glean, but IMHO having the dataset grouped with the wrappers (under an org) makes more sense than grouping it by the format the data is provided in. Your gem, though, so your call.

FYI, all but one or two of the other PRs are for adding/correcting schools. Not really scalable, though that is a separate problem.

afeld commented 9 years ago

Bump!

afeld commented 9 years ago

Also, ok to discuss a conversion to glean separately? I'd want to discuss it a bit more first, but don't want to conflate these issues:

tmcw commented 9 years ago

@afeld @leereilly up to 423 pull requests in the queue: can either of you add more maintainers so that we can start accepting them, or add me as an admin to the swot-edu project so that I can jfdi this? I would really like to support the community-run swot definitions instead of forking.

afeld commented 9 years ago

I would love to add you as a collaborator, but don't have permissions since I'm not the owner, unfortunately. Reached out to Lee via email as well to figure out the best way to get us unstuck, and will post any updates here.

leereilly commented 9 years ago

Hi everyone,

Quick update...

tmcw commented 9 years ago

Our current target is to merge all mergeable requests - 45 down today, 380 to go - and then rethink data organization.

tmcw commented 9 years ago

@leereilly can you also add @zweizwei as a contributor? It looks like JetBrains made the same decision Mapbox did, and ended up forking and merging up as well. Since this is ideally a canonical database, there should be one current version and one place to submit pull requests - many of which are submitted to the JetBrains fork now.

afeld commented 9 years ago

Maybe we can get all those open PRs on the JetBrains fork merged, then merge it back into this one?

tmcw commented 9 years ago

Uncoincidentally I wrote https://github.com/tmcw/clone-pull-requests yesterday which would let us auto-resubmit them onto this repo if we wanted to

afeld commented 9 years ago

Haha of course you did :smile: @jetbrains Your call! More collaborator horsepower on this repo, but if you'd like to add us to that repo, we can merge there instead. Wondering how many are duplicates...

leereilly commented 9 years ago

can you also add @zweizwei as a contributor?

Done :heavy_check_mark:

I like @afeld's idea of splitting the data from the various implementations. If you're all on board, let me know and I can add JetBrains/Mapbox folks to the @swot organization and maybe we can use that as a canonical DB?

tmcw commented 9 years ago

I definitely agree with the idea of splitting the data, but think it should happen after all PRs are merged or triaged, and there's lots of work to do until we get to that point.

afeld commented 9 years ago

I don't think the one thing needs to be dependent on the other. Can we just transfer this repo to the org and make it the data repo? Already started a repo for the gem in https://github.com/swot/swot.rb. See https://github.com/swot/swot.rb/pull/1 for the actual split.

afeld commented 9 years ago

@leereilly Bump on doing the repo transfer to @swot, if you're ok with it!

mdwheele commented 9 years ago

Any updates on splitting data out? I see there is a data-only branch here with some stuff in it as well as the stuff over on the swot org. I've got some free time planned to update my fork and wanted to poke and see if there was progress. If not, I'll just continue syncing from here (master) as it looks most up-to-date.

Looks like number of PRs is lower than it was, so it may be a good time to align the stars and pull this off! :pray:

kiler129 commented 9 years ago

Using a submodule is a rather bad idea for something that is not code but a constantly updating database. I think it would be better to use the package managers available for most programming languages today. For example, in the PHP world the most popular one is Composer (with the Packagist repository).

I separated the database from the code and enhanced the database format while retaining compatibility with the current format, creating SwotNG - https://github.com/kiler129/SwotNG-database. The idea is to have a separate project for the database, a separate one for tools, and separate projects for independent implementations (like the Ruby one).

tmcw commented 9 years ago

To reiterate my comment above, I strongly believe that swot should be transferred to an organization and maintainership should be opened up before any technical changes happen. The pressing issue right now is the overflow of PRs.

mdwheele commented 9 years ago

Yeahhhhh, I'm going to hold off updating the PHP port structurally until this issue is resolved. I'll do a manual sync from here instead. Probably a bad idea to send PRs to yet another source of this data when https://github.com/swot/swot-data/tree/data-only exists already (albeit it hasn't taken off). That said, if there are no collaborators who have time to fish through the 94 (current) PRs, validate them and merge, then it's kind of stuck. I don't think it'd be too tall an order to do 5-10 a day and start chunking down the problem. That is, I (personally) would be willing to help out, if only by validating and pinging each PR with a comment + proof, IF there will be someone on board to :+1: and merge away!