prof-milki opened this issue 3 years ago
@thorwhalen seems active on GitHub; we would really appreciate it if he would take the time to remove these packages himself if this was an error.
Is anyone ever closing tickets here? To me this looks like a good reason to block this user: spam and/or squatting, but clearly not normal usage.
@pypa/warehouse-admins
@thorwhalen, what's going on here? It seems like you're republishing existing libraries under these short project names? https://github.com/thorwhalen/uu
First of all, I apologize: I noticed this thread only recently, when I noticed that some things weren’t working anymore because some of my names were taken away and looked into what happened. I now see I received email notifications of the thread, but they were buried in the plethora of github emails I get. I hope I didn’t inconvenience anyone two (pun not initially intended) much.
I didn’t answer right away because I immediately looked up similar issues, and seeing how subjective, and sometimes trolling, the whole thing could be, I was hesitant to step into that time-consuming and possibly emotional pit. This is why there are things like PEP 541 — to try to give a basis to the madness, though those who want to fight still do so over interpretation.
That said, let me shyly give this a chance.
Hopefully a bit of a declaration of intent can help shed some light on the behavior and be sufficient to appease and resolve this.
The effort stems from two major needs/wants. In a nutshell:
Over the years, when I spot something in a project that is not private or particular to the project, I try to extract the reusable functionality and put it up on public GitHub. It makes it easier to reuse in other projects, collaborate with their teams, etc. The way I was doing this wasn’t sustainable: one humongous repository with hundreds of sometimes totally unrelated functionalities, no packaging, no dependency management, etc.
I’ve been wanting to tackle this ugly problem for a while, and a year and a half ago I had a bit of time to look into it. I wanted to break things up in such a way that future projects didn’t have to clone and install a mammoth, but rather just pip install only what was needed. I’m also hoping that having things more broken up and themed will give me (and whoever I’m working with at the time) a specific home for utils that will be developed in the future, thus allowing even the little modules I have to snowball into something more significant.
Know that quite a bit of manual work went into it, in case automation is the issue here. If it’s the “shouldn’t be too fast” part of automation that is the critique, know that I spent extra time, in fact, to automate some of the manual work I had already done. This is because I felt that this “project outgrows itself” problem is common enough (to me) that I’d like to achieve my concrete goal leaving some tooling crumbs on the way.
Again, this is not me targeting a bunch of names and slapping code into them. This is me taking a big corpus of code that already existed, and was already broken up into packages, subpackages, and modules, and making these into fine-grained pip-installable packages.
Let me, now, address some of the constructive critiques I see above.
Yes. But it’s part of the point. Bulk process the big corpus and then bulk publish. It’s really how I’d like to do it, so that I can make improvements globally and consistently.
For example, in my todos are things like: automate `install_requires`, documentation, and creation of keywords (to better search this mess!); get more naming and typing consistency; etc.
The way I’d like it to go is: Create the tools to process the code, apply globally, and re-publish changes.
Is there some “nice” parameter I should include in my publishing process (like wait 5s between each package)?
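As far as I know, twine itself has no built-in delay or rate-limit option, so a throttle would have to be scripted around it. A minimal sketch (the dist paths, delay, and helper name are my own assumptions, not actual tooling from this thread):

```python
import subprocess
import time

def publish_throttled(dist_paths, delay_s=5.0, run=subprocess.run):
    """Upload each distribution via twine, pausing between uploads.

    `run` is injectable so the command construction can be exercised
    without actually calling twine.
    """
    commands = []
    for i, path in enumerate(dist_paths):
        cmd = ["twine", "upload", path]
        commands.append(cmd)
        run(cmd, check=True)
        if i < len(dist_paths) - 1:  # no need to sleep after the last one
            time.sleep(delay_s)
    return commands
```

Whether such a pause actually matters to PyPI's infrastructure is a question for the maintainers; this only shows how it could be bolted on.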
Hm. There's a lot of scope for subjective discussion here. I probably should avoid getting into that.
I'm guessing the "None" was an exaggeration, so I won’t take it as an insult to my intelligence or choices.
I will concede though that some of them are way too minimal and should probably be merged “out” into some other related package, or I should find some other relevant code to merge “in” to make the package more significant.
Yep. I agree with that. The `install_requires` one is my top todo!
Right now, what I did was just take the module doc and copy it into the README.md. But please note that I had to go through more than 100 modules and write a short doc manually.
Also note that quite a few functions (the main ones that I use most) have docs and doctests in them. So it’s documented in that sense. What I intend to do is extract those to make a more useful README and eventually (if the size and frequency of use of the package warrants it) a proper documentation site hosted on github.io.
Yeah… that’s one I hesitated on for some time. PyPI suggests that package names should be “short and memorable”. I’d prefer names that are both short and descriptive, and genuinely tried to get better names, but I was going down a rabbit hole there (an interesting one involving semantic analysis etc., but I won’t go into detail here). I favor short over descriptive and just couldn’t get (available) descriptive names that were short enough.
The saving grace for me was to realize three things:
I saw that there were a lot of two-letter packages available, so that sealed the deal.
Please also note that I did spend some time mapping the names to the packages. I didn’t have to, but I still did. Now it’s FAR from perfect (I regret some choices now, but it’s a bit hard to change without being disruptive), but many DO have a link only known to me. Here are a few examples, from the pretty good to the pretty far-fetched:

- `rv` - Utils to work with randomness - R.V. stands for Random Variable, a fundamental concept in statistics
- `cr` - Binary classification count results - C.R. = Count Results
- `un` - Analyzing classes - “un” is the indefinite article in French (so as in… AN instance of…)
- `kw` - Utils to work with storage - K for key and W for write, because in storage we often write something somewhere, referenced by a key

I just came up with another TODO: I should probably add these “name rationale” explanations in the READMEs.
So, hopefully I’ve demonstrated that this is neither a shameful squatting of names nor a rogue experiment, and we won’t have to go into the weeds of opinions.
It is FAR from perfect, and I’m still trying to figure out how to achieve what I want to achieve, but I tried to leave it in a fairly clean state when I first published these, and have recently made some more improvements.
I should add that the original comment was observational/speculative, because the package purposes weren't deducible from the absent documentation. And while the compartmentalization and convenience aspect explains the names, in this quantity it looks gluttonous.
Terse package names are fine (though `if` seems unimportable, being a Python keyword). The recommendation for terseness comes with caveats, though: names should still be proportional to generality, and some of these packages are rather domain-specific. Hence one should recognize that the PyPI namespace is a bit more uniform than the heavily segmented GitHub allocation.
May I suggest you consider reshuffling the majority into subpackages? Significantly more work to set up, but the pip extras mechanism might be the way to go here:
`pip install tcw[is,da,pk,gd]`
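As a sketch of that extras idea: one umbrella distribution whose optional extras stand in for the individual two-letter packages. The name `tcw` and the (empty) dependency lists below are placeholders, not real metadata:

```python
# Each extra names the dependencies that the former standalone package
# would have pulled in; empty lists are placeholders here.
EXTRAS = {
    "is": [],
    "da": [],
    "pk": [],
    "gd": [],
}

# With setuptools this would be declared roughly as:
#   from setuptools import setup
#   setup(name="tcw", extras_require=EXTRAS, ...)
# after which `pip install tcw[is,da]` installs just those extras.
```

The main trade-off is that extras select optional *dependencies*, not optional *code*: the umbrella package's own modules all ship together, which is part of why the author objects below that this doesn't suit minimal-footprint installs.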
I would like to join the complaint.
I don't want to accuse anyone of deliberately obstructing other developers. But when someone occupies ~12% of all two-Latin-letter combinations, it is hard not to see provocation in such behavior.
Especially since the packages are not well documented. And not maintained? (last update: 21./20. October 2022)
@thorwhalen Shouldn't "uu" + subpackages be enough? (similar to what @prof-milki mentioned) (https://github.com/thorwhalen/uu)
Therefore, please revoke all claimed names. @thorwhalen has had enough time to solve this issue. This ticket was opened on Jan 8, 2021, 3 years ago!
Dear moderators,
Can we either close this issue, or figure out how to resolve it? I'll respond more at length as soon as I get a chance (this week hopefully). Namely, why the solution proposed above isn't really one for me (or at least, it would be a pain). I'd also like to get a sense of clear constraints (e.g. more README.md? individual GitHub projects? more regular updates?) under which to find a solution (and hopefully one that's not too time-consuming or disruptive). I sought guidance in PyPI's terms and conditions and found I wasn't in violation at all. I also poked around the existing PyPI projects and found that the projects contested here are far from below average in "quality".
As for the comment by h-a-d-o, I implore you to cast it aside with the gentle indifference it deserves. The sole contribution of h-a-d-o to the GitHub cosmos appears to be the comment in question, as evidenced by their GitHub ghost town: https://github.com/h-a-d-o.
It seems to me to be a comment not born out of engagement or care, but rather the digital equivalent of shouting into the void. And so, with a slightly raised eyebrow and a hint of a smirk, we move on.
Lots of blablabla.
Ghost town like your mostly undocumented public repos? 💤
Just because there aren't any public repos doesn't mean much. ("Insider tip": in 2024 you are allowed to have several accounts on several git-hosting platforms 🤣)
@thorwhalen Why don't you just set up a private PyPI package repo for your packages/modules? GitLab offers this in a very convenient way.
@prof-milki: The solution involving "optional features" is not well-suited to our needs for several reasons beyond the initial investment required to implement it effectively. The drawbacks of this method are widely recognized and I choose not to delve into them to avoid provoking further negative responses from anonymous online commentators. Our specific situation, often dealing with edge computing and microservices, necessitates a focus on minimizing resource use, dependencies, and installation times.
I have detailed my methodology in a previous discussion, available here. As mentioned there, I try to separate project-specific private aspects from public, publishable reusable code, and have always encouraged my team and others around me to do the same.
To enable this, it's important to have package names you can build on. I tried to only publish to PyPI once a package was mature enough, but this repeatedly led me to run into taken names (some that I had even checked on before choosing the project name).
When this happens, it can cost quite a lot to transition to a free name. The code imports aren't the biggest problem: the difficult part is that all communication around the code (docs, emails, Slack messages) uses the old name. It's enough to discourage the publication effort.
It's at that point that I looked at what the rules of PyPI were, and what the general "quality" of the packages out there was. I found that in fact the bar was quite low, both in rules and in practice. Many packages are truly old squats. I have now gathered and computed actual stats on this. It became clear that I couldn't wait for a package to be completely ready to publish it, but also that this constraint was neither required nor practiced in general.
When it came to subdividing the pile of utils I had accumulated and finding a namespace for them, two-letter combos seemed like a good choice -- in part to reduce the likelihood that I'd be depriving others of an excellent name for a high-quality package.
Although the only significant feedback I have gotten since I explained my rationale has been the unwelcome attention of internet trolls, with no other major complaints, I request the closure of this issue to prevent further hostility. This unnecessary aggression has already resulted in considerable wasted time and energy.
tl;dr:
(Stats computed on a random sample of 10K pypi packages.)
The time I wasted recently on this was not spent updating the "uu" packages in question, but trying to determine a bit more scientifically what my target "quality" should be, by computing and comparing the hard numbers. Of course, the trolls out there will still complain and point out exactly those stats that I've missed, but no one can say I haven't thoroughly looked into it and responded (@di -- perhaps you can remove the "awaiting response" tag at this point?).
My first attempt was to write a script that would actually install the packages and carry out some operations on them.
I wanted more precise information than just the JSON metadata that can be found at https://pypi.org/pypi/{pkg}/json, but I realized that this was not only resource-intensive, but a bit dangerous.
Still, I'll convey the one stat I've computed there that cannot (as far as I can see) be derived from the json metadata:
These stats compare the 78 "uu" packages with a random sample of 100 packages from (the 500K+) packages of pypi:
uu installable: 100.0% (78 out of 78)
random pkgs installable: 34.0% (34 out of 100)
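For context, the JSON-metadata-based stats that follow could be gathered with something like this minimal sketch (the function names are mine, not the author's script; a failed fetch returns `None`, which is how "taken but no JSON" names would show up):

```python
import json
from urllib.request import urlopen

def pypi_json_url(pkg: str) -> str:
    """URL of PyPI's per-project JSON metadata endpoint."""
    return f"https://pypi.org/pypi/{pkg}/json"

def fetch_metadata(pkg: str):
    """Return the parsed JSON metadata for `pkg`, or None if the name
    is registered but serves no JSON (or the request fails)."""
    try:
        with urlopen(pypi_json_url(pkg)) as resp:
            return json.load(resp)
    except Exception:
        return None
```

The returned dict carries an `info` block (summary, license, description, etc.) and a `urls` list describing the files of the latest release, which is what the size and age proxies below rely on.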
And now the stats I computed from a random sample of ~10K package names taken from https://pypi.org/simple/. Note that 31 of these didn't have any JSON (so they are names that are "taken", but not usable). Still, I've used 9969 as the count when computing percentages.
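Drawing that sample could look like this (a sketch; the simple index is one anchor tag per project, so a regex suffices here, though a real script might prefer `html.parser`):

```python
import random
import re

def sample_names(simple_index_html: str, k: int, seed: int = 0):
    """Sample k package names from the HTML of https://pypi.org/simple/.

    Each project appears as an <a> element whose text is the name;
    a fixed seed keeps the sample reproducible.
    """
    names = re.findall(r"<a[^>]*>([^<]+)</a>", simple_index_html)
    return random.Random(seed).sample(names, k)
```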
What percentage of some key metadata fields are present?
Attribute | uu_pkgs | random_10K_pkgs |
---|---|---|
version | 100.00 | 99.99 |
summary | 100.00 | 94.05 |
home_page | 98.72 | 76.17 |
project_url | 100.00 | 99.99 |
license | 100.00 | 64.75 |
description | 100.00 | 80.43 |
size | 100.00 | 98.18 |
upload_time_iso_8601 | 100.00 | 98.18 |
As a proxy for this, I looked at the package size of the current version's release (taking the `sdist`, or the `bdist_wheel` if the `sdist` is not present).
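That sdist-preferring size lookup against the JSON metadata could be sketched as (the helper name is mine):

```python
def release_size(pkg_json: dict):
    """Size in bytes of the current release: the sdist if one was
    uploaded, else the first bdist_wheel, else None.

    `pkg_json` is the dict served by https://pypi.org/pypi/<pkg>/json,
    whose "urls" entry lists the files of the latest release with
    their "packagetype" and "size" fields.
    """
    files = pkg_json.get("urls", [])
    for kind in ("sdist", "bdist_wheel"):
        matches = [f for f in files if f.get("packagetype") == kind]
        if matches:
            return matches[0]["size"]
    return None
```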
Description | Size (bytes) |
---|---|
uu median package size | 10656 |
random 10K median package size | 9631 |
"worst" of the uu
packages to the rest of the world:
32.67% of the random 10K packages have a size smaller than the smallest uu package (which has 4848 bytes)
Description | Median Last Update Age (days) |
---|---|
uu median last update age | 468 |
random 10K median last update age | 834 |
"worst" of the uu
packages to the rest of the world:
62.49% of the random 10K packages are older than the oldest uu package (which is 468 days)
JSON metadata proxies: README.md (a.k.a. "description") and the short summary.
Description | uu | random 10K |
---|---|---|
Percentage of empty summaries | 0.00% | 5.95% |
Percentage of empty descriptions | 0.00% | 19.57% |
Median description size (characters) | 68 | 822 |
Description | uu | random 10K |
---|---|---|
Percentage of packages with license | 100.00% | 98.77% |
@prof-milki: I hope it is clear from the above that the `uu` packages' metrics are in fact consistently above average. The only one where they fall short is the README.md size, which I'll work on (I've already started, but didn't want that to affect the stats calculation).
It seems the only valid reason this issue even exists is that there are many of them, with strange two-letter names that make the whole thing stick out. I've explained why that is, so I hope this is sufficient information for you to close this issue.
Project to be claimed

`PROJECT_NAMES`: fk, fg, dn, cw, bj, au, ge, ju, ef, ix, yw, if, hz, vd, nm, bh, wv, aw, ji, uy, hg, nw, og, ij, rh, ek, ke, an, yb, ir, ub, ov, nj, ho, oa, ej, hf, yx, oj, yz, ul, xv, mv, xl, ow, xa, ou, eu, zr, iy, el, hm, jo, yv, ys, yp, yl, yi, yf, ya, uw, uo, un, uj, ug, uf, tu, tn, su, rv, ps, nh, na, lv, lh, lb, ky, kw, kr, kc, jy, ba, ha, by

`USER`: https://pypi.org/user/tcw/

Reasons for the request

This looks like some fairly automated registrations. None of the packages are overly interesting at first glance (nor well documented, for that matter). And the two-letter project names have little resemblance to the source repository or files they package.
Now this might be the result of some packaging script gone awry. But it does look a bit like very premature project name reservations. And there are obviously some names that might fit more useful projects (`ps` or `hg` at least).

Maintenance or replacement?
No transfer required.
Contact and additional research
No response so far.
https://github.com/thorwhalen/ut/issues