twisted / pydoctor

This is pydoctor, an API documentation generator that works by static analysis.
https://pydoctor.readthedocs.io
Other
188 stars 50 forks source link

Improve the `--privacy` matches #618

Open tristanlatr opened 2 years ago

tristanlatr commented 2 years ago

While trying to make the twisted API docs work with pydoctor 22.7, I realized that the current pattern matching powering the --privacy option is hard to use for Twisted.

Indeed, it currently needs the 23 extra test packages to be marked hidden explicitly because we can't just hide **.test, this would maybe hide some crucial API elements. So I went with the following way:

--privacy="HIDDEN:twisted.words.test" \
    --privacy="HIDDEN:twisted.web.test" \
    --privacy="HIDDEN:twisted.speard.test" \
    --privacy="HIDDEN:twisted.scripts.test" \
    --privacy="HIDDEN:twisted.runner.test" \
    --privacy="HIDDEN:twisted.python.test" \
 etc...

I've taken the list of test packages from this search query: "+qname:*.test +kind:Package". From these Twisted docs: https://pydocbrowser.github.io/twisted/latest/

It would be the best if one could do --privacy="HIDDEN:search:+qname:*.test +kind:Package" which would best mimics what happens here in the Twisted customization code: https://github.com/twisted/twisted/blob/b7b044fe4758c7baede80394df6173c3071e1e4a/src/twisted/python/_pydoctor.py#L200-L201

adiroiban commented 2 years ago

What do you say about the support regex?

Regex syntax is ugly but should be powerful enough to do the job done, and regex is pretty common and well understood.

tristanlatr commented 2 years ago

Could you elaborate why regex will solve our issue here ? I don’t think it will…

It should mimic to following code:

if isinstance(current, model.Package) and current.name == "test":
                return model.PrivacyClass.HIDDEN
adiroiban commented 2 years ago

because we can't just hide **.test, this would maybe hide some crucial API elements. So I went with the following way:

So if **.test doens't work. My hope was that with regex we can improve the accurancy of patter :)

So now, we have <PATTERN> is fnmatch-like pattern matching objects fullName.

With regex, I was happing that we can have something like this

`--privacy="HIDDEN:twisted.[a-b_].test..+"

tristanlatr commented 2 years ago

I see, but still we'll have to list them all, so for instance:

--privacy="HIDDEN:re:twisted\.(words|web|runner|python|etc)\.test"

Whereas the lunr search language that understand objects kind.

Both ideas are worth exploring I guess.

adiroiban commented 2 years ago

Regarding the explicit names

--privacy="HIDDEN:re:twisted.(words|web|runner|python|etc).test"

Which fullName contains .test. in it and it should not be hidden?

Yes. We might need to have 2 rules, but I guess it should be ok.

--privacy="HIDDEN:re:^twisted\.[a-b]+\.test\..+"
--privacy="HIDDEN:re:^twisted\.test.\.+"

For twisted, top packages are

application
conch
cred
enterprise
internet
logger
mail
names
newsfragments
pair
persisted
plugins
positioning
protocols
python
runner
scripts
spread
tap
test
_threads
trial
web
words

so it looks like.

I am not agains lunr syntax.

Just a comment that maybe regex can do the job and is already in stdlib and the syntax should be well understoond :)

tristanlatr commented 2 years ago

Which fullName contains .test. in it and it should not be hidden?

If you go there: https://pydocbrowser.github.io/twisted/latest/ and search for "qname:*.test" (which is the same as **.test in qnmatch syntax), you'll see a lot of API elements, not just the test packages :

by default 2022-07-28 at 11 16 16 AM

You might argue that these objects are all under a test package so there are all hidden already. But I think it's a happy coincidence. It would be unsafe to rely on the fact that Twisted is not adding a test() method that is part of any public API ever.

So we'll end up with listing all names of the test packages. And if there is a new module in twisted with a test package, it won't be hidden automatically.

Also your two rules up there don't do the job, because it's the test packages that we must hide, not elements under it (except for twisted.test):

So we'll end up with something like this:

--privacy="HIDDEN:re:^twisted\.(words|web|runner|python|etc|23times)\.test$"
--privacy="HIDDEN:twisted.test.*"
--privacy="PUBLIC:twisted.test.proto_helpers"

VS with lunr search:

--privacy="HIDDEN:search:+qname:twisted.*.test +kind:Package"
--privacy="HIDDEN:twisted.test.*"
--privacy="PUBLIC:twisted.test.proto_helpers"

Honestly, I don't have very strong opinions. lunr search is more sexy, but it's a bit more harder to implement and less know that regex.

Tell me what you think

tristanlatr commented 2 years ago

From an outside perspective, @glyph, do you think it's even worth it to implement such changes to avoid 23 individual lines in Twisted's setup.cfg?

glyph commented 2 years ago

From an outside perspective, @glyph, do you think it's even worth it to implement such changes to avoid 23 individual lines in Twisted's setup.cfg?

I'd rather add this functionality because otherwise we have to have a linter to ensure that those lines are up to date as we add or remove "test" packages, i.e., implement this over again. However, adding whole new packages is infrequent enough that I wouldn't consider this super urgent if it needs more thought.

glyph commented 2 years ago

I wouldn't mind having --privacy just like… name a predicate function that gets called with a module name or something though, maybe that would let us avoid the syntax debate?

tristanlatr commented 2 years ago

name a predicate function that gets called with a module name or something though

The object name is not enough to be on the safe side, we'd have to use Documentable instances directly (to check if it's a Package), but if we're doing that, it's just like having a custom system that overrides the System.privacyClass(). We can still bring back the old code that handled that in Twisted repo.

But I don't think we should add new code customization entry points like this, we already have --system-class that can almost tweak every aspect of the building process (but not rendering).

I'd rather add this functionality because otherwise we have to have a linter to ensure that those lines are up to date as we add or remove "test" packages, i.e., implement this over again.

Sounds like you want something that understands the concept of what's a package and what's not. Regex doesn't have that since it operates on the object name only. So the lunr search language would solve that issue, since it operates on the name and the kind. We'd still need a checker with regex to ensure that we include all test packages.

tristanlatr commented 2 years ago

Sounds like you want something that understands the concept of what's a package and what's not.

We can also rely on the fact that modules should be snake_cased where classes should be CamelCased. So I guess it could work with regex.

glyph commented 2 years ago

Sounds like you want something that understands the concept of what's a package and what's not.

We can also rely on the fact that modules should be snake_cased where classes should be CamelCased. So I guess it could work with regex.

Let's be careful with that. There are exceptions to this, and it might be really confusing when it breaks.