pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0

regex_release_file_metadata not filtering packages #615

Open lps-rocks opened 4 years ago

lps-rocks commented 4 years ago

My current bandersnatch filtering configuration is included at the end of this comment.

'regex_project_metadata' appears to be working properly: as expected, none of the listed projects are being downloaded.

However, 'regex_release_file_metadata' is not functioning properly. Even after a fresh delete and sync, I'm still finding release files whose names match the listed regex patterns.

I tried going through the source code and the documentation, but it's not clear how these options work. There is also no debug logging anywhere in the filtering logic, so there's no easy way to debug this.


[plugins]
enabled =
  regex_release_file_metadata
  regex_project_metadata

[regex_project_metadata]
none:match-null:info.name =
  ^tf
  ^mxnet
  ^tensorflow
  ^cupy
  \-nightly$
  ^lalsuite
  ^cntk
  ^catboost
  ^openvisus
  ^paddlepaddle
  ^torch
  ^grpcio
  ^codeintel
  ^CodeIntel
  ^opencv
  ^fiona
  ^sickrage

[regex_release_file_metadata]
none:match-null:release_file.filename = 
  .*macosx_.*
  .*macosx-.*
  .*\.freebsd.*
  .*-freebsd.*

lps-rocks commented 4 years ago

I see why this may or may not be working, and why the behavior isn't very clear. The code uses re.match, which only matches at the beginning of a string. I can't remember whether I had the .* at the beginning of my patterns. Realistically, this should probably be changed to re.search so that it behaves more like some of the other plugins.
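To illustrate the difference described above, here is a minimal sketch using only Python's standard re module. The filename is a made-up example, not taken from an actual sync, and the snippet does not reproduce bandersnatch's filter code; it only shows how re.match and re.search treat the same pattern.

```python
import re

# Hypothetical wheel filename, used purely for illustration.
filename = "example_pkg-1.0-cp39-cp39-macosx_10_9_x86_64.whl"

# re.match only tries the pattern at position 0 of the string, so an
# unprefixed pattern cannot hit "macosx_" in the middle of the name.
anchored = re.match(r"macosx_", filename)

# re.search scans the whole string and returns the first match found.
unanchored = re.search(r"macosx_", filename)

# With a leading ".*", re.match can consume the prefix before "macosx_".
prefixed = re.match(r".*macosx_.*", filename)

print("re.match without prefix:", anchored)
print("re.search without prefix:", unanchored)
print("re.match with .* prefix:", prefixed)
```

So a pattern written without a leading .* behaves as if it started with ^ under re.match, while the same pattern under re.search matches anywhere in the filename.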

lps-rocks commented 4 years ago

Created #616 to use re.search instead of re.match, so that patterns no longer have to be written around the start-of-string anchoring that re.match applies.

lps-rocks commented 4 years ago

Considering the discussion on PR #616, this bug should instead be about updating the documentation to let users know of the implicit beginning-of-string anchor (^) on all regex filters.

cooperlees commented 4 years ago

As stated though, if the PR adds a bandersnatch configuration option to toggle to re.search, with re.match remaining the default, it will be accepted ... That too would need a documentation update :)
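A toggle of this kind could look roughly like the sketch below. The names matches_any and use_search are hypothetical and chosen only for illustration; they are not bandersnatch's actual API or configuration keys, and the filename is again made up. The only assumption carried over from the thread is that re.match stays the default.

```python
import re
from typing import Iterable


def matches_any(patterns: Iterable[str], text: str, use_search: bool = False) -> bool:
    """Return True if any pattern matches text.

    Hypothetical helper: re.match (anchored at the start) is the default,
    and re.search (match anywhere) is used only when use_search is True.
    """
    matcher = re.search if use_search else re.match
    return any(matcher(pattern, text) is not None for pattern in patterns)


# Hypothetical filename and unprefixed patterns for illustration.
filename = "example_pkg-1.0-cp39-cp39-macosx_10_9_x86_64.whl"
patterns = [r"macosx_", r"\.freebsd"]

print(matches_any(patterns, filename))                   # default: re.match
print(matches_any(patterns, filename, use_search=True))  # opt-in: re.search
```

Keeping re.match as the default means existing configurations with .* prefixes keep behaving the same way, while users who opt in can write shorter patterns.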