Closed justindujardin closed 3 years ago
Merging #48 (7dd0be4) into develop (d6ad724) will increase coverage by 0.27%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## develop #48 +/- ##
===========================================
+ Coverage 93.52% 93.80% +0.27%
===========================================
Files 12 12
Lines 1777 1856 +79
===========================================
+ Hits 1662 1741 +79
Misses 115 115
Impacted Files | Coverage Δ | |
---|---|---|
pathy/__init__.py | 100.00% <ø> (ø) | |
pathy/base.py | 91.61% <100.00%> (+0.36%) | :arrow_up: |
pathy/cli.py | 91.30% <100.00%> (+1.43%) | :arrow_up: |
pathy/file.py | 88.38% <100.00%> (ø) | |
pathy/tests/test_base.py | 98.39% <100.00%> (+0.03%) | :arrow_up: |
pathy/tests/test_cli.py | 100.00% <100.00%> (ø) | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
:tada: This PR is included in version 0.3.6 :tada:
The release is available on GitHub release
Your semantic-release bot :package::rocket:
Problem
When listing a large number of blobs together with their changed time and size attributes, performance is poor. With remote systems like GCS, the `stat` operation is slow when it is executed many times. Consider code that lists a directory and then calls `stat` on each entry: it is fast at enumerating large numbers of files and stats on local systems but slow for remote GCS buckets.

The trouble is that we have to make requests to get the file listings, and then an extra request for each blob to get its stat information. If you have hundreds or thousands of blobs, this gets quite slow.
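
The per-entry pattern described above can be sketched with plain `pathlib`. This is a minimal illustration, not the snippet from the original PR; the function name `list_with_stats` is hypothetical. It shows where the one-call-per-entry cost comes from: locally each `stat` is a cheap system call, but against a remote bucket each one would be a separate network request.

```python
from pathlib import Path
from typing import Iterator, Tuple


def list_with_stats(root: Path) -> Iterator[Tuple[str, int, float]]:
    """Yield (name, size, mtime) for each file in root (hypothetical helper)."""
    for child in root.iterdir():  # one call to list the directory
        if child.is_file():
            # One extra stat call per entry. On remote storage this is
            # where the per-blob request overhead would accumulate.
            info = child.stat()
            yield child.name, info.st_size, info.st_mtime


if __name__ == "__main__":
    for name, size, mtime in list_with_stats(Path(".")):
        print(name, size, mtime)
```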
Solution
Add a helper method `ls` that does not exist in the standard `pathlib.Path` interface. It yields blobs and their size/time stats in a single pass, and ends up being much quicker than the per-blob `stat` approach described above when dealing with remote storage. Code that needs both the listing and the stats is now more appropriately implemented with `ls`.
Changes
Add an `ls` method to `Pathy` objects. To provide a consistent API, `Pathy.fluid` now returns a `pathy.BasePath` object when dealing with local files. This class is a light wrapper on top of `pathlib.Path` that adds the pathy-specific methods like `ls`.

Add a long-form `-l` flag to the CLI's `ls` command that prints blob size and updated time stats next to their names.
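
The single-pass idea behind `ls` and the long-form listing can be sketched locally as follows. This is an assumption-laden illustration, not pathy's implementation: `EntryStat`, this standalone `ls` function, and the `format_long` output layout are all hypothetical names chosen for the example. It uses `os.scandir`, which returns names and stat information from one directory scan instead of a separate lookup per path.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator


@dataclass
class EntryStat:
    """Hypothetical record pairing a name with its size and modified time."""
    name: str
    size: int
    last_modified: float


def ls(root: str) -> Iterator[EntryStat]:
    """Yield an EntryStat for each file in root using a single directory scan."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()  # result is cached on the DirEntry object
                yield EntryStat(entry.name, info.st_size, info.st_mtime)


def format_long(stat: EntryStat) -> str:
    """Render one long-form line (layout is an assumption, not the CLI's)."""
    updated = datetime.fromtimestamp(stat.last_modified, tz=timezone.utc)
    return f"{stat.size:>10} {updated.isoformat(timespec='seconds')} {stat.name}"


if __name__ == "__main__":
    for item in ls("."):
        print(format_long(item))
```

Returning one small record per entry keeps the caller's loop identical for local and remote storage; only the source of the stats differs.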