Closed yusuf1759 closed 2 months ago
Thanks for authoring this. I think there are still a few remaining issues:
- Could you remove the parallel `api.md`?
- There are some occurrences where 'PLINDER' is not written in upper case, although it is not referencing the Python package.
- Some Markdown formatting is missing: for example, some paths are currently not rendered in monospace and the note does not use the `:::{note}` directive.
- One cell is failing.
- The `Overview` section is empty. I think it would be good if the user were introduced to the 'idea' behind the public API, for example how it is split into subpackages.
@padix-key which of the cells is failing? They all seem to be passing on my end.
The error appears at this code cell:
from plinder.core import get_plindex
annotation_df = get_plindex()
annotation_df.head()
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[3], line 2
1 from plinder.core import get_plindex
----> 2 annotation_df = get_plindex()
3 annotation_df.head()
File ~/Documents/coding/plinder/src/plinder/core/utils/dec.py:23, in timeit.<locals>.wrapped(*args, **kwargs)
21 result = None
22 try:
---> 23 result = func(*args, **kwargs)
24 log.info(f"runtime succeeded: {time() - ts:.2f}s")
25 except Exception:
File ~/Documents/coding/plinder/src/plinder/core/index/utils.py:48, in get_plindex(cfg)
46 cfg = cfg or get_config()
47 suffix = f"{cfg.data.index}/{cfg.data.index_file}"
---> 48 index = cpl.get_plinder_path(rel=suffix)
49 LOG.info(f"reading {index}")
50 _PLINDEX = pd.read_parquet(index)
File ~/Documents/coding/plinder/src/plinder/core/utils/cpl.py:130, in get_plinder_path(rel, download)
128 cfg = get_config()
129 root = _get_fsroot(cfg)
--> 130 client = GSClient(local_cache_dir=root)
131 remote = cfg.data.plinder_remote
132 if rel:
File ~/conda/envs/plinder/lib/python3.10/site-packages/cloudpathlib/gs/gsclient.py:101, in GSClient.__init__(self, application_credentials, credentials, project, storage_client, file_cache_mode, local_cache_dir, content_type_method, download_chunks_concurrently_kwargs)
99 else:
100 try:
--> 101 self.client = StorageClient()
102 except DefaultCredentialsError:
103 self.client = StorageClient.create_anonymous_client()
File ~/conda/envs/plinder/lib/python3.10/site-packages/google/cloud/storage/client.py:227, in Client.__init__(self, project, credentials, _http, client_info, client_options, use_auth_w_custom_endpoint, extra_headers)
224 no_project = True
225 project = "<none>"
--> 227 super(Client, self).__init__(
228 project=project,
229 credentials=credentials,
230 client_options=client_options,
231 _http=_http,
232 )
234 # Validate that the universe domain of the credentials matches the
235 # universe domain of the client.
236 if self._credentials.universe_domain != self.universe_domain:
File ~/conda/envs/plinder/lib/python3.10/site-packages/google/cloud/client/__init__.py:320, in ClientWithProject.__init__(self, project, credentials, client_options, _http)
319 def __init__(self, project=None, credentials=None, client_options=None, _http=None):
--> 320 _ClientProjectMixin.__init__(self, project=project, credentials=credentials)
321 Client.__init__(
322 self, credentials=credentials, client_options=client_options, _http=_http
323 )
File ~/conda/envs/plinder/lib/python3.10/site-packages/google/cloud/client/__init__.py:271, in _ClientProjectMixin.__init__(self, project, credentials)
268 project = self._determine_default(project)
270 if project is None:
--> 271 raise EnvironmentError(
272 "Project was not passed and could not be "
273 "determined from the environment."
274 )
276 if isinstance(project, bytes):
277 project = project.decode("utf-8")
OSError: Project was not passed and could not be determined from the environment.
For me the error appears in both cases: when I execute the notebook directly and when I execute it via `sphinx-build`.
This looks like a cloud credential issue. @tjduigna should be able to help. Ideally this shouldn't be happening since the bucket is public. If you run `gcloud config set project vantai-analysis` it should be fine, but external users shouldn't have to do that.
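The traceback hints at why the public bucket does not help here: the `try`/`except` around `StorageClient()` in `GSClient.__init__` only handles `DefaultCredentialsError`, whereas the failure is raised as `EnvironmentError` (an alias of `OSError`), so the anonymous-client fallback is never reached. A minimal sketch of that control flow, using stand-in classes; `FakeStorageClient` and `make_client` are hypothetical names for illustration and are not part of cloudpathlib or google-cloud-storage:

```python
class DefaultCredentialsError(Exception):
    """Stand-in for google.auth.exceptions.DefaultCredentialsError."""


class FakeStorageClient:
    """Hypothetical stand-in that fails the way the traceback shows."""

    def __init__(self, project=None):
        if project is None:
            # Same error type and message as in the traceback above.
            raise EnvironmentError(
                "Project was not passed and could not be "
                "determined from the environment."
            )
        self.project = project

    @classmethod
    def create_anonymous_client(cls):
        return cls(project="<none>")


def make_client():
    """Mirror the try/except shape shown for GSClient.__init__."""
    try:
        return FakeStorageClient()
    except DefaultCredentialsError:
        # Not reached for the project error: OSError is not a subclass
        # of DefaultCredentialsError, so this fallback is skipped.
        return FakeStorageClient.create_anonymous_client()


try:
    make_client()
    outcome = "client created"
except OSError as exc:
    outcome = f"uncaught: {exc}"
print(outcome)
```

If this reading is right, either the project has to be resolvable from the environment or the anonymous client has to be requested explicitly.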
Adding a dummy project with `os.environ["GCLOUD_PROJECT"] = "my-project"` seems to fix the issue.
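As a sketch of how this workaround might look at the top of the notebook: the variable has to be set before the storage client is created, so it goes ahead of the first `plinder` call. The use of `setdefault` (so that an existing project setting is not overwritten) is an assumption of this sketch; the comment above uses a plain assignment.

```python
import os

# "my-project" is an arbitrary placeholder, as in the comment above.
# setdefault keeps any project that is already configured.
os.environ.setdefault("GCLOUD_PROJECT", "my-project")

# Only afterwards import and call plinder, for example:
# from plinder.core import get_plindex
# annotation_df = get_plindex()
print(os.environ["GCLOUD_PROJECT"])
```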
This cell now works for me. However, now another cell fails:
from plinder.core.scores import query_links
query_links()
---------------------------------------------------------------------------
IOException Traceback (most recent call last)
Cell In[13], line 2
1 from plinder.core.scores import query_links
----> 2 query_links()
File ~/Documents/coding/plinder/src/plinder/core/utils/dec.py:23, in timeit.<locals>.wrapped(*args, **kwargs)
21 result = None
22 try:
---> 23 result = func(*args, **kwargs)
24 log.info(f"runtime succeeded: {time() - ts:.2f}s")
25 except Exception:
File ~/Documents/coding/plinder/src/plinder/core/scores/links.py:52, in query_links(columns, filters)
43 query = make_query(
44 dataset=dataset,
45 filters=filters,
(...)
49 include_filename=True,
50 )
51 assert query is not None
---> 52 df = sql(query).to_df()
53 df["kind"] = df["filename"].apply(lambda x: Path(x).stem.split("_links")[0])
54 return df
File ~/conda/envs/plinder/lib/python3.10/site-packages/duckdb/__init__.py:457, in sql(query, **kwargs)
455 else:
456 conn = duckdb.connect(":default:")
--> 457 return conn.sql(query, **kwargs)
IOException: IO Error: No files found that match the pattern "/Users/kunzmann/.local/share/plinder/2024-04/tutorial/links/*.parquet"
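The `IOException` says that no files match the glob pattern in the local data directory. One way to check this before calling `query_links()` is to test the directory for `*.parquet` files. The sketch below assumes nothing about plinder itself: `has_link_files` is a hypothetical helper, and the demo runs against a temporary directory rather than the real path from the traceback.

```python
from pathlib import Path
import tempfile


def has_link_files(links_dir):
    """Return True if links_dir contains at least one *.parquet file."""
    return any(Path(links_dir).glob("*.parquet"))


with tempfile.TemporaryDirectory() as tmp:
    links_dir = Path(tmp) / "links"
    links_dir.mkdir()
    # Directory exists but holds no parquet files yet.
    empty = has_link_files(links_dir)
    # Create a placeholder file matching the pattern.
    (links_dir / "example_links.parquet").touch()
    populated = has_link_files(links_dir)
print(empty, populated)
```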
Now the notebook works :+1:
Could you in a final section also cover the data loader?
We are holding this off for now.
I reformatted some parts of the tutorial in my latest commit. I think two sections could be more descriptive:
- `PlinderSystem` is introduced, but the user gets no information on what can be done with it.
- I think it does not become clear enough what the table returned by `query_links()` represents.

Note that you can reference functions and classes with the Sphinx roles {func}`some_func()` and {class}`SomeClass`, respectively. In the rendered docs, these become helpful links that point to the respective page in the API reference.

In addition, I found some headings which were simply rendered as bold with `*<some heading>*`. If a Markdown heading (i.e. one or multiple `#`, depending on hierarchy) is used instead, the output is rendered more nicely and the section appears in the sidebar.
Updated to reflect these changes.
The section about `PlinderSystem` mentions a `System` and an `Entry` class. However, these are not part of the public API. Or will they become part of it?
And `load_systems()` is not part of it either, at least currently. Should we include this function in the public API?
I pushed a clean-up commit. From my side, only the questions regarding the API exposed to the user need to be decided before merge.
> And `load_systems()` is not part of it either, at least currently. Should we include this function in the public API?
I removed this.
I pushed a further cleaned up commit, addressing all the issues highlighted here. @Ninjani @padix-key Let me know if I missed anything.
This PR converts api.md to api.ipynb and adds more context to the tutorials.