Closed khoroshevskyi closed 1 year ago
In my oppinion, some of this methods are usefull and and were written by @nleroy917 request.
So are you saying you'd prefer not to delete them, though you're doing that in this PR? Which ones specifically do you want to keep?
Is there any way for you to maintain the functionality while still simplifying the code, by reducing redundancy? In other words, can you re-use some code to keep these?
For example, get_projects_annotation_by_namespace
and get_projects_annotation_by_namespace_tag
should probably be merged into 1 function, with a tag parameter defaulting to None
For example,
get_projects_annotation_by_namespace
andget_projects_annotation_by_namespace_tag
should probably be merged into 1 function, with a tag parameter defaulting toNone
As far, as we are not using this function, I think we can delete them.
I changed my mind about one function, I decided to leave get_namespace_annotation
, as this method is usefull for pephub (or will be)
I think now ready to be merged.
I'm fine with keeping functions that you think are useful, even if they are not currently being used by pephub -- we may use this outside of pephub (the point of making it modular), so useful functions should be retained...
but just don't make those useful functions redundant, and be judicious about what you define as useful. does that make sense?
are there any others you think are worth retaining?
I don't think any other important function left. If we will need them in the future, we can rewrite them and fitting to specific pephub needs.
this branch is several commits behind the master branch, can you pull those in?
this branch is several commits behind the master branch, can you pull those in?
done
can you clarify the parameters in the constructor of the Connection object?
eg, what is dsn
? https://github.com/pepkit/pepdbagent/blob/b1cfe46f66a97d3415e6452be23736909742dcdb/pepdbagent/pepdbagent.py#L35
maybe write a docstring there
don't mix camel case and underscores: https://github.com/pepkit/pepdbagent/blob/b1cfe46f66a97d3415e6452be23736909742dcdb/pepdbagent/pepdbagent.py#L45
maybe use pg_connection
?
What will upload_project
do with a duplicate project?
Write in the docstring here: https://github.com/pepkit/pepdbagent/blob/b1cfe46f66a97d3415e6452be23736909742dcdb/pepdbagent/pepdbagent.py#L75
don't mix camel case and undercores:
maybe use
pg_connection
?
Also you can completely get rid of using dsn and just ask to provide port, host, login, password, database and create the connection string inside the class so that user does not need to worry about, like that:
class Connection:
def __init__(self, host, port, login, password, database):
self.connection_string = f"postgresql://{login}:{password}@{host}:{port}/{database}"
Can you clarify this docstring? https://github.com/pepkit/pepdbagent/blob/b1cfe46f66a97d3415e6452be23736909742dcdb/pepdbagent/pepdbagent.py#L96
Do you mean: "If true, the project will be updated"
After this we should change pephub, as it uses dsn only ... @nleroy917 ?
Can you clarify this docstring?
Do you mean: "If true, the project will be updated"
It means, if there is project with the same key, then that project will be updated
Can you clarify why there is update_project
and upload_project
, when upload_project
has an update=True
flag?
Also they have the same docstring. This is a confusing way to structure this.
What will
upload_project
do with a duplicate project?Write in the docstring here:
Duplicated projects won't be uploaded. But if update is set true, then that project will be updated
Simplify this docstring: https://github.com/pepkit/pepdbagent/blob/b1cfe46f66a97d3415e6452be23736909742dcdb/pepdbagent/pepdbagent.py#L326-L327
specify what this function does do (not just what it doesn't do): https://github.com/pepkit/pepdbagent/blob/b1cfe46f66a97d3415e6452be23736909742dcdb/pepdbagent/pepdbagent.py#L380 -- what eaxctly is it getting?
don't mix camel case and undercores: https://github.com/pepkit/pepdbagent/blob/b1cfe46f66a97d3415e6452be23736909742dcdb/pepdbagent/pepdbagent.py#L45
maybe use
pg_connection
?Also you can completely get rid of using dsn and just ask to provide port, host, login, password, database and create the connection string inside the class so that user does not need to worry about, like that:
class Connection: def __init__(self, host, port, login, password, database): self.connection_string = f"postgresql://{login}:{password}@{host}:{port}/{database}"
I think it's fine to provide this as an alternative connection method, but it needs to be clear in the docstring
Can you clarify why there is
update_project
andupload_project
, whenupload_project
has anupdate=True
flag?Also they have the same docstring. This is a confusing way to structure this.
update_project
is responsible only for updating projects.
upload_project
is responsible for uploading project. If update=True
argument is set then: if the project, that user wants to upload already exists, upload_project
will use update_project
and project with the same registry path will be updated.
RE: dsn
or connection string, I think sometimes it's advantageous to specify the driver SQLAlchemy uses underneath through the directive/scheme.
Im fine removing it. I believe pephub
just uses the host/user/pass method.
Looking good, just a few more comments:
Can you please define "Annotations" maybe in this docstring: https://github.com/pepkit/pepdbagent/blob/97baff4ad380cbd5d996e77d050c0d22ce6f7996/pepdbagent/pepannot.py#L24
I don't really understand what this class is for and I can't figure it out from the code
Rename to just 'commit' or 'commit_to_database'? https://github.com/pepkit/pepdbagent/blob/97baff4ad380cbd5d996e77d050c0d22ce6f7996/pepdbagent/pepdbagent.py#L75
Is the user supposed to use upload_project(update=True)
or update_project
? Instead of making 2 user-facing ways to do the same thing, maybe change update_project
to _update_project
to show it's an internal function?
Maybe rename to get_project_by_registry_path
? https://github.com/pepkit/pepdbagent/blob/97baff4ad380cbd5d996e77d050c0d22ce6f7996/pepdbagent/pepdbagent.py#L264
similarly, here: https://github.com/pepkit/pepdbagent/blob/97baff4ad380cbd5d996e77d050c0d22ce6f7996/pepdbagent/pepdbagent.py#L510 and here: https://github.com/pepkit/pepdbagent/blob/97baff4ad380cbd5d996e77d050c0d22ce6f7996/pepdbagent/pepdbagent.py#L606
I don't think you need the intermediate variables here, just use the indexed array directly: https://github.com/pepkit/pepdbagent/blob/97baff4ad380cbd5d996e77d050c0d22ce6f7996/pepdbagent/pepdbagent.py#L624-L628
can you update this for digest creation: https://github.com/pepkit/pepdbagent/blob/97baff4ad380cbd5d996e77d050c0d22ce6f7996/pepdbagent/pepdbagent.py#L682
to use this:
json.dumps(
data,
separators=(',', ':'),
ensure_ascii=False,
allow_nan=False,
sort_keys=True,
)
I believe, that I fixed all issues that you mentioned.
Two comments were not addressed:
upload project
(update=True) an update_project
have to be in pepdbagent. Why? Because there can be 3 update cases:
a. project doesn't exist in database, and user wants to upload it, and if exists we want to update it. (use upload_project)
b. project doesn't exist in database, and user wants to update it, only if it exists. (use update_project)
c. upload project if it doesn't exist (use upload_project)The a. option is very helpful for constant update of the projects. e.g. using geofinder and geofetch.
I don't understand what's the return of get_namespaces_info_by_list
. Is it supposed to return list of all namespaces? If so, why you must pass a list? Why this list can be none?
I don't understand what's the return of
get_namespaces_info_by_list
. Is it supposed to return list of all namespaces? If so, why you must pass a list? Why this list can be none?
This method is returning information about namespaces in the list. If the list is None, then it will return list of informations about all namespaces ([namespace1_info, namespace2_info,...]) In my oppinion it should be separated into two methods. what do you think?
Imo either the docstring should be updated to describe the function behavior because it's confusing, or the function should be splitted to get_all_namespaces
, get_namespace_info
, and get_namespaces_info_by_list
1. update_project. In my opinion both `upload project` (update=True) an `update_project` have to be in pepdbagent. Why? Because there can be 3 update cases: a. project doesn't exist in database, and user wants to upload it, and if exists we want to update it. (use upload_project) b. project doesn't exist in database, and user wants to update it, only if it exists. (use update_project) c. upload project if it doesn't exist (use upload_project)
This should be: add_project(overwrite=True, update_only=False)
a: add_project(overwrite=True, update_only=False)
b: add_project(update_only=True)
c: add_project(overwrite=False)
Also, see: Postgres: ON CONFLICT DO UPDATE
for your insert statement, to avoid 2 separate sql statements.
In this PR, I deleted methods that are not used in pephub. In my oppinion, some of this methods are usefull and and were written by @nleroy917 request.
-- Could you please check if this fuctions can be deleted, and won't be used?
I haven't deleted
get_project_annotation
method, as this is one way to retrieve annotation from project annotation column.