google / personfinder

Person Finder is a searchable missing person database written in Python and hosted on App Engine.
https://google.org/personfinder
Apache License 2.0
533 stars 194 forks source link

Make imports in Python consistent with the style guide #615

Open gimite opened 5 years ago

gimite commented 5 years ago

Our current code base has many imports in Python not following this style guide: https://github.com/google/styleguide/blob/gh-pages/pyguide.md#22-imports

These cases should be fixed to import packages/modules instead.

vintol commented 5 years ago

Hello and Welcome

Import All

I have crunched some numbers and code ...

After the incident with the error, I have realised this.

There are some files that should be reformatted first, for ease and to save revisions.

For example, List of files that are imported in from somewhere import * fashion.

Total Times Imported Module/ File
22 model
15 utils
7 .timemachine
4 pyasn1_modules.rfc2459
3 const

So, app/model.py would have to go first. Any changes in the function names or anything would also have to be reflected in files that import them. They are 22 of them.

Not to forget files using from model import something.

All in all, there are 33 files.

Import Some

All modules imported via from module import somethings

Times Modules
161 pyasn1.type
118 future
56 google.appengine.ext
53 google.auth
51 oauth2client
51 google.appengine.api
43 .
33 model
27 datetime
25 six.moves
23 unittest
22 utils
22 django.utils.translation
21 pyasn1_modules
21 pyasn1
19 babel.core
18 googleapiclient
15 babel.util
15 babel._compat
14 urllib3.exceptions
14 .enums
14 .._compat
13 simplejson.compat
13 googleapiclient.errors
13 collections
13 .compat
12 server_tests_base
11 requests_toolbelt
11 const
11 .exceptions
10 socket
10 rsa._compat
10 googleapiclient.http
10 babel
gimite commented 5 years ago

I would like to take a shot at this.

That would be great, thanks!

Let me take examples to better understand your proposal.

https://github.com/google/personfinder/blob/master/app/view.py contains:

from model import *
...
person = Person.get(self.repo, self.params.id)

and https://github.com/google/personfinder/blob/master/app/create.py contains:

from photo import create_photo, PhotoError
...
note_photo, note_photo_url = \
    create_photo(self.params.note_photo, self)

How does the code look like by applying your proposed change? What I imagined was:

import model
...
person = model.Person.get(self.repo, self.params.id)
import photo
...
note_photo, note_photo_url = \
    photo.create_photo(self.params.note_photo, self)

Whoa, grepped the import lines. That's some 4000 imports in whole code. Might take some time. Especially where I have to find the modules actually being used where from x import * is used.

Let me know if there is a time limit or deadline or something for this. I am doing this in my spare time.

There is no time limit or deadline, so take your time. And you don't need to do everything at once (and you probably shouldn't). It would probably be best to first send a pull request to only fix a single file (e.g., app/create.py) to check if the change matches our intention, and apply it to other files later.

Just FYI I found a bit tricky example:

photo, photo_url = create_photo(self.params.photo, self)

In this case the variable must be also renamed e.g.:

person_photo, person_photo_url = photo.create_photo(self.params.photo, self)

To avoid collision with the module name. Something worth keeping in mind.

vintol commented 5 years ago

What I meant was this :

from model import *
...
person = Person.get(self.repo, self.params.id)

This could be done in one of 2 ways:

  1. The same thing @gimite did in post above.
import model
...
person = model.Person.get(self.repo, self.params.id))

This is obviously the one to aim for but it might require more attention during this change.

Take a look at this file :

  1. Other way
    
    from model import Person,[Some Other Module],[Some Another Module],[Etc ...]
    ...
    person = Person.get(self.repo, self.params.id))

> The advantage of the 2nd way is that is does not change anything else than the import code.

Tommorrow, 
I intend to show case a file with reformatted code : app/view.py

A quick question though,

in view.py I see
Line 18 `from photo import create_photo, PhotoError`
but that is not used anywhere in the file and is not required.
Am I mistaken ?
gimite commented 5 years ago

Thanks for the clarification!

from model import Person,[Some Other Module],[Some Another Module],[Etc ...]
...
person = Person.get(self.repo, self.params.id))

This still violates our style guide: https://github.com/google/styleguide/blob/gh-pages/pyguide.md#22-imports

Use import statements for packages and modules only, not for individual classes or functions.

So I would prefer to go with the first option even though it is more work.

A quick question though,

in view.py I see Line 18 from photo import create_photo, PhotoError but that is not used anywhere in the file and is not required. Am I mistaken ?

It is true for this particular case, and you would also see a lot of unused imports like this unfortunately :( Feel free to (and you are welcomed to) clean up unused imports as well when you find them.

vintol commented 5 years ago

The file that I promised.

Ran the app with the altered code. No errors there.

I was able to enter a record. Found it when searching. But could not open it.

URL :
http://localhost:8000/haiti/view?family_name=&given_name=&id=haiti.personfinder.google.org%2Fperson.5066549580791808&query_location=&query_name=Kumar&role=seek

ERROR:

500 Internal Server Error
The server has either erred or is incapable of performing the requested operation.

Solved this error, I was able to view the record as well. See ERROR section.

Script

While reformatting the import code

At first I thought, for the imports like from model import *

I would have to search every class and function there is in model.py in the working file. But then I was hoping for another way, something like


Default = dir()
from model import *
New = dir()
for i in New:
    if i not in Default:print(i)

This is premitive, i know. But it's just to give you an idea. Then maybe look for that imported functions in the present file manually via search or maybe write some more code.

This would save much time

Anyhow, After some while I gave that up. And started writing a script that relies on text parsing. Based on the keywords like def, class and indentation. I get a list of func and class, (variables not imeplemented yet), then look for those names in the file that I am currently working on.

It still needs work but it will greatly reduce the effort of going through the entire files.

Few things on which I need some feedback :

HELP1

from django.utils.translation import ugettext as _ What to do here ?

Rename that ?

1 ] import django.utils.translation.gettext as _

2 ] import django.utils.translation.gettext

3] import django.utils.translation

4] One can not go any higher right? import django.utils won't work.

HELP2

Also this one, from google.appengine.api import datastore_errors

1 ] import google.appengine.api

2 ] import google.appengine.api.datastore_errors

Import only one perticular thing

In both ways above the datastore_errors would have to be renamed

3] Leave it as it is . Again Style code: Not for importing functions.


ERROR

Full : 

  File "/home/***/Documents/GitHub/personfinder/app/view.py", line 164, in get
    admin=users.is_current_user_admin(),
NameError: global name 'users' is not defined

This was because, There is an import in utils.py from google.appengine.api import users since in view.py originally imported utils.py like from utils import *, the users was accessible directly. But after the reformat import utils, it became utils.users. Hence, the error.

HELP 3

Read ERROR section first.

So, now the users can be used in two different ways. 1 utils.users This is already imported because of import utils and another import code in utils.py

2 google.appengine.api.users;

In this perticular case, google.appengine.api was already imported.

But maybe in others we could use utils.users to avoid additional import. Although that would make the code a bit harder to understand. As to find the original users finction, a person would have to go to view.py -> utils.py -> google.appengine.api

So, which way do we go ?

nworden commented 5 years ago

But, I get this error, when importing the python file Using the appengine directly downloaded in the PATH ModuleNotFoundError: No module named 'google'

To be clear, you mean when including APPENGINE_DIR on the path, same as when you run the app? Could you let me know what version of the SDK you're using (it should be the first line of output if you run cat $APPENGINE_DIR/VERSION)? And where you're running it from and where your script is located relative to the app directory?

from django.utils.translation import ugettext as _ What to do here ?

IMHO this is the sort of situation where maybe it's not worth following the style guide (_('user-facing') string seems more readable than the alternatives), but I'll yield to Hiroshi. If we do want to stick to the style guide, one option to minimize the length of the calls would be to import the translation module like this: import django.utils.translation as t so that the calls could just be t.ugettext.

gimite commented 5 years ago

IMHO this is the sort of situation where maybe it's not worth following the style guide

I agree that we should consider this as an exception and leave it as is, because I believe it's a common pattern for this particular module. Sorry I forgot to mention that earlier.

vintol commented 5 years ago

I updated my earlier comment. Maybe take a look at that for other things. @nworden :

Devlopment Environment

For convenience, I am showing just the directory name here, In reality, I am using full path wherever applicable.

Python2 + pips

Using python2 env setup using miniconda Installed all the requirements.txt plus others mentioned in the guide to set up dev env.

conda set up is in a directory tes. I have created a symlink of that python2 executable to my /usr/local/bin called py2. Running everything using that.

G SDK

Google SDK v240.0.0 set up in it's own dir outside the tree of tes called gsdk. I changed the script of gsdk/bin/gcloud to always use python=py2

Installed the app engine python and extras.

I remember sdk modified .bashrc but whole time I am using fish shell.

Trying something new here, nut here's what I did before. prepended the PATH with gsdk.

cd to the personfinder.

> py2
import google # works fine but Nothing in dir(google).
import google.appimage #ERROR : No module named `appimage` 

G App Engine

release: "1.9.85" timestamp: 1551772874 Directly downloaded and extrected. to a dir appengine

prepended the PATH with appengine. cd to the personfinder.

> py2
import google #ERROR : No module named `google` 
import google.appimage  #ERROR : No module named `google` 

Run personfinder

prepended the PATH with gsdk. cd to the personfinder.

Have not tried that.

prepended the PATH with appengine. cd to the personfinder.

env APPENGINE_DIR="appengine" tools/gae run app --port=8000

That works fine.

vintol commented 5 years ago

I have an Important question.

How do we integrate the changes being made ?

Do I just commit to the main branch, can I even do that ? OR keep working on the forked branch or something else ... ?

gimite commented 5 years ago

You can just send pull requests. We can review and merge them into the master branch.

vintol commented 5 years ago

Okay. Doing that. I have started working.

I am Removing the first comment I made and using it as a sort of TODO board and a Progress bar for this Issue.

Do I create a pull request when I update a bunch of files or when all of them are done ?

vintol commented 5 years ago

In Model.py

from google.appengine.ext import db 135 refrences to db.

from google.appengine import ext or leave as it is ?

gimite commented 5 years ago

https://cloud.google.com/appengine/docs/standard/python/refdocs/google.appengine.ext.db?hl=en db is a package, so you can leave it as is.

It is sometimes unclear if the symbol is a package/module or a variable. But I believe you can consult documentation or read code of the library to distinguish them.